Data Source Annotations

This post is something of a follow-up on the “Annotate like you mean it” best practice I mentioned in my best practices post last week. I always figured that the technique was something of a no-brainer, but when I mentioned it to Greg Low in our SQL Down Under conversation last week he seemed a little surprised (in a good sort of way) and if he hadn’t thought of it[1] odds are there are a few people out in SSIS-land who might not have thought of it either.

In my last post I talked about updating an SSIS package to use Common Table Expressions in the source query to replace a complex set of asynchronous transforms in the package’s data flow. The upside of this change was that the package performed better by several orders of magnitude. The downside is that the source query became more complex.

Now this isn’t inherently a bad thing, but it is undesirable in that it makes the package less self-documenting. In addition, because I tend to base my source queries on expressions[2], it is not always a simple task to figure out exactly what query is being executed. Because of this, and because self-description is an admirable goal for any development artifact, I will generally update my data flow with an annotation on the design surface that shows the query next to the data source component. In this scenario, the data flow ended up looking something like this:

As you can see, it is now much easier to understand the data being extracted than it would be to examine the data source component to find out on which variable its query was based, and then examine the variable to understand the expression on which it, in turn, was based.
The obvious drawback of this approach is that as soon as the annotation is created, it is instantly out of date. There is no way to create a “dynamic annotation” in SSIS[3], so there is no enforced relationship between the annotation and the data source component that it describes. This then puts something of a maintenance demand on the package developers, as they must remember to update the annotation each time the source query is updated.
I personally think that this is an acceptable tradeoff. The maintenance cost itself is low, and the benefits of having a well-documented package are considerable. Also, if you follow the best practice of “Get your metadata right first, not later” then you will likely not need to update the source query (or the annotation) very often.
[1] I have a great deal of respect for Greg’s knowledge and experience, so if there’s something that is new to him, it’s worth blogging about.
[2] Which will likely be a post of its own before too long.
[3] Although having annotations based on expressions would be a cool new feature, wouldn’t it?

About ssimagine

My name is Matthew Roche, and I am a Senior Program Manager with the SQL Server product group at Microsoft. I work on Master Data Services and Data Quality Services, and have previously worked on SQL Server Integration Services. Although I work for Microsoft and will be posting on technical topics, I want to stress that this is a personal blog, and any opinions posted here are mine and mine alone. I built my career around SQL Server and Microsoft technologies for well over a decade before I joined Microsoft as an employee, and I plan on using this blog to share my personal experience and opinions. They may well be shaped by my experience on the SQL Server team, but they’re still mine, and not that of Microsoft, disclaimer, disclaimer, etc., etc..
This entry was posted in Best Practices, BI, SQL Server, SSIS. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s