Sharing The Lookup Cache

Sometimes I’m the last person to know. I like to think that I know at least a little about SQL Server Integration Services, but it still seems that I’m always discovering some technique or tidbit of knowledge that had previously escaped my notice. I learned one such lesson this week during the TechEd Developers conference in Orlando. I was co-presenting with Donald Farmer[1] during his "Microsoft SQL Server 2008 Integration Services: From D’oh to Wow" session Friday morning. Here’s the deal:

in the SSIS data flow (both 2005 and 2008) the in-memory cache for a fully-cached Lookup Transformation can be shared between multiple Lookup Transformations, so long as they all use identical queries to populate their cache. This means that if two Lookups can be implemented with the same query (not a particularly common scenario, but it does happen from time to time) the performance of the package can be improved by only loading the lookup data into memory once.

There are opportunities here to refactor the lookup queries in existing data flows – consider a scenario where a data flow contains two Lookup Transformations, one to add a Department Name field to the data flow an done to and one to add a Department Contact field, both looking up based on the Department Key. A natural approach to solve this problem would be to have each Lookup use just the two columns it needs, thus making the lookup data as narrow as possible. (This is a best practice after all, right?) But by having both Lookups use the same three-column lookup query you can reduce the volume of data being read and cached, and potentially improve package performance. What’s not to like?

Funny you should ask.

After the session was over, I went to visit my friend and colleague John Welch, who was staffing one of the SQL Server Business Intelligence booths at the conference. I explained to him what I’d just learned (I was all excited in being able to share this with someone who loves SSIS as much as I do) and asked him if he had every heard of such a feature. I’ll paraphrase here, but his response was basically "uh, yeah." And John filled me in on the dark side of this cool little feature.

Take a look here: http://support.microsoft.com/kb/924016

That’s right, there is a known bug in SSIS related to this behavior. Basically if there are multiple data flows in the same package, or in multiple packages that run in the same process, it is possible for the cache to be cleared before all of the Lookup transformations are done using it. There are some good workarounds (and some not-so-good ones too – I would probably not want to disable caching for my Lookups to work around this bug) included in the KB article above that can make this feature usable, but you need to be careful using it. And of course if you’re using SQL Server 2008 SSIS, you can explicitly cache your data and share it in many more interesting and complex scenarios thanks to the new Cache Connection Manager options added to the Lookup transformation.

And of course this wasn’t the only thing I learned yesterday, but it’s the only one for which I have blogging time today. I’ve got the others filed away so once TechEd is done (maybe before, but as I have two breakout sessions for which to prepare this week I would not hope for too much) I’ll have more SSIS goodies to blog.

 

[1] Donald was the Program Manager for SSIS during the SQL Server 2005 release, and is now the Principal Program Manager for SQL Server Data Mining. It was a great honor (and a ton of fun!) to share the spotlight with such an engaging, knowledgeable and funny speaker. And to make things even cooler, we’re going to team up again for a repeat performance on Wednesday during the TechEd IT Professionals conference.

Advertisements

About ssimagine

My name is Matthew Roche, and I am a Senior Program Manager with the SQL Server product group at Microsoft. I work on Master Data Services and Data Quality Services, and have previously worked on SQL Server Integration Services. Although I work for Microsoft and will be posting on technical topics, I want to stress that this is a personal blog, and any opinions posted here are mine and mine alone. I built my career around SQL Server and Microsoft technologies for well over a decade before I joined Microsoft as an employee, and I plan on using this blog to share my personal experience and opinions. They may well be shaped by my experience on the SQL Server team, but they’re still mine, and not that of Microsoft, disclaimer, disclaimer, etc., etc..
This entry was posted in BI, Conference, SQL Server, SSIS. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s