Know Your Data… Werewolves

(That’s right. Werewolves.)

Over a year ago, I posted a list of SSIS best practices, and I’ve presented on this topic a dozen or more times since then at conferences and user groups and seminars. One of these best practices was "Really know your data – really!" and the advice went something like this:

"If there is one lesson I’ve learned it is that source systems never behave the way you expect them to and behave as documented even less frequently. Question everything, and then test to validate the answers you retrieve. You need to understand not only the static nature of the data – what is stored where – but also the dynamic nature of the data – how it changes when it changes, and what processes initiate those changes, and when, and how, and why. Odds are you will never understand a complex source system well enough, so make sure you are very friendly (may I recommend including a line item for chocolate and/or alcohol in your project budget?) with the business domain experts for the systems from which you will be extracting data. Really."

Well, last night I stumbled across some old meeting notes that reminded me of why this is so very important. The notes were incredibly brief, and consisted largely of a quote from the client project owner. But before I share the quote, please let me share the context and story[1]…

I was helping to design and build a BI application that my client was then re-selling as a service to their customers.[2] We were loading in data from a variety of source systems that provided the same type of data, but which were implemented by different vendors. One of the systems, called FooBar[3] was consistently causing us problems. The data import was running without error, and everything looked good on the surface, but the FooBar-using customers were unhappy[4] because the data they were seeing through the BI portal was incorrect.

Yuck.

The client project owner had, years earlier, worked with the development group that built FooBar, and with his insight we were able to discover that occasionally the data export process in FooBar was failing silently, so that we were getting only partial exports, and there was no way to tell – no error was raised, no checksum existed, and because of some nasty internal details (which I won’t go into here) the effort involved in updating FooBar would be very significant. It was not a good time.

So what’s the quote? It was this:

"You guys planned for wolves. FooBar is werewolves."

Even after all this time, I still remember the deadpan delivery for this gem. And it was so true. We’d gone in thinking we were prepared for all of the usual problems – and we were. But we weren’t prepared for the horrifying reality of the data problems that were lying in wait. We weren’t prepared for werewolves.

So what’s the moral of the story? Plan for werewolves. Assume the worst. Test early and often, and test failure scenarios, not just happy-day scenarios. Because the time to learn that there really are werewolves is when there is still time to pack a crossbow and some silver bolts, not when the full moon is rising.

Trust me.

 

[1] Names have been changed to protect the innocent. And the other people too…

[2] This has been a recurring theme for me, so it’s probably not the company you’re thinking of.

[3] I changed the names, remember?

[4] I may also use a literary tool called "understatement" in this story…

Advertisements

About ssimagine

My name is Matthew Roche, and I am a Senior Program Manager with the SQL Server product group at Microsoft. I work on Master Data Services and Data Quality Services, and have previously worked on SQL Server Integration Services. Although I work for Microsoft and will be posting on technical topics, I want to stress that this is a personal blog, and any opinions posted here are mine and mine alone. I built my career around SQL Server and Microsoft technologies for well over a decade before I joined Microsoft as an employee, and I plan on using this blog to share my personal experience and opinions. They may well be shaped by my experience on the SQL Server team, but they’re still mine, and not that of Microsoft, disclaimer, disclaimer, etc., etc..
This entry was posted in Best Practices, BI, Quote, SSIS. Bookmark the permalink.

One Response to Know Your Data… Werewolves

  1. atlas245 says:

    This comment has been removed by a blog administrator.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s