More Dating Problems

Ever have one of those dates that just bugs you whenever you look back on it? I’ve been having one since last night. More specifically, I’ve been researching a problem with the way that the SSIS Flat File Source data flow component handles bad dates. The problem became apparent to me because of a thread on the MSDN SSIS forums. And it doesn’t handle them particularly well. [1] Here’s the deal:

The SSIS flat file component relies on the Ole Automation method VarDateFromStr. This method will attempt to parse date strings in different formats regardless of the locale of the server on which the SSIS package is running. Here’s an excerpt from the KB article that describes the underlying problem:

In VarDateFromStr, the code does not strictly check the string passed
against the date format of the default system locale, default user locale, or
the locale passed to the function. The function returns without error if the
passed string is valid in any of the following date formats:

If you need to verify that the date is valid based on the specific date
format for a locale, do not depend on the VarDateFromStr function. This implies
that any code relying on the MFC function COleDateTime::ParseDateTime to do
strict checking should be changed also because this MFC function uses
VarDateFromStr internally. Microsoft recommends that you check the string first
in your code before calling VarDateFromStr or COleDateTime::ParseDateTime to
make sure it is a correct date for the date format of the locale. [2]

How does this show up in SSIS? Basically, unless your flat file contains dates in either OLE DB canonical format (yyyy-mm-dd hh:mm:ss) or ISO 8601 format [3], you will need to manually check the validity of your input dates. The simplest way to do this is probably loading the date column as a string and then doing the type casting using a Derived Column transform in your data flow.

The workaround is certainly not difficult, but the problem also isn’t particularly obvious. The KB article that describes the problem doesn’t mention SSIS (or for that matter, any operating systems more recent than NT4) and my research into the problem didn’t yield any fruit through online searching. Hopefully now that we have this blog post and the MSDN forums thread, the details will be easier for everyone to locate…

[1] Personally, I’ve always handled bad dates by leaving early and stiffing the girl with the check. Some people say that this is inappropriate, but I think that this SSIS behavior is even worse, because it’s going to fail silently and who knows when you’ll discover it.
[2] Unfortunately, there is no “strict checking” option or the like for the SSIS Flat File source component, so your options here appear to be limited to working around the problem, rather than solving it.
[3] In which case, VarDateFromStr is not called.


About ssimagine

My name is Matthew Roche, and I am a Senior Program Manager with the SQL Server product group at Microsoft. I work on Master Data Services and Data Quality Services, and have previously worked on SQL Server Integration Services. Although I work for Microsoft and will be posting on technical topics, I want to stress that this is a personal blog, and any opinions posted here are mine and mine alone. I built my career around SQL Server and Microsoft technologies for well over a decade before I joined Microsoft as an employee, and I plan on using this blog to share my personal experience and opinions. They may well be shaped by my experience on the SQL Server team, but they’re still mine, and not that of Microsoft, disclaimer, disclaimer, etc., etc..
This entry was posted in BI, SSIS. Bookmark the permalink.

2 Responses to More Dating Problems

  1. Phil Brammer says:

    One note. The use of a 4-digit year should solve this problem as well. Which, aren’t we all using 4-digit years yet?? 😉

  2. Yeah, that was one vital aspect of the problem scenario that I neglected to mention here, wasn’t it? At least I remembered to link to the Forums thread where this was covered in some depth.My personal guess is that this will most likely be seen by people who get flat files from legacy systems, which may well not use four-digit years. We developers just need to remember to use all four digits when we’re building NEW legacy systems, I suppose…

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s