Sorting and Metadata

A recent post to the SSIS Forums on MSDN made me realize how un-intuitive it can sometimes be to get sorted data in an SSIS data flow. Let’s review the obvious facts:

  • When you query data from a relational database like SQL Server, the OLE DB source component gets metadata from the database server based on the SELECT statement.
  • This metadata includes things like column names, data types and so on.
  • The SSIS data flow designer uses this metadata to build the data flow pipeline.
  • The SQL SELECT statement includes an ORDER BY clause which specifies the sort order for the data to be returned.
  • This sort order is not provided by the database server, so the SSIS data flow has no way to tell if the data is sorted or not.

Hey, wait! That last fact may not be so obvious unless you think about how the database server is presenting this metadata to the SSIS tools. Essentially the OLE DB source component is sending the SELECT statement to the server, with some modifications to limit the number of rows being returned. The server returns the rows along with metadata that describes them. If the database is SQL Server, this data and metadata will be returned in a Tabular Data Stream (TDS) protocol stream – this is the protocol that SQL Server uses, so this is what SSIS gets back.

But if you’ve ever looked at the TDS spec, you know that TDS doesn’t have any concept of sort order, so there’s no way that SSIS can get the sort order from SQL Server.

So how does SSIS know that the data is sorted, and by what columns?

It knows because you tell it. And it trusts you implicitly[1] so you need to be careful to be honest.

So how do you do it?

Consider this SELECT statement:

  SELECT [AddressID]
        ,[AddressLine1]
        ,[AddressLine2]
        ,[City]
        ,[StateProvinceID]
        ,[PostalCode]
        ,[rowguid]
        ,[ModifiedDate]
    FROM [Person].[Address]
ORDER BY [PostalCode]
        ,[ModifiedDate] DESC

So how would you tell the SSIS data flow how this data is sorted?

There are two things you need to do, both of them in the Advanced Editor for the OLE DB source component. First, set the IsSorted property of the OLE DB Source Output to True. This essentially tells SSIS that the data will be sorted, but not how it will be sorted.

01 - IsSorted

Next, set the SortKeyPosition property for the PostalCode and ModifiedDate columns in the Output Columns collection. Set SortKeyPosition to 1 for PostalCode because this is the first column in the ORDER BY clause. Set SortKeyPosition to -2 for ModifiedDate because this is the second column in the ORDER BY clause, and the column is marked as DESC – this is why it’s -2 and not 2.

02 - SortKey 1 

03 - SortKey 2

Now, when this package is executed, not only will the data be sorted (because we added an ORDER BY clause to our source query) but SSIS will know that the data is sorted as well, because we told it that it was. Now any downstream components (such as a Merge or Merge Join transformation) that require sorted inputs will be able to do what they need to do.

Life is good, because SSIS loves you.

 

[1] SSIS trusts you because SSIS loves you. More than you know.

Advertisements

About ssimagine

My name is Matthew Roche, and I am a Senior Program Manager with the SQL Server product group at Microsoft. I work on Master Data Services and Data Quality Services, and have previously worked on SQL Server Integration Services. Although I work for Microsoft and will be posting on technical topics, I want to stress that this is a personal blog, and any opinions posted here are mine and mine alone. I built my career around SQL Server and Microsoft technologies for well over a decade before I joined Microsoft as an employee, and I plan on using this blog to share my personal experience and opinions. They may well be shaped by my experience on the SQL Server team, but they’re still mine, and not that of Microsoft, disclaimer, disclaimer, etc., etc..
This entry was posted in BI, SQL Server, SSIS. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s