SQL Server Integration Services (SSIS) is an Extract-Transform-Load (ETL) tool. It is probably not the best tool in the market, but if you compare it to the SQL Server 2000 Data Transformation Services (DTS), it is a big step forward. Of course, you still get it free with a valid SQL Server 2005 license.
How can you evaluate an ETL tool? Well, you can program the complete ETL process manually, with the programming language of your choice. You create procedures that do the transformations on the data. A good ETL tool should have many such procedures prepared in advance, with an easy user interface that allows you to set up their parameters and thus program the ETL process graphically. Of course, the ETL application you create should perform well, because typically you have a limited time window for your ETL process.
Microsoft wrote SSIS from the scratch. In my opinion, they actually made an excellent job. DTS was really and entry-level ETL tool. What really makes the difference is the Data Flow in SSIS 2005. DTS was always more task-oriented. You read the source data, do some transformation, and then write the data to some staging area, read it from there, do another transformation, write it again and so on and so on. This is not very efficient process. The Execute SQL task was probably the most used task in DTS. The only way you can do some transformations on the data while it is passing through the memory is the Data Pump task. With Data Pump task, you read the data with a source query transform it with an ActiveX script on the fly and write it to destination. Because the script can be very complex, you can do many transformations in the same step. The Data Pump task has many limitations. Except very simple transformations, you have to program everything manually. Scripting languages are not very fast. You are limited to a single data source and a single data destination.
The Data Flow in SSIS 2005 changes this picture completely. Think of it as of an incredibly enhanced Data pump task. After you read the data from one or multiple data sources, you do a series of out-of-the-box transformations on the data while it is passing by and then you can write the data to multiple destinations. The name “Data Flow” comes from the plumbing analogy – the data flows through the memory similarly as the water flows through a pipe. However, this is just an analogy; the Data Flow is even more efficient: instead of the data being passed from transformation to transformation, the transformations pass over the same buffers of data, so the data is even not copied from a buffer to a buffer. This architecture makes the ETL process very efficient.
SSIS 2005 has many additional new and / or useful features, including package configurations, logging and auditing, event and error handling, to name jus a few. Of course, Control Flow is still there, with many new tasks and better control of the flow. Nevertheless, I think that the Data Flow is the single most important differentiator from the DTS 2000. SSIS is a real ETL tool, while I guess many of us used DTS as an ELT tool.
At the end, I have to repeat a very well know sentence: there is no free lunch. Because SSIS is written from the scratch, upgrade of DTS packages is based on the “best effort” principle. This does not mean you have to rewrite all of the packages immediately; you can still run the old packages. If the upgrade is main your concern now and in the future, than doing an ELT process by using the Execute SQL task that calls T-SQL stored procedures is your best bet. With the T-SQL language, Microsoft really has to take care about backward compatibility.
- Python for SQL Server Specialists Part 4: Python and SQL Server - April 24, 2018
- Python for SQL Server Specialists Part 3: Graphs and Machine Learning - April 11, 2018
- Python for SQL Server Specialists Part 2: Working with Data - March 22, 2018