ETL – SSIS Step 2

The last interesting step here involves a number of calls to stored procedures that were not rewritten yet.

The original code was using a Execute SQL Task in the Control Flow to execute all of the stored procedures.

(more…)

ETL – SSIS Step 1

Lately i was working on project where the current ETL process was approaching 8 hours run time. There were a few critical tables to had over 100,000,000 rows to load that were bogging down the system. The entire process was done by a ton of SQL Stored Procedures that were simply executed by an SSIS package. (more…)
SSIS – Data Profiling

SSIS – Data Profiling

SSIS (SQL Server Integration Services) is a powerful technology for most things data related. This includes moving data in and out of SQL Server and many other things like copying files or in my case today profiling data.The following table shows a problem we run into all the time with many clients. In this case we have a lookup table that lists product categories. As you can see there is a lot of bad data there where the category names are not consistent. For instance Belt is also shown as belt, blt, and bt. And it’s not usually just the database issue necessarily its either the database is not designed right and/or the applications are not handling data correctly.

(more…)

Diving into the SSIS execution engine

Diving into the SSIS execution engine

The purpose of this article is to expand on the already familiar White Paper, “Integration Services: Performance Tuning Techniques”, which was written by Elizabeth Vitt and Hitachi Consulting, and published at TechNet in 2006. Some of the concepts in that White Paper will be reproduced here, both in the interests of clarity and to link the two documents – thus enriching the reader’s knowledge of SQL Server Integration Services’ inner mechanisms.

(more…)

Integration Services Key Feature in BI Projects (Part 1 of 2)

Integration Services Key Feature in BI Projects (Part 1 of 2)

Data Warehouse is a key feature in a BI project, which requires the existence of an ETL process to feed it from different sources. SQL Server contains a tool called Integration Services (SSIS) that greatly helps to carry out the ETL processes. In this article, we will tour an SSIS project, talking about the key features, reviewing the design and output, and showing the most used components. (more…)

Fuzzy Lookup in Excel

My good friend Gianluca informed me that Microsoft created a new add-in for Excel 2010: Fuzzy Lookup. Fuzzy Lookup was developed by Microsoft Research and was first introduced in SSIS 2005. Now it is available in Excel 2010 as well. It is used for identity mapping and de-duplicating. For approximate string searches, it uses one of the most advanced algorithms currently available. I am already testing it, and works like expected, like SSIS variant: fantastic!