Using Excel in ETL Processes

Using Excel in ETL Processes

Microsoft Excel is a very popular tool among users, so it’s common to find Excel files when implementing extraction, transformation, and loading (ETL) processes. After I explain the reasons why users create Excel files and where you might find them, I’ll talk about some common problems that you might encounter when dealing with them in ETL processes, such as problems with drivers, hidden data, data types, certain transformations, separators, and worksheet variants. I’ll also present the recommended solutions for these problems. (more…)

Fuzzy Lookup in Excel

My good friend Gianluca informed me that Microsoft created a new add-in for Excel 2010: Fuzzy Lookup. Fuzzy Lookup was developed by Microsoft Research and was first introduced in SSIS 2005. Now it is available in Excel 2010 as well. It is used for identity mapping and de-duplicating. For approximate string searches, it uses one of the most advanced algorithms currently available. I am already testing it, and works like expected, like SSIS variant: fantastic!

Excel Slow as OLAP Client

Sometimes Excel 2000/XP/2003 is hundred times or slower than other popular OLAP clients (ProClarity, NovaView…). Some queries can take 15 minutes in Excel, while they execute in couple of seconds in other clients. The problem is in fact that Excel does not use NON EMPTY keywords or NONEMPTYCROSSJOIN() function in the MDX query; so it filters the data on client side. As Excel does not show or allow us to modify the MDX query, we can’t do much about it. We can use other client tools, or maybe create our own report using Reporting Services.


SQL Server & Excel

Using Excel file as a linked server usually causes some headache. Main reasons for the headache are:

– Excel does not use the concepts “Catalog” and “Schema”, so we must refer to a table with server…table

– If not defined differently, one sheet represents one table with the “$” sign appended, i.e. Sheet1$

– Excel does not have login security, so we must change the default security:

   EXEC sp_addlinkedsrvlogin ‘ExcelSource’, ‘false’

In Excel we can also define named ranges of cells. We refer to a named range without the dollar sign appended. Here are some examples how to use Excel as a linked server.