Although SQL Server Integration Services (hereinafter SSIS) is capable of loading Excel files, in most cases it can be time consuming because even small modifications to the Excel files can make the SSIS package fail. For that reason, the best option is usually to convert those Excel files to .csv format, since loading text files will cause you significantly fewer issues than loading Excel files.

You can quickly save any Excel file as csv manually by using "Save As" from within Excel and choosing the .csv format. However, this becomes an issue when you have to do the same for a lot of Excel files, or in cases where the conversion needs to happen automatically.

In this post, we will explain how to perform this format change automatically using PowerShell, and how to loop through the files in a directory in order to load several Excel files together, using SSIS as the main tool for the whole process.

1. Preparing the environment

In our example, we will be working with 2 Excel files, Excel1.xlsx and Excel2.xlsx, with 3 tabs each. These files contain stock information about products, divided by category into tabs. In the following images, you can see the Excel files’ structure:

Excel file contents

File “Excel1.xlsx”, tab “cat1”

Contenido fichero Excel

File “Excel1.xlsx”, tab “cat2”

This information will be saved to the Stock table with the following structure:

CREATE TABLE [dbo].[Stock](
  [Item] [varchar](20) NULL,
  [Categoria] [varchar](20) NULL,
  [Stock] [int] NULL,
  [FileName] [varchar](50) NULL,
  [InsertDate] [datetime] NULL
)

The FileName and InsertDate columns will help us to find out which file each row belongs to, and the date and time at which it was inserted into the table.

2. Converting Excel files to .csv

Firstly, we will need to create 4 parameters in our SSIS package:

  • ExcelExtension: the extension of our Excel files (e.g., xls or xlsx)
  • RutaCSV: path where the csv files will be written
  • RutaExcel: path where the Excel files will be placed
  • RutaPowerShell: path where the PowerShell script is saved

Parameters

We will also need 3 variables:

  • commandPowerShell: holds the command that will be used to run the PowerShell script
  • FileName: holds the name of the file currently being processed once it has been converted to csv, so that it can be stored in our database table
  • FullFilePath: holds the full path of the csv file. This variable will be used later in the loop that iterates through the different files in order to load them.

Our next step will be to build the commandPowerShell variable once all the path and Excel file extension parameters have been set up. In order to do this, we will add the following “Expression” to the variable:

"-command \""+@[$Package::RutaPowerShell] + " -rutaExcel " + @[$Package::RutaExcel] + " -rutaCSV " + @[$Package::RutaCSV] + " -excelExt xlsx\""

With the above values in my own parameters, the resulting variable would be:

PowerShell command
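For illustration, assuming hypothetical parameter values RutaPowerShell = C:\scripts\ExcelToCsv.ps1, RutaExcel = C:\excel\, RutaCSV = C:\csv\ and ExcelExtension = xlsx (your paths will differ), the expression would evaluate to:

-command "C:\scripts\ExcelToCsv.ps1 -rutaExcel C:\excel\ -rutaCSV C:\csv\ -excelExt xlsx"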

This is the PowerShell file that will be run:

<#
#### Script to convert Excel files into CSV ####
Parameters:
    $rutaExcel: path where the Excel files are
    $rutaCSV: path where the csv files will be left
    $excelExt: Excel file extension (xls, xlsx)
#>
param ([string] $rutaExcel, [string] $rutaCSV, [string] $excelExt )

<# 
#### Function to convert Excel files into CSV ####
Parameters:
    $excelFileName: Excel file name
    $csvLoc: destination path for the csv files
    $excelLoc: source path for the Excel files
    $excelExtension: Excel file extension (xls, xlsx)
#> 
Function ExportExcelToCSV ($excelFileName, $csvLoc, $excelLoc, $excelExtension)
{
    #build the full Excel file path
    $excelFile = $excelLoc + $excelFileName + $excelExtension
    $E = New-Object -ComObject Excel.Application
    $E.Visible = $false
    $E.DisplayAlerts = $false
    $wb = $E.Workbooks.Open($excelFile)

    #we will iterate every sheet in the file in order to convert it to csv
    foreach ($ws in $wb.Worksheets)
    {
        $n = $excelFileName + "_" + $ws.Name
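        #file format 6 corresponds to CSV (xlCSV)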
        $ws.SaveAs($csvLoc + $n + ".csv", 6)
    }
    #close the workbook without saving changes and quit Excel
    $wb.Close($false)
    $E.Quit()
}

#mask to loop through all Excel files with the provided extension
$mask = "*."+$excelExt
#we will add the dot before the extension type
$ext = "."+$excelExt

#we shall obtain all Excel files that match the mask in the specified path
$ens = Get-ChildItem $rutaExcel -filter $mask

#we shall loop through all Excel files found and call the function to convert them to csv
foreach($e in $ens)
{
    ExportExcelToCSV -excelFileName $e.BaseName -csvLoc $rutaCSV -excelLoc $rutaExcel -excelExtension $ext
} 
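If you want to test the script on its own before wiring it into SSIS, it can also be run directly from a command prompt; the script name and paths below are placeholders for illustration:

powershell.exe -ExecutionPolicy Bypass -File C:\scripts\ExcelToCsv.ps1 -rutaExcel C:\excel\ -rutaCSV C:\csv\ -excelExt xlsx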

Our next step is to call our PowerShell script. For that purpose, we will use the “Execute Process Task” component:

Execute Process Task in the SSIS toolbox

We will open the component, go to “Process”, and under Executable select the path where the PowerShell executable (powershell.exe) is located on our computer:

Execute Process Task configuration

Next, we will go to “Expressions” and, under “Arguments”, enter an expression that sets the parameters using the previously built commandPowerShell variable.

Execute Process Task arguments configuration

Once the above has been completed, we can run a small test to check whether the conversion of Excel files to csv actually works.

We will place the Excel files in the path:

Excel files path

Run the component:

PowerShell task execution

And once it has finished running, we will check the path where the csv files should have been created:

csv files path

As you can see in the image above, 6 csv files were created following the naming convention [ExcelFileName]_[ExcelTabName].csv (for instance, Excel1_cat1.csv).

3. Uploading csv files to the database

Now it’s time to load these csv files into our database. In order to do this, we will use a “Foreach Loop Container” to loop through the directory and load the csv files.

We will set this up in the “Collection” section by selecting “Foreach File Enumerator” as the Enumerator. Then, under “Expressions”, we will bind the Directory property to our RutaCSV package parameter, which points to the source path where our csv files are located. In the “Files:” box below, we will enter the “*.csv” mask in order to pick up all csv files, and under “Retrieve file name” we will select “Fully qualified” in order to save the full file path, which will be needed later to tell our reading source where each csv file is located (see the expression sketch below):
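Assuming the Directory property is the one bound in “Expressions”, the expression is simply the package parameter:

@[$Package::RutaCSV]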

Next, in the “Variable Mappings” section, we will map our FullFilePath variable, which is where the full csv file path will be saved:

Once this has been completed, our loop should be fully set-up.

So now it’s time to set up the tasks within the above loop: a “Script Task” to obtain the csv file name from the full file path, and a “Data Flow” task to load the csv files into our database.

Starting with the “Script Task”, we will drag the component inside the “Foreach Loop Container”. Then, double click on it and, in the “Script” section, enter the FullFilePath variable under ReadOnlyVariables and the FileName variable under ReadWriteVariables:

Click on the “Edit Script” button and enter the following code into the script in the Main function:

public void Main()
{
    //save full path into an auxiliary variable
    string aux = Dts.Variables["User::FullFilePath"].Value.ToString();
    //divide path using \
    string[] split = aux.Split('\\');
    //keep last piece of the path that contains file name
    string fileName = split[split.Length - 1];
    //save the name of the file in our variable
    Dts.Variables["User::FileName"].Value = fileName;

    Dts.TaskResult = (int)ScriptResults.Success;
}
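As a side note, the same result can be obtained with the .NET helper System.IO.Path.GetFileName, which already handles the path splitting; a minimal sketch of the same Main function:

public void Main()
{
    // GetFileName returns the last segment of the path (the file name)
    string fullPath = Dts.Variables["User::FullFilePath"].Value.ToString();
    Dts.Variables["User::FileName"].Value = System.IO.Path.GetFileName(fullPath);

    Dts.TaskResult = (int)ScriptResults.Success;
}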

At this point, we move on to the “Data Flow” task, also dragging it into the loop container after the Script Task. Once inside, we will begin with the source, which will be a “Flat File Source”:

We will open it and click on “New” in order to create a new Flat File Connection Manager.

In this manner, the first Flat File Connection Manager window would be set up as follows:

Under “File name”, we will enter the path to one of the files for the initial set-up, although this path will later be parameterized so that it is picked up automatically from the value obtained dynamically by the loop.

We will then move on to the “Advanced” section in order to set up the file columns. In this case, we will have 3 columns: Item, Categoria and Stock.

Once this is completed, our set-up will be ready and the next step will be to parameterize the file path. In order to do this, go to “Connection Managers” in the bottom area, select the newly created InventarioCSV connection, right click and choose “Properties”. All properties will then appear listed on the right hand side of the screen, including the “Expressions” section, where we will bind the FullFilePath variable to the ConnectionString property:
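The expression bound to the ConnectionString property is simply the loop variable:

@[User::FullFilePath]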

Our next step will be to add a “Conditional Split” in order to discard the rows corresponding to the file headers.

As shown in the following image, each Excel file tab has a top header row that we do not want to save to our database:

Excel file header

So, in order to discard these rows, we will add the following condition to our Conditional Split:

Item == "Item" || Item == ""

This will ensure that no rows with an empty Item column, or containing the literal word “Item” (the header rows), will be loaded.

Conditional Split configuration

Our default output will be called “Detalle”, and this is what we will connect to our next component, a “Derived Column”:

Data flow, final state

This Derived Column will be used to add the columns that will contain the file name and insert date using the following set-up:

Derived Column configuration
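In essence, assuming the column names from our table, the two derived columns would use the following SSIS expressions:

FileName: @[User::FileName]
InsertDate: GETDATE()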

And finally, all that is left to do is to add our destination, which will be an “OLE DB Destination” component.

We will open it and click on “New” in order to create the connection manager with the connection to our database.

Once this connection has been set up, we will select the table where we want to save the data, which in our example is the Stock table we created previously:

And as the last step, in the “Mappings” section, we will map the columns imported by our flow to the table columns:

So this is what our final Data Flow should look like:

Data flow

And this is what our Control Flow should look like:

Control Flow

At the very beginning, we added an “Execute SQL Task” that truncates the Stock table, so that each run starts from an empty table.
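Assuming the Stock table from section 1, the statement inside that task would simply be:

TRUNCATE TABLE [dbo].[Stock];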

And now everything should be ready to run our package and check the results:

Control Flow after execution

As you can see, the csv files have been created in our directory:

And subsequently, these files were loaded into our table:

I hope you find this article helpful.

Carmina Bernabeu

Carmina Bernabeu is a Data Platform Architect at SolidQ. She holds a degree in Computer Engineering, a Master's degree in Development of Applications and Web Services, and a Master's degree in BI from SolidQ. With more than 7 years of experience, she has worked on BI projects for top companies in the clothing and textile manufacturing, insurance, vehicle, fruit and vegetable distribution, and social media sectors.