Azure Database integrated authentication with SSIS

In many scenarios, we need to use integrated authentication in order to gain access to the data sources that feed our analytical system. Given Azure's increasingly widespread use, as is the case with at least part of our infrastructure, some of these sources are hosted in Azure databases. In this post, we will discuss an actual error that we came across when configuring and using integrated authentication against Azure databases with SSIS.

Power BI Bookmarks! What are they for? How can I use them?

In this entry, we will show you how to create bookmarks and a few different scenarios where they might be useful. Power BI bookmarks store the state of a specific report page, including the filter selection and the visibility of the different objects, allowing the user to return to that same state by simply selecting the saved bookmark.


Converting Excel files into CSV and uploading files using SSIS

Although SQL Server Integration Services, hereinafter SSIS, is capable of uploading Excel files, in most cases this can be time consuming because any small modification to the Excel files can make the SSIS package crash. For that reason, the best option is usually to transform those Excel files into .csv format, since uploading text files will cause significantly fewer issues than uploading Excel files.

You can quickly save any Excel file as csv manually by using "Save As" from within Excel and choosing the .csv format. However, this becomes an issue when you have to do the same for a lot of Excel files, or in cases where the change needs to happen automatically.

In this post, we will explain how to perform this format change automatically using PowerShell, and how to loop through the files in a directory in order to upload several Excel files together, using SSIS as the main tool for the whole process.

1. Preparing the environment

In our example, we will be working with 2 Excel files with 3 tabs each, called Excel1.xlsx and Excel2.xlsx. These files contain stock information regarding products, divided by categories in each tab. Their structure looks like this:


File “Excel1.xlsx”, tab “cat1”

 


File “Excel1.xlsx”, tab “cat2”

 

This information will be saved to the Stock table with the following structure:

CREATE TABLE [dbo].[Stock](
  [Item] [varchar](20) NULL,
  [Categoria] [varchar](20) NULL,
  [Stock] [int] NULL,
  [FileName] [varchar](50) NULL,
  [InsertDate] [datetime] NULL
)

The FileName and InsertDate columns will help us find out which file each row belongs to and the date and time on which it was inserted into the table.

 

 

2. Converting Excel files to .csv

Firstly, we will need to create 4 parameters in our SSIS package:

  • ExcelExtension: to state the extension of our Excel file (i.e., xls or xlsx).
  • RutaCSV: path where the csv files will be left.
  • RutaExcel: path where the Excel files will be left.
  • RutaPowerShell: path where the PowerShell script will be left.

Screenshot: package parameters

We will also need 3 variables:

  • commandPowerShell: location where the command to run the PowerShell script will be saved to.
  • FileName: location where the file name will be saved to once the file has been converted into csv, so that the file name can be stored in our database table.
  • FullFilePath: location where the full csv file path will be saved to. This variable shall be used later when creating the loop to iterate through the different files in order to upload them.

Our next step will be to prepare the commandPowerShell variable once all the path and Excel file extension parameters have been set up. In order to do this, we will add the following “Expression” to the variable:

"-command \""+@[$Package::RutaPowerShell] + " -rutaExcel " + @[$Package::RutaExcel] + " -rutaCSV " + @[$Package::RutaCSV] + " -excelExt xlsx\""

With the above values in my own parameters, the resulting variable would be:

Screenshot: resulting PowerShell command
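To see what the expression evaluates to, here is a minimal Python sketch that builds the same command string; the paths below are hypothetical placeholders, not the actual values from the package:

```python
# Hypothetical parameter values (assumptions, not from the original package)
ruta_powershell = r"C:\Scripts\ExcelToCsv.ps1"
ruta_excel = "C:\\Data\\Excel\\"
ruta_csv = "C:\\Data\\CSV\\"

# Mirror of the SSIS expression: the whole script invocation is wrapped
# in quotes and passed to PowerShell via the -command switch
command = ('-command "' + ruta_powershell
           + " -rutaExcel " + ruta_excel
           + " -rutaCSV " + ruta_csv
           + ' -excelExt xlsx"')
print(command)
```

The escaped quotes (\") in the SSIS expression become the literal quotes that keep the script path and its parameters together as a single PowerShell command.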

This is the PowerShell file that will be run:

<#
#### Script to convert Excel files into CSV ####
Parameters:
    $rutaExcel: path where the Excel files are
    $rutaCSV: path where the csv files will be left
    $excelExt: Excel file extension (xls, xlsx)
#>
param ([string] $rutaExcel, [string] $rutaCSV, [string] $excelExt )

<# 
#### Function to convert Excel files into CSV ####
Parameters:
    $excelFileName: Excel file name
    $csvLoc: destination path for the csv files
    $excelLoc: source path for the Excel files
    $excelExtension: Excel file extension (xls, xlsx)
#> 
Function ExportExcelToCSV ($excelFileName, $csvLoc, $excelLoc ,$excelExtension)
{
    #we shall save the Excel file path
    $excelFile = $excelLoc + $excelFileName + $excelExtension
    $E = New-Object -ComObject Excel.Application
    $E.Visible = $false
    $E.DisplayAlerts = $false
    $wb = $E.Workbooks.Open($excelFile)

    #we will iterate every sheet in the file in order to convert it to csv
    foreach ($ws in $wb.Worksheets)
    {
        $n = $excelFileName + "_" + $ws.Name
        #6 is the xlCSV file format identifier
        $ws.SaveAs($csvLoc + $n + ".csv", 6)
    }
    #close the workbook without saving changes before quitting Excel
    $wb.Close($false)
    $E.Quit()
}

#mask to loop through all Excel files with the provided extension
$mask = "*."+$excelExt
#we will add the dot before the extension type
$ext = "."+$excelExt

#we shall obtain all Excel files that are compliant with the mask in the specified path
$ens = Get-ChildItem $rutaExcel -filter $mask

#we shall loop through all Excel files found and call the function to convert them to csv
foreach($e in $ens)
{
    ExportExcelToCSV -excelFileName $e.BaseName -csvLoc $rutaCSV -excelLoc $rutaExcel -excelExtension $ext
} 
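The mask-and-loop logic of the script above can be sketched in Python with the standard library; the file names here are hypothetical examples, not the actual contents of any real folder:

```python
import glob
import os
import tempfile

# Create a throwaway folder with two hypothetical Excel files and one
# unrelated file, to stand in for the source directory
ruta_excel = tempfile.mkdtemp()
for name in ("Excel1.xlsx", "Excel2.xlsx", "notes.txt"):
    open(os.path.join(ruta_excel, name), "w").close()

# Same idea as $mask = "*." + $excelExt and Get-ChildItem -filter $mask
excel_ext = "xlsx"
mask = "*." + excel_ext
files = sorted(glob.glob(os.path.join(ruta_excel, mask)))

# Only the two .xlsx files match the mask; notes.txt is skipped
base_names = [os.path.splitext(os.path.basename(f))[0] for f in files]
print(base_names)  # ['Excel1', 'Excel2']
```

Each matched base name is what the PowerShell script passes to ExportExcelToCSV as $excelFileName.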

 

 

Our next step is to call our PowerShell script. For that purpose, we will use the “Execute Process Task” component.

Screenshot: Execute Process Task in the SSIS toolbox

We will open the component, go to “Process”, and under Executable select the path to the PowerShell executable (powershell.exe) on our computer:

Screenshot: Execute Process Task configuration

Next, we will go to “Expressions” and, under “Arguments”, enter an expression that passes in the previously prepared commandPowerShell variable:

Screenshot: Execute Process Task “Arguments” expression

Once the above has been completed, we could do a small test to check whether the transformation of Excel files to csv is actually working.

We will place the Excel files in the path:

Screenshot: Excel files in the source path

Run the component:

Screenshot: PowerShell task execution

And once it has finished running, we will check the path where the csv files should have been created:

Screenshot: csv files in the destination path

As you can see, 6 csv files were created following the naming convention [ExcelFileName]_[ExcelTabName].csv.
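That naming convention can be sketched in Python; the file names match the example, while the third tab name (cat3) is an assumption, since the post only shows tabs cat1 and cat2:

```python
# Hypothetical file and tab names following the example (cat3 is assumed)
excel_files = ["Excel1", "Excel2"]
tabs = ["cat1", "cat2", "cat3"]

# One csv per (file, tab) pair: [ExcelFileName]_[ExcelTabName].csv
csv_names = [f + "_" + t + ".csv" for f in excel_files for t in tabs]
print(len(csv_names))  # 6
```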

 

 

3. Uploading csv files to the database

Now it’s time to upload the above csv files to our database. In order to do this, we will use the “Foreach Loop Container” to loop through the directories and upload the csv files.

We will set this up in the “Collection” section by selecting “Foreach File Enumerator” as the Enumerator. Then, under “Expressions”, we will parameterize the source path where our csv files are located using our RutaCSV package parameter. Under “Files:”, we will enter the “*.csv” mask in order to pick up all csv files, and under “Retrieve file name” we will select “Fully qualified” in order to obtain the full file path, which will be needed later on to tell our reading source where the csv files are located:

Screenshot: Foreach Loop Editor configuration

Next, in the “Variable Mappings” section, we will map our FullFilePath variable, which is where the full csv file path will be saved.

Once this has been completed, our loop should be fully set-up.

So now it’s time to set-up the tasks within the above loop. One of these will be a “Script Task” in order to obtain the csv file name from the full file path, and there will also be a “Data Flow” task to upload the csv files to our database.

Starting with the “Script Task”, we will drag the component inside the “Foreach Loop Container”. Then, double click on the “Script” section and enter the FullFilePath variable under ReadOnlyVariables and the FileName variable under ReadWriteVariables.

Click on the “Edit Script” button and enter the following code into the script in the Main function:

public void Main()
{
    //save full path into an auxiliary variable
    string aux = Dts.Variables["User::FullFilePath"].Value.ToString();
    //divide path using \
    string[] split = aux.Split('\\');
    //keep last piece of the path that contains file name
    string fileName = split[split.Length - 1];
    //save the name of the file in our variable
    Dts.Variables["User::FileName"].Value = fileName;

    Dts.TaskResult = (int)ScriptResults.Success;
}
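The same extraction can be sketched in Python to see what the Script Task computes; the path is a hypothetical example:

```python
# Hypothetical fully qualified path produced by the Foreach Loop
full_file_path = r"C:\Data\CSV\Excel1_cat1.csv"

# Split on the Windows path separator and keep the last piece,
# exactly like the C# code above
file_name = full_file_path.split("\\")[-1]
print(file_name)  # Excel1_cat1.csv
```

In the C# Script Task itself, System.IO.Path.GetFileName(aux) would achieve the same result in a single call.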

At this point, we move on to the “Data Flow”, also dragging it into the loop container after the Script Task. Once inside, we will begin with the source, which will be a “Flat File Source”.

We will open it and click on “New” in order to create a new Flat File Connection Manager.

In this manner, the first Flat File Connection Manager window would be set-up as follows:

Screenshot: Flat File Connection Manager set-up

Under “File name”, we will enter the path to one of the files for the initial set-up, although this path will later be parameterized so that it is picked up dynamically from the loop.

We will then move on to the “Advanced” section in order to set up the file columns. In this case, we will have 3 columns: Item, Categoria and Stock.

Once this is completed, our set-up will be ready and the next step will be to parameterize the file path. In order to do this, go to “Connection Managers” in the bottom area, select the newly created InventarioCSV connection and right click to open “Properties”. All properties will then appear listed on the right hand side of the screen, including the “Expressions” section, where we will map the ConnectionString property to the “FullFilePath” variable:

Our next step will be to add a “Conditional Split” in order to discard the rows corresponding to the file headers.

As shown in the following image, each Excel file tab had a top header row that we do not want to save to our database:

Screenshot: Excel file header row

So, in order to discard these rows, we will add the following condition to our Conditional Split:

Item == "Item" || Item == ""

This will divert any row whose Item column is empty or contains the literal header text “Item”, so that those rows are not uploaded.

Screenshot: Conditional Split configuration
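The effect of the split can be sketched in Python over a few hypothetical rows; the values are illustrative, not taken from the example files:

```python
# Hypothetical rows as read from the csv file (the first one is the header)
rows = [
    {"Item": "Item", "Categoria": "Categoria", "Stock": "Stock"},
    {"Item": "A1", "Categoria": "cat1", "Stock": "10"},
    {"Item": "", "Categoria": "", "Stock": ""},
]

# Mirror of the SSIS condition: Item == "Item" || Item == ""
# Rows matching the condition are diverted; the rest go to the default output
detalle = [r for r in rows if not (r["Item"] == "Item" or r["Item"] == "")]
print(len(detalle))  # 1 data row survives
```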

Our default output will be called “Detalle”, and this is what we will connect to our next component, a “Derived Column”:

Screenshot: Data Flow with Conditional Split and Derived Column

This Derived Column will be used to add the columns that will contain the file name and insert date using the following set-up:

Screenshot: Derived Column configuration

And finally, all that is left to do is to add our destination, which will be an “OLE DB Destination” component.

We will open it and click on “New” in order to create the connection manager with the connection to our database.

Once this connection has been set up, we will select the table where we want to save the data, which in our example is the Stock table we created previously:

Screenshot: OLE DB connection manager

And as the last step, in the “Mappings” section, we will map the columns imported with our flow with the table columns:

Screenshot: column mappings

This is what our final Data Flow should look like:

Screenshot: final Data Flow

And this is what our complete Control Flow should look like:

Screenshot: final Control Flow

At the very beginning, we added an “Execute SQL Task” that truncates the Stock table, so that each run starts from an empty table.

And now everything should be ready to run our package and check the results:

Screenshot: Control Flow after execution

As you can see, the csv files have been created in our directory, and these files were then uploaded to our table.

Now you know how important it is to convert Excel files to CSV format in order to avoid issues with SSIS.

If you want more information about SSIS, you can find further material where one of our experts addresses the issue in greater depth.

We hope it helps you, and if you want to keep getting advice and tricks about this subject, SolidQ invites you to subscribe to our newsletter.

Measure the Strength of Association Between Two Categorical Variables: Mosaic Plot and Chi-Square Test

In a Data Science project, it is really important to get the most insight out of your data. There is a specific phase, the first one in the project, whose goal is data analysis: the Data Exploration phase.

Among other kinds of analysis, one of the most interesting is the bivariate one, which uncovers the relationship between two variables. If the two variables are categorical, the most common plot used to analyze their relationship is the mosaic plot. At first sight it may appear a little confusing, and people unaware of some statistical concepts can miss important information that this plot can give us. So, we'll go a little deeper into these concepts.

Read the rest of the article here.

Power BI (Audit) Usage Analytics; Get the most out of your Dashboards!

Am I getting the most out of my Dashboards?

We already know that when a company has to decide whether to invest in a Business Intelligence project, it has to find the answers to all the questions that arise about its effectiveness: Are we really going to get anything out of it? Will it give us the information we need? Will it be beneficial for us? In many cases, it is difficult for companies to have the answer to all these questions, especially when we are in the early stages of the project.

Get data in Power BI from Oracle Database

Microsoft's Business Analytics service Power BI enables us to connect to hundreds of data sources and produce beautiful reports that can be consumed on the web and across mobile devices, in order to deliver insights throughout our entire organization. So, in this post, I will walk you through the process to get data in Power BI from an Oracle Database.

When you open Power BI Desktop, you will see the following window:

How to extract Twitter data with a Windows service created with Python in Visual Studio 2017

Hi everyone,

In this post, we will code a script in Python (with Visual Studio 2017) to create a program that we can run as a Windows service in order to extract, in almost real time, the tweets related to certain words or hashtags, store them in a SQL Server database, and then consume them with Power BI.

How to refresh a Power BI dataset from an on-premises PowerShell script

Hi friends,

Today we will show you how to refresh a dataset published in Power BI from a PowerShell script that we would invoke at the end of our ETL process.

We will use the Power BI libraries for PowerShell to connect to our Power BI portal and send an instruction to refresh a dataset. This can be useful to improve our ETL processes, refreshing the online datasets used in the Power BI portal after loading data into our data warehouse and/or our OLAP/Tabular database.