Creating an installer for your SSIS-assemblies

When you develop SSIS packages with custom functionality, either self-signed assemblies or script components/script transformations that reference assemblies you got from CodePlex, it sometimes becomes a nuisance to get them into the GAC on production machines. What often happens is that when you try to run the package in earnest via SQL Agent, it fails to find the assembly.

In my case I’ll be adding Ionic.Zip to my SSIS package data flow in order to stream data into encrypted zip files.
If you would like to try out a cool way to deal with zip files in .NET, I recommend it: DotNetZip – Zip and Unzip in C#, VB, any .NET language

So back to trying to run custom components and assemblies in SSIS packages. I bet you have seen something like this at some point in your career:

Error: System.IO.FileNotFoundException: Could not load file or assembly 'Ionic.Zip, Version=1.9.1.8, Culture=neutral, PublicKeyToken=edbe51ad942a3f5c' or one of its dependencies. The system cannot find the file specified.
File name: 'Ionic.Zip, Version=1.9.1.8, Culture=neutral, PublicKeyToken=edbe51ad942a3f5c'

This tells us that our DLL can’t be found when the SSIS execution reaches the data flow that uses it.

I wanted to find a solution that helps me deal with this in a good manner and is reasonably future-proof.
The first thing that came to mind is doing what I always recommend: use PowerShell. The problem is that sometimes you don’t have PowerShell, or only an older version, or you aren’t allowed to run whatever scripts you want in production.

The next thing that was suggested to me was to copy gacutil.exe over to the production environment and do my deployment that way. That is not recommended, though, as it violates the license of the Visual Studio SDK.

I tried out a couple of different plug-ins for Visual Studio to create an installer, and the one that worked with the least hassle for me was WiX.

These are the steps I took to get my first working installer that adds DLL files to the GAC:

  • Make sure that your assembly is signed. If you are unsure about whether your own assembly is signed, it probably isn’t. Since this example uses someone else’s assembly, it probably is signed.
  • Make sure that your installation of Visual Studio is compatible with WiX (I use Visual Studio 2015 Developer)
  • Download the latest stable release of the WiX toolset that works with your version of VS.
  • Put the assembly (dll file) in any folder on your computer and have the corresponding path at hand. It doesn’t matter where the file is as long as you can access it when building the project.
  • Create a new Visual Studio project for WiX (screenshot: create WiX project in VS).
  • Put an XML fragment into Project.wxs that points to your file (screenshot: the important stuff is now in the XML file).
  • Set the project to Release mode, build it, and go to the bin folder to take a look.
  • The three files there are all needed for your installer, so zip them together and you are basically done.
  • Move your zip file with the three files to the machine you need the DLL on and run the installer. If you want a prettier installer with more options and so on, put more effort into the WiX project!

Here is the excerpt of the XML that worked for me; mileage may vary, so look up the documentation and the comment sections of other articles you find.

<Fragment>
  <ComponentGroup Id="ProductComponents" Directory="INSTALLFOLDER">
    <Component Id="Ionic.Zip">
      <File Name="Ionic.Zip.dll" KeyPath="yes" Source="C:\Import\Ionic.Zip.dll" Assembly=".net" AssemblyManifest="Ionic.Zip.dll"/>
    </Component>
  </ComponentGroup>
</Fragment>

 

That time I used ON UPDATE CASCADE in production

Backstory


While working with an invoicing system back in 2013 I was asked by my customer if it would be possible to give their biggest client some more special handling. The way their system was set up, one client accounted for over half of the accounts receivable. The client and my customer were also very close in their operations. What they needed from the system was the ability to split the client into four legal entities while retaining all the custom logic and prices. They asked me if a quick-fix was possible.

“A quick-fix solution is as fast, as easy and as well-documented as its explanation from the customer” – me just now.

The system had a procedure that would work out all the numbers for each client, and for this particular client it handled all subsidiaries together on the same invoices. With the reorganization coming, every unit would instead be connected to one of four new client subsidiaries, and each subsidiary would receive its own invoices for its units.

 

What the options for foreign key constraints do


You can add two clauses (ON UPDATE and ON DELETE) when creating a foreign key constraint. Each clause can take one of four options; a small T-SQL example follows the list.

  • NO ACTION – this is the default: an error is raised and the execution halts if the new value would violate referential integrity.
  • CASCADE – data integrity is preserved by updating or deleting the rows in the referencing table within the same transaction. In other words, completely hazardous when dealing with deletes. Use with caution.
  • SET NULL – if the relationship between rows is broken, the integrity of the referencing row is upheld by setting the foreign key column to NULL.
  • SET DEFAULT – same as above, except the value becomes the column’s default instead of NULL; of course, the column has to have a default value for this to work.
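
To make the clauses concrete, here is a minimal T-SQL sketch with made-up table and constraint names (Client, Invoice, FK_Invoice_Client); the comments describe what each chosen option does to the referencing Invoice rows:

CREATE TABLE dbo.Client
(
    ClientId int NOT NULL PRIMARY KEY
);

CREATE TABLE dbo.Invoice
(
    InvoiceId int NOT NULL PRIMARY KEY,
    ClientId  int NULL,
    CONSTRAINT FK_Invoice_Client
        FOREIGN KEY (ClientId) REFERENCES dbo.Client (ClientId)
        ON UPDATE CASCADE   -- a changed ClientId propagates to the referencing invoices
        ON DELETE SET NULL  -- deleting a client keeps the invoices but clears their ClientId
);

Swap in NO ACTION or SET DEFAULT on either clause to get the other behaviours described above.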

From rough draft to the final solution


On the back of an envelope I wrote down these nine steps that would re-route the client logic and make sure everything would still sum up nicely (a rough T-SQL sketch of the key steps follows the list):

  1. Begin a transaction and run the stored procedures to create the invoices as normal.
  2. For each unit belonging to one of the four new subsidiaries: create a new invoice header with a temporary, made-up sequence starting at -1 and counting backwards to avoid collisions.
  3. For each invoice line: map its invoiceId to the matching subsidiary header (-1 through -4).
  4. Delete the now empty and unused invoice headers created in step 1.
  5. Drop the foreign key constraint between headers and lines.
  6. Re-create the foreign key, but this time with ON UPDATE CASCADE.
  7. Re-run the stored procedure that fixes invoiceIds for headers (a sub-procedure of the normal invoice creation). When the headers are updated, the lines inherit the new ids through our foreign key constraint.
  8. Drop and re-create the foreign key once again as it existed before the operation.
  9. Commit the transaction. Done! No lines removed, all accounted for and all connected nicely.
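
Here is a rough T-SQL sketch of steps 5 through 8. The table and constraint names (InvoiceHeader, InvoiceLine, FK_InvoiceLine_InvoiceHeader) and the starting invoice number are made up for the example, and in the real system step 7 was the existing numbering sub-procedure rather than the simple UPDATE shown here; everything ran inside the same transaction as the other steps.

-- Step 5: drop the existing foreign key between lines and headers.
ALTER TABLE dbo.InvoiceLine
    DROP CONSTRAINT FK_InvoiceLine_InvoiceHeader;

-- Step 6: re-create it with ON UPDATE CASCADE, so that changing a header's
-- InvoiceId automatically updates the InvoiceId on all of its lines.
ALTER TABLE dbo.InvoiceLine
    ADD CONSTRAINT FK_InvoiceLine_InvoiceHeader
    FOREIGN KEY (InvoiceId) REFERENCES dbo.InvoiceHeader (InvoiceId)
    ON UPDATE CASCADE;

-- Step 7: swap the temporary negative ids (-1 through -4) for real ones.
-- Assumes InvoiceId is not an IDENTITY column; @FirstNewInvoiceId stands in
-- for whatever the invoice number sequence would hand out next.
DECLARE @FirstNewInvoiceId int = 10001;

UPDATE dbo.InvoiceHeader
SET    InvoiceId = @FirstNewInvoiceId + ABS(InvoiceId) - 1
WHERE  InvoiceId < 0;

-- Step 8: put the constraint back the way it was before the operation.
ALTER TABLE dbo.InvoiceLine
    DROP CONSTRAINT FK_InvoiceLine_InvoiceHeader;

ALTER TABLE dbo.InvoiceLine
    ADD CONSTRAINT FK_InvoiceLine_InvoiceHeader
    FOREIGN KEY (InvoiceId) REFERENCES dbo.InvoiceHeader (InvoiceId);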

 

Azure ML through the eyes of an SSIS developer

This post is the quick and shorter version of a presentation I gave on the topic of batch execution for Azure ML through SSIS. The material is available here: http://www.sqlsaturday.com/433/Sessions/Details.aspx?sid=38704

Predictive experiments in Azure ML act as a wrapper around your trained model, providing the web service inputs and outputs. As a developer consuming these services, that is about as much as you need to know: we are going to sample some data, send it to the model, get some scores back, and then load the new information back into a DW.

This is what the design of a typical predictive experiment looks like. On the left is the finished and polished machine learning model. From the top comes information about what the metadata of the web service input should look like, and the scoring activity sits in the middle.

 

If you want to follow along at home, you will need an Azure subscription with an Azure ML workspace and a storage account, the Azure Feature Pack for SSIS, and Visual Studio with the SSIS project tooling.

The finished product will be a package as simple as in the introduction. Load some data to Azure blob storage, create and execute a batch on that data, retrieve the output URI and load that information back into a database table.

This package can act as a template for future use. The components can be parameterized and with some effort you can get it to work with BIML.

The control flow of the SSIS package.

Azure Machine Learning has two modes of running predictive experiments through the API.

  • Request-Response Service (RRS) which takes the inputs for a single case and returns the predictions, useful for streaming data.
  • Batch Execution Service (BES), which we will explore here since SQL Server data and SSIS are inherently batch oriented.

When running a batch execution for an Azure ML prediction experiment, it is important to remember that the only source you can use is a file in Azure Blob Storage, typically a CSV file. Therefore we must first make sure our data is in the correct format before running the batch, and we need to know where to look for the results, which, unsurprisingly, is also a CSV file in Azure Blob Storage.

First things first.

Our control flow kicks off with a data flow that takes our data, preferably from a view with the correct formatting and filtering applied, and puts it into a CSV file in Azure Blob Storage.
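
What “correct formatting” means depends entirely on the web service input of your experiment. As a purely hypothetical example (the table, view and column names are made up), a view like this casts everything to the types and column names the experiment expects and filters down to the rows you want scored:

CREATE VIEW dbo.ScoringInput
AS
SELECT
    c.CustomerId,
    CAST(c.TenureMonths AS int)             AS TenureMonths,
    CAST(c.MonthlySpend AS decimal(10, 2))  AS MonthlySpend,
    CONVERT(varchar(10), c.SignupDate, 120) AS SignupDate  -- dates as ISO text for the csv
FROM dbo.Customer AS c
WHERE c.IsActive = 1;  -- only score the rows this batch cares about

The source in the first data flow then simply reads from this view.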

The connection manager for our DW is pretty straightforward, but for the blob storage we are going to need that Azure Feature Pack I was telling you about. Make sure that you have properly installed the pack by looking for the Azure Blob Destination among the destinations in the SSIS Toolbox in Visual Studio.

Make sure you have your connection information available. You can use the same blob storage account that Azure ML uses, just make sure you have the storage name and access key which you can get by drilling down on the Azure Portal. You probably want to create a container for the csv files you are going to upload as input for the batch executions.

With the connection managers in place, we can add the source and destination to our data flow, hook each up to its respective connection, and fill in the table/view for the source and the container/filename for the destination. These things are great to parameterize!

If you are unsure of where to find your blob storage information head over to manage.windowsazure.com and make sure you have a storage account set up. Then find the link to the service endpoint and one of your access keys.

The second part is to run the batch job. To get it up and running, all we need to do is create an execution, point it to where the source input is located, and then start the job.
Well, it turns out that is easier said than done. The inspiration for this blog post came from another post by Chris Price over here: http://blogs.msdn.com/b/bluewatersql/archive/2015/04/04/building-an-azure-ml-ssis-task.aspx

What you need to do is clone his project, build it, and distribute the assembly both for the IDE and for execution (including any other servers where you want the packages to run). The component will show up in your SSIS Toolbox in the control flow when it is properly installed.

I augmented some of his code to add another output method: just the URI of the experiment output.
I also changed the component so that it uses a storage manager from the Azure Feature Pack instead of the connection manager from the other project in Chris’s solution.
You want to keep track of at least the URL and API key for the predictive experiment. You might also want to parameterize the source blob location.

With this in place we can tell the component where to get the web service input for our experiment, and which variable to update with the location of the web service output from the experiment.

So we add a variable for the location and split this up into two variables that the Azure Blob Source in our last data flow needs. This is because that component needs container and file name, not one long path.

This leaves us with the task of getting the data back.

Our second and last data flow task will read data from the Azure storage location where the experiment output landed and make sure the output data format matches the table in our DW. I also put in a derived column that turns the output into a boolean, based on the values of the experiment outputs, which were a string for the label and a numeric for the score.
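
To give an idea of the landing side (again with made-up names), the destination table just needs to line up with what the derived column produces: a key, the boolean label and the numeric score.

CREATE TABLE dbo.ScoringResult
(
    CustomerId     int           NOT NULL,
    PredictedLabel bit           NOT NULL,  -- derived from the string label in the experiment output
    Score          decimal(9, 6) NOT NULL,  -- the numeric score from the experiment output
    ScoredAt       datetime2     NOT NULL DEFAULT SYSUTCDATETIME()
);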

In summary, we have looked at how the Azure ML API works and how we can use it from SSIS. The big challenge is in controlling and keeping track of the batch execution job while it is running.

If you are more serious about using batch execution, I recommend looking at Azure Data Factory.