How to Better Evaluate the Goodness-of-Fit of Regressions

Most of the content of this post is platform-agnostic. Since I’m using Azure Machine Learning these days, I take it as the starting point for this study.

It’s quite simple for an average Azure Machine Learning user to create a regression experiment, feed data into it and get the predicted values. It’s also easy to get some metrics to evaluate the resulting model. Once you have them, the following questions arise:

  • How can I interpret these numbers?
  • Are these metrics enough to assess the goodness-of-fit of the model?

This post aims to provide the statistical foundation behind these metrics, along with some additional tools that will help you better understand how well the model has fitted. These tools are implemented in an R script you can simply copy and paste into an Execute R Script module.
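As a rough illustration of the kind of numbers in question, here is a minimal Python sketch of three common regression metrics: mean absolute error, root mean squared error and the coefficient of determination. The excerpt does not say which metrics the post covers, and the post's own tools are written in R, so the function name `regression_metrics` and the choice of metrics are assumptions made for illustration only.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE and R^2 for two equal-length sequences of numbers.

    Hypothetical helper for illustration; not taken from the original post.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]

    # Mean absolute error: average size of the errors, in the target's units.
    mae = sum(abs(r) for r in residuals) / n

    # Root mean squared error: penalizes large errors more than MAE does.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 minus residual variation over total variation.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}
```

A model that always predicts the mean of the observed values gets an R² of 0, which is a useful baseline to keep in mind when interpreting the metric.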

Read the rest of the article here:

Azure ML through the eyes of an SSIS developer


This post is a shorter version of a presentation I gave on batch execution for Azure ML through SSIS. The material is available here:

Predictive experiments in Azure ML work as a wrapper around your trained model, providing web service inputs and outputs. As a developer implementing these services, that is about as much as you need to know. We are going to sample some data, send it to the model, get some scores back and then load the new information into a DW.

This is what the design of a typical predictive experiment looks like. On the left is the finished and polished machine learning model. From the top comes information about what the metadata of the web service input should look like, and the scoring activity sits in the middle.


If you want to follow along at home, you will need the following.

The finished product will be a package as simple as in the introduction. Load some data to Azure blob storage, create and execute a batch on that data, retrieve the output URI and load that information back into a database table.

This package can act as a template for future use. The components can be parameterized and with some effort you can get it to work with BIML.

The control flow of the SSIS package.

Azure Machine Learning has two modes of running predictive experiments through the API.

  • Request-Response Service (RRS), which takes the inputs for a single case and returns the predictions. This is useful for streaming data.
  • Batch Execution Service (BES), which we will explore here since SQL Server data and SSIS are inherently batch oriented.

When running a batch execution for an Azure ML predictive experiment, it is important to remember that the only source you can use is an Azure blob file, typically a csv file. Therefore we must first make sure our data is in the correct format before running the batch, and we need to know where to look for the results, which unsurprisingly also arrive as a csv file in Azure Blob Storage.

First things first.

Our control flow kicks off with a data flow that takes our data, preferably from a view with the correct formatting and filtering applied, and puts it into a csv file in Azure Blob Storage.
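Outside SSIS, the shape of that input file can be sketched in a few lines of Python. This is only an illustration of producing a header row plus data rows as csv text. The function name `rows_to_csv` and the column names in the usage example are hypothetical, and the upload to blob storage is deliberately left out.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header and an iterable of rows to csv text.

    Hypothetical helper; in the SSIS package this job is done by the data flow.
    """
    buffer = io.StringIO()
    # Use "\n" as the line terminator so the output is the same on every platform.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `rows_to_csv(["CustomerId", "City"], [[1, "Malmö"], [2, "Lund, Sweden"]])` quotes the second city because it contains a comma, which is the kind of formatting detail that is easier to handle in a view or in the writer than to debug after a failed batch.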

The connection manager for our DW is pretty straightforward, but for the blob storage we are going to need that Azure Feature Pack I was telling you about. Make sure that you have properly installed the pack by looking for the Azure Blob Destination among the destinations in the SSIS Toolbox in Visual Studio.

Make sure you have your connection information available. You can use the same blob storage account that Azure ML uses; just make sure you have the storage account name and access key, which you can get by drilling down in the Azure Portal. You probably want to create a container for the csv files you are going to upload as input for the batch executions.

With the connection managers in place, we can now add the source and destination to our data flow, hook each up to its respective connection, and enter the database and table/view for the source and the container and file name for the destination. These things are great to parameterize!

If you are unsure of where to find your blob storage information head over to and make sure you have a storage account set up. Then find the link to the service endpoint and one of your access keys.

The second part is to run the batch job. To get our batch job up and running, all we need to do is create an execution, point it to where the source input is located, and then start the job.
Well, it turns out that is easier said than done. The inspiration for this blog post came from another post by Chris Price over here:

What you need to do is clone this, build it, and distribute the assembly both for the IDE and for execution (including any other servers where you want the packages to run). When properly installed, the component will show up in the control flow section of your SSIS Toolbox.

I augmented some of his code to add another output method that returns just the URI of the experiment output.
I also changed the component so that it uses a storage connection manager from the Azure Feature Pack instead of the connection manager from the other project in Chris’s solution.
You want to keep track of at least the URL and key for the predictive experiment. You might also want to parameterize the source blob location.

With this in place we can send the component information about where to get web service input for our experiment, and which variable to update with the location of the webservice output from the experiment.

So we add a variable for the location and split it into the two variables that the Azure Blob Source in our last data flow needs. This is because that component needs a container and a file name, not one long path.
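The split itself is simple string handling. The sketch below shows one way to do it in Python, assuming the location is a URL of the form `https://<account>.blob.core.windows.net/<container>/<path to file>`. The excerpt does not show the exact format the component returns, so treat both that format and the function name `split_blob_uri` as assumptions.

```python
from urllib.parse import unquote, urlparse


def split_blob_uri(uri):
    """Split a blob URL into (container, blob_name).

    Hypothetical helper. Assumes the first path segment is the container
    and everything after it is the blob name; any query string is ignored.
    """
    path = urlparse(uri).path.lstrip("/")
    container, _, blob_name = path.partition("/")
    if not container or not blob_name:
        raise ValueError("expected a URL with both a container and a blob name")
    return unquote(container), unquote(blob_name)
```

In the package, the same logic would typically live in an expression or a small script task that fills the two variables.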

This leaves us with the task of getting the data back.

Our second and last data flow task reads data from the Azure storage where the experiment output landed and makes sure the output data format matches the table in our DW. I also put in a derived column that turns the output into a boolean based on the values of the experiment outputs, which were a string for the label and a numeric for the score.
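The excerpt does not spell out the rule behind that derived column, so the following Python sketch only illustrates the idea: take a row with a string label and a numeric score and produce a typed record with a boolean. The field names `label` and `score`, the function name `shape_output_row` and the rule "the label equals a chosen positive value" are all hypothetical.

```python
def shape_output_row(row, positive_label="true"):
    """Convert one csv output row (a dict of strings) into typed values.

    Hypothetical rule: the boolean is True when the label text matches
    positive_label, ignoring case and surrounding whitespace.
    """
    label = row["label"].strip().lower()
    return {
        "is_positive": label == positive_label.strip().lower(),
        "score": float(row["score"]),
    }
```

Doing the conversion in the data flow keeps the DW table free of string-typed flags.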

In summary, we have looked at how the Azure ML API works and how we can use it from SSIS. The big challenge is in controlling and keeping track of the batch execution job while it is running.
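To make the tracking problem concrete, here is a minimal Python sketch of a polling loop with a timeout. It does not call the Azure ML API: the status check is passed in as a function, and the status names in `TERMINAL_STATES` are assumptions chosen for illustration rather than values confirmed by the post.

```python
import time

# Hypothetical status names; check the actual values your service returns.
TERMINAL_STATES = {"Finished", "Failed", "Cancelled"}


def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or the timeout passes.

    get_status is any zero-argument callable returning the job status as a string.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish in time, last status: " + status)
        sleep(poll_seconds)
```

The caller still has to decide what to do with a failed or timed-out job, for example failing the package so the load step never runs on missing output.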

If you are more serious about using batch execution I recommend using SSIS Data Factory.

Connect Azure to your local network


There is much focus from Microsoft on Azure right now, but Azure is reachable via the Internet only. A typical company, on the other hand, has all its resources on an internal LAN that is shielded from the Internet for security reasons. So how can you connect the two and integrate Azure with your local network? (more…)

Some Thoughts About Microsoft’s Cloud Platform Roadmap

There have been a lot of changes in and around Microsoft in the past few months. Under the leadership of Satya Nadella, with the push toward “cloud first” and “mobile first” services, things are stirred up a bit right now. I have to admit that, having worked with consulting clients who can’t, aren’t ready or just don’t want to move to the cloud, I’ve gotten a little caught up asking why it’s all necessary. The short answer is that cloud services are a reality. Just about everyone does business on the Internet in some way and, for technology product and service providers like Microsoft, it’s faster and more efficient to deliver to the cloud before shrink-wrapping boxes of software to be installed on traditional servers and desktop computers. (more…)

New Power BI stand-alone Designer, Dashboards, APIs & iOS App


This week Microsoft announced the availability of Power BI Dashboards and the browser-based dashboard designer. What is it and why is it important? The most significant thing about it is that report and dashboard users do not need to have Excel 2013 ProPlus edition or an Office 365 subscription to use Power BI. This is very good news, as it opens these amazing capabilities up to a much wider audience: those who work for companies that don’t have Office 2013 ProPlus or who are not using Office 365 ProPlus. Power BI updates and new features are being released at a very fast pace and there is much to consider. The definition of “Power BI” and Microsoft’s larger Business Intelligence and data analytics offering continue to evolve. First, exactly what’s new and recently available? (more…)

Step by Step guide to Export a SQL Azure Database to Azure storage via Import and Export CTP


Import and Export (CTP) is an interesting SQL Azure feature that allows us to export a SQL Azure database in the form of a bacpac to Azure storage. Think of a bacpac as the “zipped” version of the schema and data in your database. The feature also allows us to import the bacpac into a SQL Azure database. Note that Import and Export CTP also works with on-premises SQL Server databases, and you can download the necessary bits and information here:


Testing latency between client and SQL Azure via client statistics in SSMS


As I write this blog post, there are six location options when provisioning a SQL Azure server. So while provisioning a SQL Azure server, you may have to decide on the optimal location based on the criterion that the latency between your application and the SQL Azure server is at its minimum. As you may know, we get better performance if we are able to minimize the latency between the client and SQL Azure. So let’s get into action.