When we talk about technology, speed and capacity in the transfer of data play a fundamental role. That is why two of the world's most important technology companies, Microsoft and Facebook, along with Telxius, a telecommunications infrastructure subsidiary, have inaugurated the most advanced transatlantic cable to date. The result of this ambitious project has been named Marea (Spanish for “tide”), and its main aim is to satisfy the demand for Internet services and cloud computing that has grown in recent years. Marea is the most advanced subsea cable crossing the Atlantic Ocean, connecting the United States (Virginia Beach) with Spain (Bilbao).
Most of the content of this post is platform-agnostic. Since I'm currently using Azure Machine Learning, I'll take it as the starting point for my study.
It's quite simple for the average Azure Machine Learning user to create a regression experiment, make the data flow through it, and get the predicted values. It's also easy to obtain some metrics to evaluate the implemented model. Once you have them, the following questions arise:
- How can I interpret these numbers?
- Are these metrics enough to assess the goodness-of-fit of the model?
This post aims to provide you with the statistical foundation behind these metrics and with some additional tools that will help you better understand how well the model fits. These tools are implemented in an R script you can simply copy and paste into an Execute R Script module.
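To give a taste of what such a script computes, here is a minimal, platform-agnostic sketch of the usual regression metrics. The post's own script is in R; this Python version only illustrates the underlying formulas (the sample values are made up):

```python
import math

def regression_metrics(actual, predicted):
    """Compute common goodness-of-fit metrics for a regression model."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n            # mean absolute error
    rmse = math.sqrt(sum(r * r for r in residuals) / n)  # root mean squared error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    r2 = 1 - ss_res / ss_tot                              # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

print(regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 7.5]))
```

An R² close to 1 means the model explains most of the variance in the data, but as the post discusses, a single number is rarely enough to judge the fit.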
Read the rest of the article here:
This post is a quick, shorter version of a presentation I gave on the topic of batch execution for Azure ML through SSIS. The material is available here: http://www.sqlsaturday.com/433/Sessions/Details.aspx?sid=38704
Predictive experiments in Azure ML work as a wrapper for your trained model, providing web service inputs and outputs. As a developer implementing these services, that is about as much as you need to know: we are going to sample some data, send it to the model, get some scores back, and then load the new information into a DW.
This is what the design of a typical predictive experiment looks like. On the left is the finished and polished machine learning model. From the top comes information about what the metadata of the web service input should look like, and the scoring activity sits in the middle.
The finished product will be a package as simple as the one in the introduction: load some data into Azure Blob Storage, create and execute a batch on that data, retrieve the output URI, and load that information back into a database table.
This package can act as a template for future use. The components can be parameterized and with some effort you can get it to work with BIML.
Azure Machine Learning has two modes of running predictive experiments through the API.
- Request-Response Service (RRS), which takes the inputs for a single case and returns the predictions; useful for streaming data.
- Batch Execution Service (BES), which we will explore here, since SQL Server data and SSIS are inherently batch-oriented.
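To make the RRS side of that distinction concrete, here is a hedged sketch of what a single request-response call looks like from code. The endpoint URL, API key, and column names are placeholders you would take from your own experiment's API help page, and the body shape follows the classic Azure ML web service format as I recall it; verify it against your endpoint's documentation:

```python
import json

# Placeholders -- substitute the real values from the Azure ML portal.
ENDPOINT = "https://ussouthcentral.services.azureml.net/workspaces/<ws>/services/<svc>/execute?api-version=2.0&details=true"
API_KEY = "<your-api-key>"

def rrs_payload(column_names, rows):
    """Build the request body shape the classic Azure ML RRS endpoint expects:
    one named input with column names and one or more rows of values."""
    return {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }

body = json.dumps(rrs_payload(["age", "income"], [["42", "55000"]]))
headers = {"Content-Type": "application/json",
           "Authorization": "Bearer " + API_KEY}
# With the requests library you would now POST it:
# response = requests.post(ENDPOINT, data=body, headers=headers)
print(body)
```

BES, which the rest of this post uses, replaces the inline `Values` with a pointer to a csv blob, as we will see below.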
When running a batch execution for an Azure ML prediction experiment, it is important to remember that the only source you can use is an Azure Blob file, typically a csv file. Therefore, we must first make sure our data is in the correct format before running the batch, and we need to know where to look for the results, which, to no surprise, is also a csv file in Azure Blob Storage.
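Before uploading, it is worth double-checking that what we produce really is a plain, headered csv whose columns match the web service input metadata. A minimal sketch (the column names and rows are made up for illustration):

```python
import csv
import io

def to_batch_csv(column_names, rows):
    """Serialize rows to the comma-separated, headered format that the
    batch execution input blob should contain."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(column_names)   # header row must match the experiment's input schema
    writer.writerows(rows)
    return buf.getvalue()

print(to_batch_csv(["id", "score"], [[1, 0.87], [2, 0.12]]))
```

In the package itself, the Azure Blob Destination does this serialization for us; the point is only that header names and column order have to line up with the experiment's expected input.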
First things first.
Our control flow kicks off with a data flow that takes our data, preferably from a view with the correct formatting and filtering applied, and puts it into a csv file in Azure Blob Storage.
The connection manager for our DW is pretty straightforward, but for the blob storage we are going to need that Azure Feature Pack I was telling you about. Make sure you have properly installed the pack by looking for the Azure Blob Destination among the destinations in the SSIS Toolbox in Visual Studio.
Make sure you have your connection information available. You can use the same blob storage account that Azure ML uses; just make sure you have the storage name and access key, which you can get by drilling down in the Azure Portal. You probably want to create a container for the csv files you are going to upload as input for the batch executions.
With the connection managers in place, we can now add the source and destination to our data flow, hooking each up to its respective connection, specifying the database and table/view for the source and the container/filename for the destination. These things are great to parameterize!
If you are unsure of where to find your blob storage information head over to manage.windowsazure.com and make sure you have a storage account set up. Then find the link to the service endpoint and one of your access keys.
The second part is to run the batch job. To get our batch job up and running, all we need to do is create an execution, point it to where the source input is located, and then start the job.
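In code terms, the batch lifecycle is: create a job that points at the input blob, start it, then poll its status until it finishes. The sketch below only builds the URLs and request body for those calls; `BASE`, `API_KEY`, and the connection string are placeholders, and the field names follow the classic BES API as I recall it, so verify them against your experiment's API help page:

```python
import json

# Placeholders -- take the real values from the predictive experiment's API help page.
BASE = "https://ussouthcentral.services.azureml.net/workspaces/<ws>/services/<svc>"
API_KEY = "<api-key>"

def submit_url(base):
    """POST here to create a job; the response contains the job id."""
    return base + "/jobs?api-version=2.0"

def start_url(base, job_id):
    """POST here to start the created job."""
    return base + "/jobs/" + job_id + "/start?api-version=2.0"

def status_url(base, job_id):
    """GET here repeatedly to poll the job status."""
    return base + "/jobs/" + job_id + "?api-version=2.0"

def bes_request(connection_string, relative_location):
    """Body for job creation: point the experiment's web service input
    at the csv blob we uploaded in the first data flow."""
    return {
        "Input": {
            "ConnectionString": connection_string,
            "RelativeLocation": relative_location,  # e.g. "inputs/batch.csv"
        },
        "GlobalParameters": {},
    }

body = json.dumps(bes_request(
    "DefaultEndpointsProtocol=https;AccountName=<acct>;AccountKey=<key>",
    "inputs/batch.csv"))
# POST body to submit_url(BASE) with an "Authorization: Bearer <key>" header,
# POST to start_url(...), then poll status_url(...) until the job reports
# finished and read the output blob location from the status response.
print(submit_url(BASE))
```

Keeping track of the job id and polling for completion is exactly the part the custom SSIS component below wraps up for us.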
Well, it turns out that is easier said than done. The inspiration for this blog post came from another post by Chris Price over here: http://blogs.msdn.com/b/bluewatersql/archive/2015/04/04/building-an-azure-ml-ssis-task.aspx
What you need to do is clone this, build it, and distribute the assembly both for the IDE and for execution (including any other servers where you want the packages to run). The component will show up in your SSIS Toolbox in the control flow when properly installed.
I augmented some of his code to add another output method that returns just the URI of the experiment output. I also changed the component to use a storage manager from the Azure Feature Pack instead of the connection manager from the other project in Chris's solution.
You want to keep track of at least the URL and key for the predictive experiment. You might also want to parameterize the source blob location.
With this in place, we can send the component information about where to get the web service input for our experiment, and which variable to update with the location of the web service output from the experiment.
So we add a variable for the location and split it up into the two variables that the Azure Blob Source in our last data flow needs. This is because that component needs a container and a file name, not one long path.
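The split itself is trivial string handling; here is the same logic as a sketch (the sample URI is made up):

```python
from urllib.parse import urlparse

def split_blob_uri(uri):
    """Split a blob output URI into the container and blob name that the
    Azure Blob Source expects as two separate values."""
    path = urlparse(uri).path.lstrip("/")          # "container/folder/file.csv"
    container, _, blob_name = path.partition("/")  # split on the first "/" only
    return container, blob_name

print(split_blob_uri("https://myaccount.blob.core.windows.net/experimentoutput/results.csv"))
```

In the package this would be done with an SSIS expression on the two variables rather than a script, but the logic is the same: everything before the first slash of the path is the container, the rest is the file name.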
This leaves us with the task of getting the data back.
Our second and last data flow task will read the data from the Azure Storage location where the experiment output landed and make sure the output data format matches the table in our DW. I also put in a derived column that turns the experiment outputs, a string for the label and a numeric for the score, into a boolean.
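The derived-column logic amounts to something like this sketch. The label values and the 0.5 threshold are assumptions for illustration; adjust them to what your experiment actually emits:

```python
def to_flag(label, score, threshold=0.5):
    """Mimic the derived column: collapse the experiment's string label and
    numeric score into a single boolean for the DW table."""
    return label == "1" or float(score) >= threshold

print(to_flag("1", "0.87"))  # True
```

In SSIS this is a one-line expression in the Derived Column transformation rather than code, but it shows the type conversion that has to happen before the insert.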
In summary, we have looked at how the Azure ML API works and how we can use it from SSIS. The big challenge is in controlling and keeping track of the batch execution job while it is running.
If you are more serious about using batch execution, I recommend using Azure Data Factory.
There is much focus from Microsoft on Azure right now, but Azure is reachable only via the Internet. A typical company, on the other hand, has all its resources on an internal LAN which is shielded from the Internet for security reasons. So how can you connect the two and integrate Azure with your local network? (more…)
There have been a lot of changes in and around Microsoft in the past few months. Under the leadership of Satya Nadella and the push toward “cloud first” and “mobile first” services, things are stirred up a bit right now. I have to admit that, having worked with consulting clients who can't, aren't ready, or just don't want to move to the cloud, I've gotten a little caught up asking why it's all necessary. The short answer is that cloud services are a reality. Just about everyone does business on the Internet in some way and, for technology product and service providers like Microsoft, it's faster and more efficient to deliver to the cloud than to shrink-wrap boxes of software to be installed on traditional servers and desktop computers. (more…)
This week Microsoft announced the availability of Power BI Dashboards and the browser-based dashboard designer. What is it and why is it important? The most significant thing about it is that report and dashboard users do not need to have Excel 2013 ProPlus edition or an Office 365 subscription to use Power BI. This is very good news, as it opens these amazing capabilities up to a much wider audience: those who work for companies that don't have Office 2013 ProPlus or are not using Office 365 ProPlus. Power BI updates and new features are being released at a very fast pace, and there is much to consider. The definition of “Power BI” and Microsoft's larger Business Intelligence and data analytics offering continues to evolve. First, exactly what's new and recently available? (more…)
Great news yesterday and today, the day after our Spanish SolidQ Summit 2012 in Madrid: Windows Azure has a new version of Microsoft's cloud platform, full of new features and new names for its large set of cloud services.
This article will review all Enterprise Search features, giving some examples. This will show all employees the value of using the Enterprise Search system in their organization. We will also explain each Search product individually, and cover each Enterprise Search feature. (more…)
It seems to be a recurrent topic this week in the SQL Azure forums: how do I remove an Azure Data Sync installation so that I can stop my syncing process and delete all the Data Sync objects (tables, stored procedures, and so on) from my local databases?
Import and Export (CTP) is an interesting SQL Azure feature that allows us to export a SQL Azure database in the form of a bacpac to Azure storage. Think of a bacpac as the “zipped” version of the schema and data in your database. It also allows us to import the bacpac into a SQL Azure database. Also note that Import and Export CTP also works with on-premises SQL Server databases; you can download the necessary bits and information here: http://sqldacexamples.codeplex.com/releases/view/72388