In a Data Science project it’s really important to get the most insight out of your data. There is a specific phase, the first one in the project, whose goal is data analysis: the Data Exploration phase.
Among the various kinds of analysis, one of the most interesting is the bivariate one, which explores the relationship between two variables. If both variables are categorical, the plot most commonly used to analyze their relationship is the mosaic plot. At first sight it may look a little confusing, and people who are not familiar with some statistical concepts can miss important information this plot gives us. So, we’ll go a little deeper into these concepts.
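To make the idea concrete, here is a minimal sketch of the numbers behind a mosaic plot, assuming the usual construction: the width of each column is the marginal share of the first variable, and the height of each tile inside it is the conditional share of the second variable. The function name `mosaic_tiles` and the sample data are made up for illustration; they do not come from any specific library or dataset.

```python
from collections import Counter


def mosaic_tiles(pairs):
    """Return {(x, y): (width, height)} for two categorical variables.

    width  = share of observations with X = x (marginal proportion)
    height = share of those observations that also have Y = y
             (conditional proportion of Y given X)
    """
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)                      # counts per (x, y) cell
    marginal_x = Counter(x for x, _ in pairs)   # counts per x category
    return {
        (x, y): (marginal_x[x] / total, count / marginal_x[x])
        for (x, y), count in joint.items()
    }


# Made-up sample: where people live vs. how they commute.
observations = (
    [("urban", "bike")] * 30 + [("urban", "car")] * 10
    + [("rural", "bike")] * 15 + [("rural", "car")] * 45
)

tiles = mosaic_tiles(observations)
for (x, y), (width, height) in sorted(tiles.items()):
    print(f"{x:>5} / {y:<4} width={width:.2f} height={height:.2f} "
          f"area={width * height:.2f}")
```

The area of each tile equals the joint proportion of that pair of categories, so all the areas add up to 1. If the two variables were independent, the tiles would have the same heights in every column; the more the heights differ from column to column, the stronger the association.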
Most of the content of this post is platform-agnostic. Since I’m using Azure Machine Learning these days, I take it as the starting point for my studies.
It’s quite simple for an average Azure Machine Learning user to create a regression experiment, make the data flow through it and get the predicted values. It’s also easy to get some metrics to evaluate the implemented model. Once you have them, the following questions arise: (more…)
Today I want to tackle the issue of bulk copying more than one Azure ML experiment at once between different workspaces.
Maybe you already know that you can partially solve this task by copying the experiments one at a time. But your user must have access to both workspaces. If it doesn’t, you can simply share a workspace in this way:
Once you can see both workspaces in your Azure Machine Learning Studio, you can simply select an experiment and then “Copy to workspace”:
and then you can choose the destination workspace:
As you can imagine… you can’t simply select more than one experiment and then copy them all:
Now suppose you have dozens of experiments and you simply don’t want to waste your time copying them all manually, or you can’t have access to a shared workspace for security reasons. Is there a way to bulk copy your experiments? I’ll show you how to do that using a few lines of PowerShell.
The SQL Server Partition Management Utility (http://sqlpartitionmgmt.codeplex.com/) is one of the best tools for managing partition-switch operations. It is a command-line tool and can be integrated into an SSIS package or used to generate the T-SQL scripts needed in a regular “sliding window” partition management scenario. A blog post that shows how to use this tool is this one.
In my case, I wanted to speed up the loading of a big partitioned fact table through an SSIS package (which calls two child packages). So this package calls several instances of the tool in order to load more than one staging table in parallel. Each staging table corresponds to one fact table partition. After each staging table is loaded, the SSIS package loads the target fact table by switching the staging table into the matching partition.
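The loading pattern described above can be sketched as follows. This is only an illustration of the workflow in Python, not the SSIS package itself: `load_staging_table` and `switch_into_fact` are hypothetical stand-ins for the child package and for the partition switch, and `degree_of_parallelism` plays the role of the number of instances running at the same time. As a simplifying assumption, the sketch performs the switches one at a time in the coordinating thread, which is not necessarily how the package behaved.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_staging_table(partition_id):
    """Hypothetical stand-in for loading one staging table."""
    return f"staging_{partition_id}"


def switch_into_fact(staging_table, partition_id, switched):
    """Hypothetical stand-in for switching a staging table into its partition."""
    switched.append((staging_table, partition_id))


def load_fact_table(partition_ids, degree_of_parallelism):
    """Load staging tables in parallel, then switch each one as it completes."""
    switched = []
    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as pool:
        # One task per staging table; at most degree_of_parallelism run at once.
        futures = {pool.submit(load_staging_table, p): p for p in partition_ids}
        for future in as_completed(futures):
            partition_id = futures[future]
            # Assumption: switches are serialized here, in the main thread.
            switch_into_fact(future.result(), partition_id, switched)
    return switched


result = load_fact_table([1, 2, 3, 4], degree_of_parallelism=2)
print(sorted(result))
```

Raising `degree_of_parallelism` increases the number of staging loads in flight at the same time, which is the knob the next paragraph refers to.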
Everything seemed to work fine, but during the test phase, when I tried to increase the degree of parallelism (that is, the number of instances of the tool running at the same time), I got a deadlock error.