In a Data Science project, it’s really important to get the most insight out of your data. One specific phase, the first in the project, has data analysis as its goal: the Data Exploration phase.
Among the various kinds of analysis, one of the most interesting is bivariate analysis, which looks for the relationship between two variables. If both variables are categorical, the most common plot used to analyze their relationship is the mosaic plot. At first sight it may look a little confusing, and readers unfamiliar with the underlying statistical concepts can miss important information the plot has to offer. So we’ll go a little deeper into these concepts.
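To make the idea concrete, here is a minimal Python sketch of the numbers behind a mosaic plot. It assumes the common layout in which each tile's width is the marginal share of the first variable and its height is the conditional share of the second variable given the first, so the tile's area equals the joint proportion. The function name `mosaic_tiles` and the toy data are made up for illustration; this is not the code from the article.

```python
from collections import Counter


def mosaic_tiles(pairs):
    """Return {(x, y): (width, height)} for two categorical variables.

    width  = share of observations with this x (marginal proportion)
    height = share of this y among observations with this x (conditional proportion)
    """
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)                    # counts per (x, y) cell
    x_totals = Counter(x for x, _ in pairs)   # counts per x category
    tiles = {}
    for (x, y), count in joint.items():
        width = x_totals[x] / n
        height = count / x_totals[x]
        tiles[(x, y)] = (width, height)
    return tiles


# Hypothetical toy data: 100 observations of (group, answer).
data = (
    [("A", "yes")] * 30 + [("A", "no")] * 10
    + [("B", "yes")] * 15 + [("B", "no")] * 45
)
tiles = mosaic_tiles(data)
for cell, (width, height) in sorted(tiles.items()):
    print(cell, round(width, 2), round(height, 2), "area:", round(width * height, 2))
```

If the two variables were independent, every column would be split at the same heights. Here group A is split 0.75 / 0.25 and group B 0.25 / 0.75, which is the kind of difference a mosaic plot makes visible.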
Read the rest of the article here.
Most of the content of this post is platform-agnostic. Since I’m using Azure Machine Learning these days, I take it as the starting point for my studies.
It’s quite simple for an average Azure Machine Learning user to create a regression experiment, make the data flow through it and get the predicted values. It’s also easy to get some metrics to evaluate the implemented model. Once you have them, the following questions arise:
- How can I interpret these numbers?
- Are these metrics enough to assess the goodness-of-fit of the model?
This post aims to give you the statistical foundation behind these metrics, along with some additional tools that will help you better understand how well the model has fitted. These tools are implemented in an R script that you can simply copy and paste into an Execute R Script module.
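As a platform-agnostic illustration, the sketch below computes a typical set of regression evaluation metrics from actual and predicted values. It assumes the metrics in question are the usual five (mean absolute error, root mean squared error, relative absolute error, relative squared error and coefficient of determination); the excerpt does not list them. It is written in Python rather than R, the function name `regression_metrics` and the toy numbers are hypothetical, and it is not the R script the article refers to.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression metrics from two equal-length sequences."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_err = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Baseline errors: what a model that always predicts the mean would make.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sum(sq_err) / sq_dev
    return {
        "mae": sum(abs_err) / n,               # mean absolute error
        "rmse": math.sqrt(sum(sq_err) / n),    # root mean squared error
        "rae": sum(abs_err) / abs_dev,         # relative absolute error
        "rse": rse,                            # relative squared error
        "r2": 1 - rse,                         # coefficient of determination
    }


# Made-up toy values, for illustration only.
metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(metrics)
```

The relative metrics compare the model against the naive "always predict the mean" baseline, which is one way to start interpreting the raw numbers: a relative error near 0 means the model does far better than the baseline, while a value near 1 means it barely improves on it.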
Read the rest of the article here: