In a Data Science project it’s really important to get the most insight out of your data. A specific phase, the first one in the project, has data analysis as its goal: the Data Exploration phase.
Among other kinds of analysis, one of the most interesting is bivariate analysis, which examines the relationship between two variables. If both variables are categorical, the plot most commonly used to analyze their relationship is the mosaic plot. At first sight it may appear a little confusing, and readers unfamiliar with some statistical concepts can miss important information the plot provides. So we’ll go a little deeper into these concepts.
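As a rough illustration of what a mosaic plot encodes, the sketch below computes the tile geometry for two categorical variables: each column's width is the share of observations in one category of the first variable, and each tile's height is the share of the second variable's category within that column. The function name `mosaic_tiles` and the toy data are hypothetical and not taken from the article.

```python
from collections import Counter


def mosaic_tiles(pairs):
    """Return {(x, y): (width, height)} for a two-variable mosaic plot.

    width  = share of all observations that fall in category x
    height = share of the observations in x that also fall in category y
    """
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)                   # counts per (x, y) cell
    x_totals = Counter(x for x, _ in pairs)  # counts per x category
    return {
        (x, y): (x_totals[x] / total, count / x_totals[x])
        for (x, y), count in joint.items()
    }


# Hypothetical toy data: (smoking status, exercises regularly) for 10 people
observations = (
    [("smoker", "yes")] * 1
    + [("smoker", "no")] * 3
    + [("non-smoker", "yes")] * 4
    + [("non-smoker", "no")] * 2
)

tiles = mosaic_tiles(observations)
for (x, y), (width, height) in sorted(tiles.items()):
    # The area of a tile equals the joint proportion of the (x, y) cell.
    print(f"{x:>10} / {y:<3} width={width:.2f} height={height:.2f} "
          f"area={width * height:.2f}")
```

If the two variables were independent, the tile heights would be the same in every column, so the horizontal splits would line up across the plot. Columns whose splits sit at clearly different heights, as in this toy data, hint at an association worth testing.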
Read the rest of the article here.
Am I getting the most out of my Dashboards?
We already know that when a company has to decide whether to invest in a Business Intelligence project, it has to answer all the questions that arise about its effectiveness: Are we really going to get anything out of it? Will it give us the information we need? Will it be beneficial for us? In many cases, it is difficult for companies to answer all these questions, especially in the early stages of the project. (more…)
The growing variety and volume of data, together with falling computing and storage costs, have opened a window of opportunity for the resurgence of a discipline that already existed on paper and among equations: Machine Learning.
What is Machine Learning?
Power BI, Microsoft’s business analytics service, enables us to connect to hundreds of data sources and produce beautiful reports that can be consumed on the web and across mobile devices, delivering insights throughout our entire organization. In this post, I will walk you through the process of getting data into Power BI from an Oracle Database.
When you open Power BI Desktop, you will see the following window: (more…)
In this post, we will write a Python script (with Visual Studio 2017) that can run as a Windows service to extract, in almost real time, the tweets related to certain words or hashtags, store them in a SQL Server database, and then consume them with Power BI. (more…)
Today we will show you how to refresh a dataset published in Power BI from a PowerShell script that we invoke at the end of our ETL process.
We will use the Power BI libraries for PowerShell to connect to our Power BI portal and send an instruction to refresh a dataset. This can be useful for improving our ETL processes, refreshing the online datasets used in the Power BI portal once data has been loaded into our data warehouse and/or our OLAP/Tabular database. (more…)
In this post, we will talk about a new property introduced in SQL Server 2016: Auto Adjust Buffer Size. This property is specific to the “DataFlow” component and can take the values ‘True’ or ‘False’ (the default). We will also propose an approximate solution for previous versions.
Unless you are totally oblivious to the technological world, you will have heard about one of the biggest bugs in the history of computer science (Spectre and Meltdown) and know that its effects are real. So real that we at SolidQ have experienced them ourselves in our own Query Analytics software. In this post I will try to shed some light on how to proceed if you detect a performance regression in your SQL Server solution, explaining how I solved it in my own system.
I have uploaded to GitHub a repository containing a helper for analysing the results of the Data Migration Assistant tool. I have compiled and improved the version Microsoft released last March so that it can process aggregated results from multiple servers using Microsoft’s static code analysis tool. (more…)
Most of the content of this post is platform-agnostic. Since I’m currently using Azure Machine Learning, I take it as the starting point of my study.
It’s quite simple for an average Azure Machine Learning user to create a regression experiment, make the data flow through it and get the predicted values. It’s also easy to obtain some metrics to evaluate the implemented model. Once you have them, the following questions arise: (more…)