Measure the Strength of Association Between Two Categorical Variables: Mosaic Plot and Chi-Square Test

In a Data Science project it’s really important to get the more insights out of your data. There is a specific phase, the first one in the project, that has the data analysis as goal: the Data Exploration phase.

Among other kinds of analysis, one of the most interesting is the bi-variate one, that finds out the relationship between two variables. If the two variables are categorical, the most common plot used to analyze their relationship is the mosaic plot. At first sight it may appear a little bit confusing. People not aware of some statistical concepts can miss important information this plot can give us. So, we’ll go a little bit deeper in these concepts.

Read the rest of the article here.

Python for SQL Server Specialists Part 4: Python and SQL Server

Python for SQL Server Specialists Part 4: Python and SQL Server

In the last article in this series about Python for SQL Server specialists, you are going to learn how to use SQL Server Python libraries in SQL Server. You can use two scalable libraries, the revoscalepy and microsoftml libraries, which correspond to equivalent R libraries.

SQL Server and ML Integration

With SQL Server 2016 and 2017, you get a highly scalable Machine Learning (ML) engine. Not every function and algorithm is rewritten as a scalable one. Nevertheless, you will probably find the one you need for your analysis of a big dataset. You can store a Python or R data mining or machine learning model in a SQL Server table and use it for predictions on new data. You can even store graphs in a binary column and use it in SQL Server Reporting Services (SSRS) reports. Finally, ML support is not limited to SQL Server only. You can use R code also in Power BI Desktop and Power BI Service, where we at this time (April 2018) still waiting for Python support. However, you can already use both languages, Python and R, in Azure Machine Learning (Azure ML) experiments. (more…)

Python for SQL Server Specialists Part 2: Working with Data

Python for SQL Server Specialists Part 2: Working with Data

In my previous article, you learned Python fundamentals. I also introduced the basic data structures. You can imagine you need more advanced data structures for analyzing SQL Server data, which comes in tabular format. In Python, there is also the data frame object, like in R. It is defined in the pandas library. You communicate with SQL Server through the pandas data frames. But before getting there, you need first to learn about arrays and other objects from the numpy library.

In this article, you will learn about the objects from the two of the most important Python libraries, namely, as mentioned, numpy and pandas. (more…)

Python for SQL Server Specialists Part 1: Introducing Python

Python for SQL Server Specialists Part 1: Introducing Python

Python is one of the most popular programming languages. It is a general purpose high level language. It was created by Guido van Rossum, publicly released in 1991. SQL Server 2016 started to support R, and SQL Server 2017 adds support for Python. Now you can select your preferred language for the data science and even other tasks. R has even more statistical, data mining and machine learning libraries, because it is more widely used in the data science community; however, Python has broader purpose than just data science, and is more readable and might thus be simpler to learn. This is the first of the four articles that introduce Python to SQL Server developers and business intelligence (BI) specialists. This means that the articles are more focused on Python basics and data science, and less on general programming with Python.

(more…)