Python for SQL Server Specialists Part 1: Introducing Python

Python is one of the most popular programming languages. It is a general-purpose, high-level language, created by Guido van Rossum and first publicly released in 1991. SQL Server 2016 started to support R, and SQL Server 2017 adds support for Python. Now you can select your preferred language for data science and even other tasks. R has more statistical, data mining, and machine learning libraries, because it has been more widely used in the data science community; however, Python has a broader purpose than data science alone, and is more readable and might thus be simpler to learn. This is the first of four articles that introduce Python to SQL Server developers and business intelligence (BI) specialists. The articles therefore focus more on Python basics and data science, and less on general programming with Python.


Starting with Python

Python is an interpreted language. The philosophy of the language emphasizes code readability. For example, you use indentation to delimit code blocks instead of special characters like semicolons or curly brackets. Python supports automatic memory management and has a dynamic type system. You can use multiple programming paradigms in Python, including procedural, object-oriented, and functional programming. You can find Python interpreters for all major operating systems. The reference implementation of Python, namely CPython, is open-source software, managed by the non-profit Python Software Foundation. Being open source also means that there is a rich set of libraries available. Even the standard library is impressive and comprehensive.
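To give you a first taste of the syntax, the following short snippet (my own illustration, not taken from any official documentation) shows the dynamic type system and indentation-delimited blocks in action:

```python
# A variable's type is determined at run time and can change
x = 10            # x is an integer
x = "ten"         # now x is a string
print(type(x))

# Indentation, not curly brackets, delimits the code block
if len(x) == 3:
    print("The string has three characters.")
```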

In order to start working with Python and R, you need to do some installation first. I am not covering general SQL Server and Visual Studio installation; I am just explaining what you need to do to start using Python with SQL Server.


Installing ML Services and VS 2017 for Data Science

You start SQL Server setup, and on the Feature Selection page select Database Engine Services and, underneath it, Machine Learning (ML) Services (In-Database), with either Python only or both languages, R and Python, selected. After that, all you need are the client tools, and you can start writing the code. The following figure shows the SQL Server setup Feature Selection page with the appropriate features selected.

The next step is installing the client tools. Of course, you need SQL Server Management Studio (SSMS). In addition, you might want to install Visual Studio (VS) 2017. You can use either the Professional or even the free Community edition to develop Python (and R) code.

When installing Visual Studio 2017, be sure to select the Python development workload, and then Data science and analytical applications, as the following figure shows. This installs the Python language templates, including the data science templates, and also R Tools for Visual Studio.

Selecting the Appropriate Python Engine

There you go, you are nearly ready. There is a small trick here. VS 2017 also installs its own Python interpreter. In order to use the scalable interpreter installed with SQL Server (the one that enables executing code in the Database Engine context and includes the Microsoft scalable libraries), you need to set up an additional Python environment, pointing to that version of the interpreter. If you installed the default instance of SQL Server, the path to this scalable interpreter is C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\python.exe. You can see how to set up this environment in the following figure.

That’s it. You are ready to start Python programming. Just start a new project and select the Python Application template from the Python folder. You can also explore the Python Machine Learning templates, which include Classifier, Clustering, and Regression projects. If you selected the Python Application template, an empty Python script opens, with the same default name as the project and the default extension .py, waiting for you to write and interactively execute Python code.

Python Language Basics

Python uses the hash mark to start a comment. You can execute Python code in VS 2017 by highlighting the code and pressing the Ctrl and Enter keys simultaneously. You can use either single or double quotes for delimiting strings. The first command you will learn is the print() function. Write and execute the following code:

# Hash starts a comment
print("Hello World!")
# This line ignored
print('Printing again.')
print('O"Hara')   # In-line comment

You can observe the code you wrote and the results in the Interactive window, which is by default below the script window, at the bottom left side of the screen.

Python supports all the basic mathematical and comparison operators, as you would expect. The following code introduces them. Note that you can combine strings and expressions in a single print() statement.

1 + 2
print("The result of 3 + 20 / 4 is:", 3 + 20 / 4)
10 * 2 - 7
10 % 4
print("Is 7 less or equal to 5?", 7 <= 5)
print("Is 7 greater than 5?", 7 > 5)

The next step is to introduce variables. Note that Python is case-sensitive. The following code shows how you can assign values to variables and use them in direct computations and as arguments of a function.

# Integer
a = 2
b = 3
a ** b
# Float
c = 7.0
d = float(5)
print(c, d)

You can define strings inside double or single quotes. This enables you to use single quotes inside a double-quoted string, and vice versa. You can use the “%?” placeholder for formatting strings to include variables, where the question mark stands for a single letter denoting the data type of the variable, for example “s” for strings and “d” for integers. The str.format() method of the string data type allows you to do variable substitutions in a string. Here are some examples.

e = "String 1"
f = 10
print("Let's concatenate string %s and number %d." % (e, f))
four_cb = "String {} {} {} {}"
print(four_cb.format(1, 2, 3, 4))

The result of the previous code is:

Let’s concatenate string String 1 and number 10.
String 1 2 3 4

You can also create multi-line strings. Just enclose the string in a pair of three double quotes. You can also use special characters, like tab and line feed, by escaping them with a single backslash plus a letter, for example the letter t for a tab and the letter n for a line feed.
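For example, the following snippet (my own illustration) shows a triple-quoted multi-line string together with the tab and line-feed escape sequences:

```python
# Three double quotes delimit a multi-line string
poem = """First line,
second line."""
print(poem)
# \t is a tab, \n is a line feed
print("Col1\tCol2\nVal1\tVal2")
```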

You can always get interactive help with the help() command. A Python module is a file with default extension .py containing Python definitions and statements. You can import a module into your current script with the import command, and then use the functions and variables defined in that module. Besides modules provided with the installation, you can, of course, develop your own modules, distribute them, and reuse the code.
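As a minimal example, you can import the standard math module and then ask for interactive help on one of its functions:

```python
import math               # a module from the standard library

print(math.pi)            # use a variable defined in the module
print(math.sqrt(16))      # use a function defined in the module
help(math.sqrt)           # interactive help on the function
```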

Using Functions, Branches, and Loops

Like in any serious programming language, you can encapsulate your code inside a function. You define a function with the def name(): command. Functions can have arguments and can also return values. The following code defines two functions: one with no arguments, and one that has two arguments and returns a value. Note that there is no special mark for the end of a function body; the correct indentation tells the Python interpreter where the body of the first function ends and the definition of the second function starts.

def p_n():
    print("No args...")
def add(a, b):
    return a + b

When you call a function, you can pass parameters as literals or through variables. You can also do some manipulation of the variables when you pass them as arguments to a function. The following code shows these possibilities.

# Call without arguments
p_n()
# Call with literals
add(2, 3)
# Call with variables and math
a = 10
b = 20
add(a / 5, b / 4)

You can make branches in the flow of your code with the if..elif..else: statement. The following code shows you an example.

a = 10
b = 20
c = 30
if a > b:
    print("a > b")
elif a > c:
    print("a > c")
elif (b < c):
    print("b < c")
    if a < c:
        print("a < c")
    if b in range(10, 30):
        print("b is between a and c")
    print("a is less than b and less than c")

The results of the code are:

b < c
a < c
b is between a and c
a is less than b and less than c

The simplest data structure is the list. A Python list is a set of comma-separated values (or items) between square brackets. You can use a for loop to iterate over a list; Python's for loop acts like a foreach loop in other languages. There are many methods supported by a list. For example, you can use the list.append() method to append an element to a list. The following code shows how to create lists and loop over them with for loops. Finally, it shows a while loop.

animals = ["cat", "dog", "pig"]
nums = []
for animal in animals:
    print("Animal: ", animal)
for i in range(2, 5):
    nums.append(i)    # list.append() adds an element to the list
print(nums)
i = 1
while i <= 10:
    i = i + 1
print(i)

The last data structure presented in this introductory article is the dictionary. A dictionary is a set of key-value pairs. You can see an example of a dictionary in the following code.

states = {
    "Oregon": "OR",
    "Florida": "FL",
    "Michigan": "MI"}
for state, abbrev in list(states.items()):
    print("{} is abbreviated {}.".format(state, abbrev))
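You can also look up a single value by its key, add a new key-value pair, or use the dict.get() method, which returns a default value when the key is missing (a small sketch of my own):

```python
states = {
    "Oregon": "OR",
    "Florida": "FL",
    "Michigan": "MI"}
print(states["Oregon"])           # lookup by key
states["Texas"] = "TX"            # add a new key-value pair
print(states.get("Ohio", "n/a"))  # default value for a missing key
```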

I mentioned that in Python you can also use the object-oriented paradigm. However, going deeper into object-oriented programming with Python is beyond the scope of this and the following articles.


I guess the programming in Python introduced so far was not overly exciting. However, you always need to start with the basics, and only after you embrace the basics does the exciting part start. Therefore, don't miss my next article, in which I will introduce the most important data structure for advanced analytics: the data frame.

Microsoft, Facebook and Telxius present the most advanced transatlantic subsea cable

When talking about technologies, speed and capacity in the transfer of data play a fundamental role. That is why two of the most important technology companies in the world, Microsoft and Facebook, along with Telxius, a telephone infrastructure subsidiary, have inaugurated the most advanced transatlantic cable so far. The result of this ambitious project has been named Marea (Spanish for “Tide”), and its main aim is to satisfy the demand for Internet services and cloud computing that has grown in recent years. Marea is the most advanced subsea cable that crosses the Atlantic Ocean. The cable connects the United States (Virginia Beach) with Spain (Bilbao).


I use columnstore indexes on SQL 2014. What would be the advantages of upgrading to SQL 2016?

SQL Server 2016 really represents a drastic change for the better in terms of general performance. The proof is that, thanks to the engine changes, this is the first SQL Server version where Microsoft actively promises a widespread increase in performance levels by simply upgrading. But what impact can it have on your daily loads if you use columnstore indexes in a SQL Server 2014 data warehouse? There was already an amazing increase in performance with SQL Server 2014 and columnstore indexes. Is it worth upgrading to SQL Server 2016? (more…)

Integration Services (SSIS) vs. PowerCenter. ETL tool comparison (I)

Below is a comparison table between two ETL tools: SQL Server Integration Services, a.k.a. SSIS, and Informatica PowerCenter. Which tool should I choose? What are the differences? What are the advantages and disadvantages? Do they meet my requirements and particular needs? These are some of the questions you may have asked yourself at the start of any new project, and we aim to answer them in this post by providing you with some key facts about these two very powerful tools.


Virtual SANs and the trend towards SDS

Lately, we are noticing a trend whereby traditional SANs are increasingly losing ground. This can be due to various reasons, such as forced migrations to the cloud, limited-budget environments, or even customers who are getting annoyed with their storage providers' draconian terms and conditions. Common examples are very costly maintenance agreements, higher-than-market disk-size upgrade costs, and difficulties expanding or improving hardware without having to completely replace it. There is also a strong trend to continue going virtual and to trade servers as if they were commodities. All of the above is contributing to an increase in this trend towards software-defined storage (SDS), while virtual SANs (VSANs) continue to attract new supporters.


Creating an installer for your SSIS-assemblies

When you develop SSIS packages with custom functionality, either self-signed assemblies or script components/script transformations that use assemblies you got from CodePlex, it sometimes becomes a nuisance to get them into the GAC on production machines. What often happens is that when you try to run the package in earnest via SQL Agent, it fails to find the assembly.

In my case, I'll be adding to my SSIS package dataflow in order to stream data into encrypted zip files.
If you would like to try out a cool way to deal with zip files in .NET, I recommend it: DotNetZip – Zip and Unzip in C#, VB, any .NET language

So back to trying to run custom components and assemblies in SSIS packages. I bet you have seen something like this at some point in your career:

Error: System.IO.FileNotFoundException: Could not load file or assembly ‘Ionic.Zip, Version=, Culture=neutral, PublicKeyToken=edbe51ad942a3f5c’ or one of its dependencies. The system cannot find the file specified.
File name: ‘Ionic.Zip, Version=, Culture=neutral, PublicKeyToken=edbe51ad942a3f5c’

This tells us that our DLL can't be found when the SSIS execution reaches the dataflow that uses it.

I wanted to find a solution that helps me deal with this in a good manner and is reasonably future-proof.
The first thing that came to mind is doing what I always recommend: use PowerShell. The problem is that sometimes you don't have PowerShell, or only an older version, or you aren't allowed to run whatever scripts you want in production.

The next thing that was suggested to me was to copy gacutil.exe over to the production environment and do my deployment that way. That is not recommended, though, as it violates the license of the Visual Studio SDK.

I tried out a couple of different plug-ins for Visual Studio to be able to create an installer, and the one that worked with the least hassle for me was WiX.

These are the steps I took to get my first working installer that adds DLL files to the GAC:

  • Make sure that your assembly is signed. If you are unsure whether your own assembly is signed, it probably isn't; a third-party assembly, like the one in this example, probably is.
  • Make sure that your installation of Visual Studio is compatible with WiX (I use Visual Studio 2015 Developer).
  • Download the latest stable release of the WiX toolset that works with your version of VS.
  • Put the assembly (DLL file) in any folder on your computer and have the corresponding path at hand. It doesn't matter where the file is as long as you can access it when building the project.
  • Create a new VS project for WiX.
  • Put an XML fragment into Project.wxs that points to your file.
  • Set the project to release mode, build it, and go to the bin folder to take a look.
  • The three files in the bin folder are all needed for your installer, so zip them together and you are basically done.
  • Move the zip file with the three files to the machine you need the DLL on and run the installer. If you want a prettier installer with more options and so on, put more effort into the WiX project!

Here is the excerpt of the XML that worked for me; mileage may vary, so look up the documentation and the comment sections of other articles you find.

  <ComponentGroup Id="ProductComponents" Directory="INSTALLFOLDER">
    <Component Id="Ionic.Zip">
      <File Name="Ionic.Zip.dll" KeyPath="yes" Source="C:\Import\Ionic.Zip.dll" Assembly=".net" AssemblyManifest="Ionic.Zip.dll" />
    </Component>
  </ComponentGroup>


That time I used ON UPDATE CASCADE in production


While working with an invoicing system back in 2013, I was asked by my customer whether it would be possible to enable some more special handling of their biggest client. The way their system was set up, one client accounted for over half of the accounts receivable. The client and my customer were also very close in their operations. What they needed from the system was the ability to split the client into four legal entities while retaining all the custom logic and prices. They asked me if a quick fix was possible.

“A quick-fix solution is as fast, as easy and as well-documented as its explanation from the customer” – me just now.

The system had a procedure that would work out all the numbers for each client and, for this particular one, handle subsidiaries together on all invoices. With the reorganization coming, all units would be connected to one of the four new client subsidiaries, and each subsidiary would instead receive its own invoices for its units.


What the options for foreign key constraints do

You can add two clauses (ON UPDATE and ON DELETE) when creating a foreign key constraint. Both can have one of four options.

NO ACTION – this is the default; an error is raised and the execution halts if the new value would violate referential integrity.
CASCADE – data integrity is preserved by updating or deleting the records in the related tables in the same transaction. In other words, completely hazardous when dealing with deletes. Use with caution.
SET NULL – if the relation between records is broken, the integrity of the referencing record is upheld by setting the value of the foreign key column to NULL.
SET DEFAULT – same as above, except the value becomes the column's default instead of NULL; of course, there has to be a default value for the column for this to work.

From rough draft to the final solution

On the back of an envelope I wrote down these nine steps that would re-route the client logic and make sure everything would still sum up nicely:

  1. Begin a transaction and run the stored procedures to create the invoices as normal.
  2. For each unit belonging to one of the four new subsidiaries: create a new invoice header with a temporary, made-up sequence starting from -1 and going backwards to avoid collisions.
  3. For each invoice line: map the invoiceId to a subsidiary (-1 through -4).
  4. Delete the now empty and unused invoice headers created in step 1.
  5. Drop the foreign key constraint between headers and lines.
  6. Re-create this foreign key, but set it with the action ON UPDATE CASCADE.
  7. Re-run the stored procedure that fixes invoiceIds for headers (a sub-procedure of invoice creation). When the headers are updated, the lines inherit the new ids through our foreign key constraint.
  8. Drop and re-create the foreign key once again as it existed before the operation.
  9. Commit the transaction. Done! No lines removed, all accounted for, and all connected nicely.


Big changes are under way at SolidQ

I am excited to announce to our community and partners that I have moved on from my North American leadership role to a new role as our Corporate Chief Technology Officer. My new job is to align the long-term vision of what SolidQ does and how we do it, across all our international markets. Joining me will be Antonio Soto as Corporate COO, dedicating himself to our global operations and driving delivery support in both established and growing markets. (more…)