Gradually, as storage gets faster and local SSD storage becomes more popular, etc. disk access times are significantly decreasing. In these regards, perhaps the best example are the SSDs Optane systems, notable for their much lower read/ write latencies than with traditional SSD’s, in addition to being directly connected through the PCIe bus: (more…)
Although SQL Server Integration Services, hereinafter SSIS, is capable of uploading Excel files, in most cases it can be time consuming because any small modifications to the Excel files can make the SSIS crash. For that reason, the best option is usually to transform those Excel files into .csv format, since uploading text files will cause you significantly less issues than the Excel files.
You can quickly save any Excel file as csv manually by saving as .csv from within Excel. However, it becomes an issue when you have to do the same for a lot of Excel files, or in cases were you need the change to be done automatically.
In this post, we will explain how to do this format change automatically using PowerShell and how to loop through files in the same directory in order to upload several Excel files together using SSIS as the main tool for the whole process.
In an on-premises environment when we propose solutions to geographical disasters, the most common option is log shipping. The use of asynchronous database mirroring or availability groups with asynchronous replicas is also common but includes an additional risk that is not usually contemplated. We refer to the “speed” with which the changes are transferred, as quickly as the network and the target system allow us. This means that when the disaster has a human origin, an important error when we become aware of it, we will have this error replicated and applied. Obviously, a better solution would be to combine both options, which are not exclusive, with which we would cover more disaster scenarios increasing the cost of the solution. (more…)
In the last article in this series about Python for SQL Server specialists, you are going to learn how to use SQL Server Python libraries in SQL Server. You can use two scalable libraries, the revoscalepy and microsoftml libraries, which correspond to equivalent R libraries.
SQL Server and ML Integration
With SQL Server 2016 and 2017, you get a highly scalable Machine Learning (ML) engine. Not every function and algorithm is rewritten as a scalable one. Nevertheless, you will probably find the one you need for your analysis of a big dataset. You can store a Python or R data mining or machine learning model in a SQL Server table and use it for predictions on new data. You can even store graphs in a binary column and use it in SQL Server Reporting Services (SSRS) reports. Finally, ML support is not limited to SQL Server only. You can use R code also in Power BI Desktop and Power BI Service, where we at this time (April 2018) still waiting for Python support. However, you can already use both languages, Python and R, in Azure Machine Learning (Azure ML) experiments. (more…)
In this post, we will code a script in python (with Visual Studio 2017) to create a program which we can execute as a windows service in order to extract (in almost real time) the tweets related to certain words or hashtags, store them in a SQL server database, and then consume them with Power BI. (more…)
After learning about Python fundamentals and basics about working with data, it is time to start with more exciting parts of this Python for SQL Server Specialists series.
In this article you will learn about the most important libraries for advanced graphing, namely matplotlib and seaborn, and about the most popular data science library, the scikit-learn library. (more…)
In my previous article, you learned Python fundamentals. I also introduced the basic data structures. You can imagine you need more advanced data structures for analyzing SQL Server data, which comes in tabular format. In Python, there is also the data frame object, like in R. It is defined in the pandas library. You communicate with SQL Server through the pandas data frames. But before getting there, you need first to learn about arrays and other objects from the numpy library.
In this article, you will learn about the objects from the two of the most important Python libraries, namely, as mentioned, numpy and pandas. (more…)
Python is one of the most popular programming languages. It is a general purpose high level language. It was created by Guido van Rossum, publicly released in 1991. SQL Server 2016 started to support R, and SQL Server 2017 adds support for Python. Now you can select your preferred language for the data science and even other tasks. R has even more statistical, data mining and machine learning libraries, because it is more widely used in the data science community; however, Python has broader purpose than just data science, and is more readable and might thus be simpler to learn. This is the first of the four articles that introduce Python to SQL Server developers and business intelligence (BI) specialists. This means that the articles are more focused on Python basics and data science, and less on general programming with Python.
In this post, we will talk about the new property that appears in the SQL 2016 version: Auto Adjust Buffer Size. This attribute is specific to the “DataFlow” component and can take the values of ‘True’ or ‘False’ (default). Furthermore, we will propose an approximate solution for previous versions.