I have been working with IT solutions for more than 20 years, and I have seen many different solutions and many different customer databases. I focus on BI solutions, especially Data Mining and OLAP, and I encounter the same problem over and over again: garbage in, garbage out. In a BI project, you typically spend about 70% of the time, or even more, on the ETL process, especially on overviewing, inspecting, cleansing, and merging data. Most of my customers did not even know how much garbage they had in their data.
However, data is a company's key asset. That asset is not in buildings or machines. People are a valuable asset; nevertheless, people change companies, jobs, and careers, or even die. Data is the asset that remains in a company, and the complete knowledge of a company is hidden in it. Therefore, data quality is a crucial issue.
I plan to write a series of articles on data quality. I will describe data quality dimensions and models, and I will show methods you can use to measure data quality, to inspect data, to measure the amount of information hidden in the data, and more, with code for SQL Server 2008. Of course, I hope this is not just an empty promise; I will try to find time to finish what I started with this post.