Fifteen years ago, the principle of development was quite easy. There was only couple of applications you could buy. Most of the time, if you needed something, you had to develop it by yourself. Nowadays everything has changed. Can you count how many server products only Microsoft offers? Just with SQL Server you get Database Engine, Analysis Services and Reporting Services (you could count in Integration Services as well, but as SSIS services does not execute the packages and thus do not really do something, I do not count it among other “working” services).
Nowadays the real problem many times is not what to write, but rather what to select. Of course, many applications still need to be written. However, imagine you need to add Business Intelligence to your system. What do you need? Classical reports, OLAP analyses, Data Mining, or all of the technologies mentioned. When do you stop developing new reports and move to data warehouse / OLAP solution? When do you move from OLAP and MDX queries to Data Mining?
This is a very huge topic. You can also ask yourself what application architecture do you need – client-server, multi-tiered, SOAP? What tool do you use for data consolidation – replication, SSIS, SQL queries, backup / restore, database mirroring with snapshots, log shipping? All together it is simply too much for a simple log, and I am not good enough in all areas mentioned to give competent opinions. Therefore, I will limit myself on couple of advices about reporting, OLAP and data mining.
Usually applications start to cover BI needs with reporting. Every developer knows when to create a report – when a customer needs one, for example. This is simple. However, when do you introduce a data warehouse? Well, check how many reports the customer uses, how many new reports a developer has to create in a specific period and how many new reports very similar to existing ones the customer demands. If the number of existing reports goes close to one hundred and above and there are about ten new requests for quite similar reports per month, then it is definitely time to start working on data warehouse. The idea behind DW / OLAP solutions is to enable end users to make their simple views and reports by themselves. Now next question is – do you need OLAP cubes besides a data warehouse? I would say yes, of course. Building a good data warehouse takes a lot of effort and time. When it is in the company, building couple of OLAP cubes on the top of a DW is like taking a cookie from a baby. Just do it! Now with AS 2005 UDM, which support multiple fact tables per cube, you naturally reach another question: do I create a single hypercube, which will represent a complete DW, or multiple smaller ones? My answer is going for multiple smaller cubes, although in many presentations and articles you can still read a cube should be as huge as the DW is. Smaller cubes are easier to maintain. If you just remember security – perspectives are not securables, therefore in a big cube you have a lot of working with setting the security on dimensions and maybe even on cell level. One cube should cover one are subject, like sales, warehouse, finance, production, which typically needs its own security and processing settings. Of course, it makes sense to have multiple fact tables (measure groups in UDM terminology) in the cube – it is nice to combine actual with planned data in the same cube.
Do OLAP cubes replace all of the reports? The answer is no. OLAP is very good for analyses, which are simple from mathematical point of view, but deal with large amount of data. Because of pre-calculated aggregates and simplified schema, the analyses are extremely fast. Basic four mathematical operations are simply performed graphically. Both parts together enable end users doing the analyses on-line, and I mean really on line, in real time. Complex analyses, performed rarely, like once per month, should still use classical reports. In addition, you cannot solve all problems with basic mathematical operations. Therefore, you need MDX queries as well. However, you should create MDX deliberately. If you create too many of them, you are introducing the same problems as there are with too many reports again. Even worse, this time the problems are on a higher level, because you have to maintain the DW and OLAP cubes and because MDX queries are somehow harder to write.
You should also take care not to try to solve problems that you can solve easily with data mining by using OLAP solutions. Take for example distinct count aggregates. They are always slow, because they cannot be calculated in advance. Nevertheless, what are they used for? Many times, they are used for market basket analysis. So why you do not use data mining Association Rules instead. Also, instead of manually breaking down the population with tens of different breakdown variables in order to explain values of a target variable using an OLAP client tool, why you do not perform all possible breakdowns automatically, with Decision Trees algorithm?
Ok, I am stopping here. As I mentioned, if you really want to consider all possible tools, services and architecture for an application, you are facing quite a big problem. I hope this blog will help you find out the right BI solution.
- Python for SQL Server Specialists Part 4: Python and SQL Server - April 24, 2018
- Python for SQL Server Specialists Part 3: Graphs and Machine Learning - April 11, 2018
- Python for SQL Server Specialists Part 2: Working with Data - March 22, 2018