Imagen para determinar la altura de la cabecera
Links
SLODUG
Our Blogs > English > Dejan Sarka
Inside Microsoft SQL Server 2008: T-SQL Querying RTM

The book Inside Microsoft SQL Server 2008: T-SQL Querying RTM'd a few days ago. :-) The main author is Itzik Ben-Gan, and coauthors involved in the book besides myself are Steve Kass and Lubor Kollar. Steve Kass was also the main Technical Editor of the
book, and Umachandar Jayachandran helped by editing a couple of chapters. César Galindo-Legaria wrote the foreword.

The book is being printed these days and should be on the shelves in a matter of a few weeks. Info on the book can be found in the following website: www.InsideTSQL.com.

Different SQL Dialects - Updated

I wrote about the previous version of Kevin Kline's SQL in a Nutshell. The next, the 3rd edition, is out. I still think it is a very useful book for people that have to maintain SQL code for different platforms. It covers SQL dialects for the following systems: ANSI SQL, Oracle 11g, SQL2008, PostgreSQL 8, and MySQL 5.1.

SQL Server 2005 SP3 & BOL Update

On MS download, you can get Service Pack 3 for SQL Server 2005. December 2008 refresh of SQL Server 2005 Books OnLine is available as well.

SQL Server 2008 Upgrade Technical Reference Guide

With a little delay, I am proud to announce SQL Server 2008 Upgrade Technical Reference Guide. This is not just a whitepaper; with 490 pages, it is a complete book. Yet, it is free for download. And the reason I am so proud? Well, I am one of the authors of this document, together with froends from Solid Quality Mentors. I wrote the Data Mining part.

Bleeding Edge Conference

I am very proud I can finally officially announce the Bleeding Edge conference. This is a conference I am preparing together with Matevž Gačnik and Dušan Zupančič. It is a conference prepared by developers for developers. This one-day highly technical event is going to take place on October 1st, 2008, in Portorož, Slovenia. Save the date, or even hurry up with registration!

Data Modeling and Creativity

No doubt data modeling is a process where you can add a lot of creativity. No doubt there is not a single solution for a business problem with relational model. However, does this mean that data modeling is more art than science? How much creativity is right?

I would say that database design is definitely not an art. It is based on science. Of course, you need some creativity. But is creativity limited to artists only? In any profession you need some creativity. However, would you buy a system that your company is going to depend on from an »artist«? A good model has predictable behavior. If the behavior is not predictable, it is not because db designing would be art; it is because the designer does not know her/his profession. I think people abuse word "art" just because they do not know about design enough.

I met so many times very »creative« models. There is a whole branch of modelers I call "inventors"; however, do please note that I mean this in a sarcastic way. It is very simple: if you do not know a lot, you are forced to invent. Interestingly, they mention "theory" a lot as well. Basically, they are "improving" the "theory". Typical examples would be people who invent "XML databases", "OO databases" and similar, and by the way they criticize relational model without really understanding it.

However, do we have to follow the established ways, without using brains? There is another kind of modelers I call "theorists". There are modelers who never use their mind and always follow the authorities in the area, explaining they are following the "theory". Note that they typically mix the real theory, which has foundations in science, mathematics, with words of well-known authorities. For example, types 1, 2, and 3 for handling the SCD problem have nothing to do with any theory; they are just proposals by Ralph Kimball, and since Kimball is an authority, this kind of modelers does not dare to use own brains to try to find a better solution.

So what is the right mix of established ways vs. creativity? First of all, you should always use your brains and common sense. A real intellectual doubts in everything. Do please doubt in words of authorities, scientists, politicians… Try to find better solutions. However, use extreme caution before you claim you invented something. It was probably already invented for thousand times; it is probably just you who does not know anything about it. Do not forget: if you don't know, you are doomed on inventing. Or on calling you an "artist".

Data Vault Modeling Critics

Our mentor Rushabh B. Mehta has pointed me to an interesting Web site http://www.danlinstedt.com/, where you can read about a modeling technique the author named Data Vault Modeling (DVM).

I have not heard of this technique before. I read all of the five articled on the Data Vault technique, starting with http://www.tdan.com/view-articles/5054/ (you can find further links in the first article). I read slowly (not really thoroughly) articles 1, 2, 3, and 4, and briefly 5 (I am not interested in this comment in the ETL process, which is described in article 5). Ok, let me start with my comments.

I did not like the style of the author. There are many quite strong assertions without a proof. For example, in the basic article about the technique, the author claims that one of the benefits of the DVM is "in-database Data Mining". I could not find any proof for this statement in further reading. I deal with Data Mining quite a lot. I could not find how DVM could help me with Data Mining; on the contrary, I found the DVM model very unsuitable for Data Mining. In article "Data Vault Series 1 - Data Vault Overview" the author claims "The Data Vault is based on mathematical principles". I could not find those principles in further reading. Probably they are somewhere; at least I would expect a link.

The basics of the model can be described with three types of tables: Hubs, Links and Satellites. Hubs are the entities, with PKs, business keys and lineage info only. Links are resolution of n-ary relationships between Hubs. Satellites add additional attributes to Hubs and Links. What does this mean in terms of standard Dim and Fact tables? Dim tables are Hubs with one or more Satellites, Fact tables are Links with one or more Satellites. Without much ado, I would say that the only big difference to the dimensional model here is the separation of attributes from entities into separate tables. I do not think this is something new; I have seen many models like this in the past, and the authors did not claim this is something new that would need a patent. Yes, the author wanted to get a patent, and fortunately the patent has been denied as of January 2005. Anyway, to find all info you have in a Fact table, for example, you have to join the Link table with its Satellites. Simplification?

The DVM is focused on one problem mainly. The problem DVM is trying to solve is changes in the dimensional model, namely adding a new dimension and thus changing the grain of some fact tables. The solution is not very impressive. What happens if you add a Hub, i.e. a dimension? The author simply outdates an existing Link table and adds a new one, with additional FK, and then claims it is much easier solution that modifying an existing Fact table in a DW. Well... what happens to the history? Can you still do the analysis over time, comparing the data from two Link tables? I did not find an answer to that question. And, of course, nowadays it is not so painful modifying the grain of the Fact table anymore. You can simply add additional column which allows NULL values (meta-data operation only, so it works in TB databases), and then solve the problem in SSAS with handling the Unknown member. And you have a simple model that gives you all of the history. Finally, you can also model your Star schema the same way, by adding a new Fact table when the level of granularity changes. Therefore, I am not really impressed with this one and only solution the author really provides.

The author solves all of the problems with dimensions with Hub and their Satellite tables, again on a very basic way. For SCDs, simply split the attributes in associated Satellite tables based on their frequency of changes. Well... what is so new and revolutionary with this idea? AFAIK similar solution suggested Ralph Kimball for, how he called at that time, Monster dimensions in late eighties. For merging from multiple sources, the only suggested solution is to insert in Hubs all of the source members. And that's it. So how do you get a complete story for a single customer? The author admits it would be useful to have a single row for a single customer, however, he also says this would be "out of the scope" of those articles. Oh well, it is so easy to deal with real problems in this way. Another nice simplification, directly copied from one of the texts, one of the basic rules of the "Data Vault" modeling: "Hub business keys never change." So what if the business key changes? Well... I could not find any answer, it is simply not allowed by the author. Sancta simplicitas!

The author claims the "Data Vault" model is better than 3rd NF in many ways. I could not find a proof for this claim in his text. My conclusion is the author thinks it is better because of the Link tables, which are better than intermediate tables in resolving many-to-many relationships. Well... the author simply does not understand that the Relational model is not limited to binary relationships only. You can safely model relationships of any arity, without breaking not only 3rd, but also 5th NF, as long as you understand that you need to have uniqueness on at least n-1 foreign keys in the n-ary relationship. BTW, the author claims his model uses some mathematics, but I could not find any mathematics in his writings; however, Dr. Terry Halpin really mathematically proved that n-ary relationships are fully normalized if the uniqueness is on at least on n-1 keys.

The author talks all of the time bout surrogate, natural, business, smart keys. The author does not realize we have only keys, without any attribute. And, of course, so-called "smart" keys are usually the most stupid keys you can select, giving you enormous troubles with changes. Hmm... Wasn't the main purpose of the "Data Vault" model to simplify dealing with changes?

The author spends a lot of time on the join problem. Of course, joins are slow in existing RDBMS products. The author admits that this problem is even worse in his model, as you need additional joins even for Dim and Fact tables. And what are the suggested solutions? I am mentioning just one of the proposals. The author proposes adding a Star schema above the Data Vault schema. Sic! So finally you have OLTP schema, "Data Vault" schema, and the Dimensional schema. So why do you need the intermediate "Data vault" schema? Where is the simplification?

Ok, I am going to stop here. I hope I did not overlook something important in DVM. From what I read, I am far less than impressed.

Data Warehousing in SQL Server 2008 Presentation

 

I am also uploading the What's New in SQL Server 2008 Data Warehousing presentation from the 13th NT Conference. Again, the demos are zipped and the zip file is renamed in order to be able to upload it.

Mastering Data Quality with SQL Server 2008 Presentation

I promised to upload the Data Quality presentation and demos on both conferences, 13th NTK in Slovenia and TechEd Developers in Orlando, FL. You can find the presentation here and demos here. Note that demos are zipped and the zip file extension is changed to .txt in order to e able to upload it to the SPPS site.

13th NT Conference

As usual, I am posting a short comment on our most important IT event of a year.

With more than 2,000 attendees, this was again an incredible event for such a small country. I have to congratulate local MS subsidiary again; IMO they are the best event organizers in the world.

The conference itself has changed substantially. It is not strictly technical anymore. About 30% of presentations were aimed on soft skills. It seems that this fact did not please technical attendees; however, the soft skills presentations were evaluated very well, and therefore obviously exists the other audience. Apparently MS Slovenia noticed the trends and reacted accordingly. Nevertheless, this fact also opens space for another, smaller and more technical conference in Slovenia.

I am not talking about technical conference without a reason. Together with MS Slovenia, Dušan Zupančič and Matevž Gačnik we are preparing another high-end developer's event called Bleeding Edge on October 1st. Developers, database developers and advanced DBAs, stay tuned and save the date!

1 - 10 Next

 ‭(Hidden)‬ Admin Links


¿Problemas técnicos? Contáctenos en webmaster@solidq.com o al 800 300 800 (desde España) o al +34 91 414 8950 (desde fuera de España)