From a conversation with other MVPs, I came to a conclusion that most of people are confused with term “lookup table”. Many different opinions arose, like whether to have a single or multiple lookup tables, and whether this is a dimension, if RDBMS uses scan instead of seek to retrieve data from this table, and similar. Therefore, it seems it is time to explain what exactly a lookup table is.
First of all, this is a logical term; whether there are seeks or scans does not influence on logical terms like “lookup table” and “dimension”. Modeling should always start from logical point.
The confusion stems from the general modeling question: what is an entity, and what is an attribute. An attribute in one model can become an entity in another model, and vice-versa. It can even evolve from an attribute to an entity. How comes? Well, Cantor’s two axioms of the Set theory say that we know what a set is and we know what an element is; said differently, it is up to the modeler to define sets. In the Relational model, we have entity sets (from relational algebra perspective; we have sets of propositions from relational calculus perspective; everything should work from both perspectives, but currently I feel it is easier for me to explain lookup tables from algebra perspective). Now which entity sets do we select for the model? Of course, we select those of business interest. This is called abstraction, defined already by Peter Chen. The same is true for the attributes: take into model only those that have some reason to exist in the system.
Because systems evolve over time, you might get an interest to collect some “attributes for an attribute”. This is the sign that the attribute has to be promoted to an entity. For example, a country could be an attribute of a customer in a CRM app. However, when you want to start tracking sales over countries, and get interest for additional attributes of countries, it might be promoted to an entity. Note that Countries might become an entity set of business interest even without additional attributes (just id and name); maybe all you need is just a sales report over countries.
Ok, so what exactly is then a “lookup table”? It is simply a constraint. It is a domain constraint on an attribute. It is simpler to use “lookup tables” instead of check constraints if the set of permissible values is volatile, or is too big. With a lookup table, we can maintain a set of up to countably infinite power.
A table is our representation of a relation. From table shape, you cannot say for sure whether something is a “lookup table” or an entity. Generally entities really have more than just id an name attributes; however, the thing that really distinguishes “lookup tables” from entities is their reason of being in the system and their usage. Note also that I always use apostrophes when referring to “lookup tables”. “Lookup table” is not really correct logical term; correct logical term is domain constraint. A constraint is a constraint, no matter how it is physically implemented.
Now, do we want to have a single “lookup table” of multiple ones? More practical is to have multiple ones, because you never know, when your attribute is going to be promoted to an entity.
- Python for SQL Server Specialists Part 4: Python and SQL Server - April 24, 2018
- Python for SQL Server Specialists Part 3: Graphs and Machine Learning - April 11, 2018
- Python for SQL Server Specialists Part 2: Working with Data - March 22, 2018