There are only two base approaches to creating a multidimensional model, namely Star and Snowflake. All other models are variations of these two or a hybrid of both in some form. The Snowflake schema has seen wider adoption than the Star schema in many Data Warehousing Environments (DWE).
In a Star schema, the dimension tables hold “ALL” attributes related to the subject area, and there is only one level of linkage, i.e. a single join between a dimension and the fact table. Consider the example of a Customer dimension with attributes such as Name, Type (Business or Personal), Group (Regular, Seasonal, Offer Based, High Volume), Contact, Location and other demographic information. This can be visualized as one large table where each and every attribute is a column with values populated accordingly. Here the data is heavily de-normalized irrespective of the cardinality of an attribute. The sample table to the left shows one such example with a wide variety of attributes from an Oracle based Data Warehouse. This is a simple design from a data modelling point of view, and from an ETL perspective there is only one target to be updated and maintained. Performance wise, the Star schema scores better because extra joins are eliminated at run-time.
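The single-join shape described above can be sketched as follows. This is a minimal illustration using SQLite, with hypothetical table and column names (not the actual tables from the Oracle example): one wide, denormalized customer dimension joined directly to the fact table.

```python
# Star schema sketch: one wide dimension, one join to the fact table.
# Table and column names here are illustrative, not from the article.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# One denormalized dimension: every customer attribute is a column.
cur.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        name TEXT, cust_type TEXT, cust_group TEXT,
        contact TEXT, location TEXT
    )""")
cur.execute("CREATE TABLE fact_sales (customer_key INTEGER, amount REAL)")

cur.execute("INSERT INTO dim_customer VALUES "
            "(1, 'Acme', 'Business', 'High Volume', 'a@acme.com', 'Berlin')")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 100.0), (1, 250.0)])

# Exactly one join between fact and dimension, whatever attribute we analyze.
cur.execute("""
    SELECT d.cust_group, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.cust_group
""")
print(cur.fetchall())  # → [('High Volume', 350.0)]
```

Note that any attribute, however low its cardinality, is resolved through this same single join.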
The Snowflake schema, on the other hand, groups the attributes based on some criteria and spreads the dimensional data across multiple tables. In this model, a central dimension table stores the core attributes and the rest are maintained independently. Good examples from the sample above are Industry, Customer Type and Customer Contact. There can be many reasons for adopting this approach, such as ease of modelling, data cardinality, enabling conformity or technical integration challenges. Data is de-normalized only partially, i.e. up to the level of detail of the central dimension table; at run time, if analysis is performed on attributes outside the central table, performance drops due to the additional joins that are triggered. Modelling is not very complex, but ETL is slightly more complicated because of the number of data flows and target tables that need to be updated and maintained.
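For contrast, the same aggregate against a snowflaked layout can be sketched like this, again with hypothetical SQLite tables: the customer-type attribute now lives in its own lookup table, so analyzing it triggers one extra join beyond the central dimension.

```python
# Snowflake schema sketch: an attribute moved out of the central dimension
# costs an extra join at query time. Names are illustrative only.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("""
    CREATE TABLE dim_customer_type (
        type_key INTEGER PRIMARY KEY, type_name TEXT
    )""")
# Central dimension keeps only core attributes plus a foreign key.
cur.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        name TEXT,
        type_key INTEGER REFERENCES dim_customer_type(type_key)
    )""")
cur.execute("CREATE TABLE fact_sales (customer_key INTEGER, amount REAL)")

cur.execute("INSERT INTO dim_customer_type VALUES (10, 'Business')")
cur.execute("INSERT INTO dim_customer VALUES (1, 'Acme', 10)")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 100.0), (1, 250.0)])

# Analysis on an attribute outside the central table needs a second join.
cur.execute("""
    SELECT t.type_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer d      ON d.customer_key = f.customer_key
    JOIN dim_customer_type t ON t.type_key     = d.type_key
    GROUP BY t.type_name
""")
print(cur.fetchall())  # → [('Business', 350.0)]
```

The extra join per snowflaked attribute is exactly the run-time cost the paragraph above describes; the trade-off is that the lookup table stores each type value once.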
In conclusion, the approach depends on the priority: performance or data agility. Star schema based data marts score in performance centric scenarios, whereas the Snowflake schema is apt for a more generic approach.
Advancements in hardware and software technologies are disrupting many facets of the Business Intelligence domain. Articles related to this disruptive phenomenon are listed below.
- Blurring Lines between OLTP & OLAP – Part 1
- Blurring Lines between OLTP & OLAP – Part 2
- Blurring Lines between OLTP & OLAP – Part 3
NOTE: SAP BW implements its own version of the Star schema called the Extended Star Schema (ESS). The base data model is actually a Snowflake based model, which is complicated and normalized (thanks to German engineering). SAP BW has come to its senses after releasing its in-memory platform SAP HANA, where the designer has the option to reduce the overheads of ESS and adopt a Snowflake model (which, in my opinion, is still a normalized design).