The Centipede Effect

Any introduction to the topic of data warehousing will first touch base on popular modelling concepts called “Star Schema” and “Snow Flake Schema“. If you would like to refresh these topics the respective links are here: “Star” and “Snowflake“. The data model to be used in a data warehouse project is one of the first decision made, which primarily depends on requirements, data availability, database, reporting, distribution, security etc. Choosing the right model has a direct impact, both in ETL and Reporting areas in terms of design and implementation.

Centipede3

In today’s context, the snowflake schema is predominantly adopted as the base data model and fine tuned as per the requirements. Efficiency plays an important role behind choosing the right model. This efficiency is typically viewed from reporting angle and common measurement is the speed of query execution on different data volumes. A complex data model adds more overhead during query execution and impacts overall performance. While fine tuning the data model, the centipede effect should be kept in mind because this will have adverse impact on performance.

Centipede4

A centipede effect comes into picture when the de-normalization of data model ends up in creating many dimension tables (e.g. 50 dimension tables) in the data cube. This distorts query performance especially when high volume and high dimensional analysis is performed. There is no definitive number on the the limits of dimensionality and experts have different opinions on the exact ratio of dimensions versus data volume. Getting the right number of dimensions may not be possible during the first iteration, but it is critical to finalize the design at the earliest because any design change affects the entire project due to high degree of dependencies across all areas.

In conclusion, being aware of this issue will help in designing better data models right from inception phase.

NOTE: In SAP BW, the number of dimensions are restricted to 13 user defined and 3 system defined dimensions. This is just an example to showcase of how to avoid the centipede problem.