Degenerative Dimensions

A common problem in BI initiatives that typically mushrooms after systems have been running for quiet some time is “lack of query speed“. Performance optimization is an ongoing effort in all live systems and there are several factors that affect query run time. Dimension tables holding master data slowly grow in size (i.e. in terms of number of records) and large dimension tables slowly degrade the execution of queries. During implementation phase, data models are designed to accommodate future data growth, but in several cases data growth does not follow the predicted pattern.

Any dimension table that grows in volume relative to fact table size is referred to as degenerative dimension. Typical ratio used to make a decision whether a dimension table is degenerative or not is 1:5 (this is just indicative, but not the final verdict). Since degenerative dimension tables have large volume of data, the database has to join two large tables, which results in reduction of performance.

Example of such a scenario is a cube built to analyze procurement data. All dimensions in thisDegenerative_Dim_1 model are low in granularity whereas the Purchase Order dimension will have large volume of data due to the vary nature of transactions that are generated on day to day basis. During go-live phase, data volumes may not be high enough to notice the problem, but will definitely creep in as the data gets updated regularly from transnational systems.

Most common solution to this kind of problem is to alter the data model slightly, which will eliminate the need for database to perform join operation itself. Degenerative_Dim_2Dimension i.e. degenerative that causes performance bottleneck is merged with the fact table. While performing this merge, care is taken not to dump all attributes into fact table and only dimensional attributes that have high granularity are are taken into account. Other attributes that have low granularity are retained within a separate dimension. In the example above, Purchase Order number is unique for every transaction and hence merged with the fact table. Other attributes such as PO Type, Delivery Method etc are retained within the old dimension table.

The standard modeling practice does not have non-measures in the fact table, but special scenarios like this are exceptions rather than norm.