A common problem in BI initiatives that typically mushrooms after systems have been running for quite some time is ‘lack of query speed’. Performance optimization is an ongoing effort in all live systems and there are several factors that affect query run time.

Impact of Dimension Table Size

Dimension tables holding master data slowly grow in size (i.e., in terms of the number of records) and large dimension tables slowly degrade the execution of queries. During the implementation phase, data models are designed to accommodate future data growth, but in several cases, data growth does not follow the predicted pattern.

Degenerative Dimensions

Any dimension table that grows in volume relative to fact table size is referred to as a degenerative dimension. The typical ratio used to make a decision whether a dimension table is degenerative or not is 1:5 (this is just indicative, but not the final verdict). Since degenerative dimension tables have a large volume of data, the database has to join two large tables, which results in a reduction in performance.

Procurement-fact

An example of such a scenario is a cube built to analyze procurement data. All dimensions in this model are low in granularity whereas the Purchase Order dimension will have a large volume of data due to the varying nature of transactions that are generated on a day-to-day basis. During the go-live phase, data volumes may not be high enough to notice the problem but will definitely creep in as the data gets updated regularly from transnational systems.

Procurement-fact-PO

 

The most common solution to this kind of problem is to alter the data model slightly, which will eliminate the need for the database to perform the join operation itself. Dimension i.e., degenerative that causes performance bottleneck is merged with the fact table. While performing this merge, care is taken not to dump all attributes into the fact table and only dimensional attributes that have high granularity are taken into account. Other attributes that have low granularity are retained within a separate dimension. In the example above, the Purchase Order number is unique for every transaction and hence merged with the fact table. Other attributes such as PO Type, Delivery Method, etc. are retained within the old dimension table.

The standard modeling practice does not have non-measures in the fact table, but special scenarios like this are exceptions rather than the norm.