Changing Data: Dimensions

www.redorbit.com

Master Data i.e. Dimension data has huge impact in providing Single Source of Truth, an universally accepted benefit of Enterprise Data Warehouse and Business Intelligence systems. Dimensional data changes are popularly referred to as Slowly Changing Dimensions. This term was coined due to frequency of changes to master data being sporadic in nature.

In a simple data model, where changes are ignored, only the latest snapshot of master data values are reported. In order to enable “Time Travel“, additional logic in all layers i.e. Physical Data Model, ETL and Reporting has to be built as part of the data flow. These changes will entail additional processing time and resources and requirements should be carefully analyzed before enabling this feature. For example, consider Contact Person & Location attributes of Customer dimension. Here, Location attribute can be flagged as an important whereas Contact Person could have only the latest  snapshots.

Slowly changing dimensions are typically classified into three types based on implementation. There are many variations to the three basic definitions and even hybrid definitions derived from basic three types also exist.

  1. No history preservation. Only the latest value is retained, which defeats the concept of preserving history. This is used for several dimensions that don’t require preserving history.
  2. Add completely new record with an active flag which indicates the current and all historical records. A slight variation includes version number too in order to avoid primary key conflict.
  3. Add a new record for new data and include validity period for old and new records.

Slowly_Changing_Dimension_1.jpg

The most common and flexible type implemented in majority of BI projects today is SCD Type-3 with slight variations based on requirements. ETL tools have the primary role in identifying changes to master data and implement the SCD logic. ETL tools in the market today have wizards and will generate entire workflow to implement SCD. Options are available to trigger the changes to be captured only for specific set of attributes instead of all attributes of dimension master. If customer dimension has 40 attributes, ETL workflows can be designed to trigger SCD only on 10 important attributes instead of all 40 attributes.

From a reporting perspective, the user is prompted with a specific date and the user specified value is compared with validity period in order to fetch the right master data. All SQL queries execute a JOIN BETWEEN operation (theta join) against the dates to fetch the right master record. In case the user input is NULL, the most recent snapshot of the data is displayed in the report.

In SAP BW & SAP HANA, Slowly Changing Dimensions of Type-3 are out of the box functionality. In SAP BW, a change to any attribute of an Info Object triggers creation of new record and validity ranges are determined based on data load date & time. SAP HANA uses a special type of table called "History Table" and any Insert & Update DML operation into this table is captured automatically. Similar to SAP BW, any field changed is captured as new record and validity is automatically determined.

Some Good resources about SCD

  1. LearnDataModeling.com
  2. DWBI.org
  3. Microsoft Blogs

2 Trackbacks / Pingbacks

  1. Changing Data: Facts | "Data" to "Wisdom" Journey…..
  2. Time Travel – Slowly Changing Data | "Data" to "Wisdom" Journey…..

Comments are closed.