Time Travel – Slowly Changing Data

youtube.com

Time Travel, a term that every BI professional should be aware of because of its intertwined nature with data management and reporting requirements. It also reflects the nature of events that happen in real world and the constant flux of changes to business processes. A Data Warehouse should have a mechanism to capture changes to an entity in order to accommodate Data Time Travel during reporting. Let us look at this concept through an example.

A Sales Employee of an organization started as a Junior Executive and was in-charge of territory A. Some time later, he/she was transferred to a different territory and given additional responsibilities. Couple of years later, he/she was promoted and moved to Marketing department as a Manager. During the course of all these changes from career perspective, the employee had married and has started a family.

The scenario above becomes a nightmare from reporting perspective if history is not captured where data will always show the latest snapshot. Such scenarios can be extended to almost all master data entities and thus affecting how we model Dimensional data.

An analogy can be drawn for fact data too, wherein even fact data changes frequently in many instances. In reporting scenario, fact data changes are not relevant and hence changes are not typically updated in Fact data. In reality, changes to transactions are very important at the transaction system i.e. OLTP system and not implemented in an OLAP system. All OLTP systems will have feature to Audit, which captures changes to transactions and provides option to trace back changes made since the first record was posted.

In an Enterprise Data Warehouse architecture, Time Travel is implemented in the following manner

Dimension Data: There are tried and tested methods, which are popularly referred to as Slowly Changing Dimensions (SCD) concept, where all changes to each master record are captured. There are different types of SCD methods and based on requirement one can adopt the right one to choose. In majority of implementations, SCD Type 2, 3 or 4 is adopted and in some instances a hybrid approach with combination of any three type are implemented.

Slowly Changing Dimensions

Fact Data: Changes to Fact data can be captured if it is an absolute necessity. The logic is typically implemented after Staging layer, within Operational Data Store layer. Approach adopted for fact data is similar to SCD, but it is handled in a different manner. Soft delete is a concept that can be used when data volume is low wherein all changes are stored in ODS layer with only one record marked as valid by flag. Another approach uses the delta calculation mechanism i.e. CDC mechanism to generate the changes.

Changing Facts