Changing Data: Facts

wikipedia.org

Quick look at Slowly Changing Dimensions

Time travel for fact data is not an often implemented feature in Enterprise Data Warehouses. A transaction i.e. fact is typically overwritten with only the latest snapshot no matter how many times it was changed in the source OLTP system. In the real world too, the final value of a transaction is of interest from reporting and analytics perspective.

Consider a scenario where a customer changes a sales order a dozen times by changing the products and quantity ordered. Looking at the purpose of EDW/BI systems, which is to provide “single source of truth“, only the final snapshot value has to be stored in the data warehouse or mart. All other changes are just overheads to be processed and stored, which does not provide any value to the final user. Fact data is updated (i.e. SQL UPDATE) in order to overwrite the values in ETL workflows.

Special use cases may require even fact data to be captured similar to slowly changing dimensions concept. Consider the scenario discussed previously, where a customer is changing products and quantity multiple times in an e-commerce website such as amazon.com. The marketing department is running a promotion and would like to analyze customer behavior with respect to promotions. In such cases, every change to any field has to be captured in the data warehouse.

There is no industry standard on implementing such “Changing Fact” scenarios and the the most common implementation data model uses a revision or version number to track changes. In majority of cases, data is captured in an independent data mart using a different ETL job without any impact to main Enterprise Data Warehouse. In this scenario, data is inserted (i.e. SQL INSERT) in all cases.

At times, the transactional system, e-commerce portal in this case can handle storing changes and in some cases source systems do not capture changes. ETL jobs must be configured to derive the history and update accordingly.

SAP BW has an option out of the box to handle changing fact in Data Store Objects (DSO). Each revision to a record invalidates the previous version and marks it as old and inserts a new record based on the keys defined