How to model a Semi-Additive fact? It is a common problem faced during the Data Warehouse design phase. A prime example to discuss this scenario is the Finance data model that stores General Ledger fact data. GL data is additive in nature across all dimensions, but only with reference to time context i.e., dimension. The primary reason for this behavior is the definition of ‘starting point’, which will be the date on which the opening balances were updated, and normal transactions were henceforth entered into the system. Although opening balance is a fact, it has different contextual interpretations with respect to normal transactions. For any reporting requirement, this ‘starting point’ must be considered into account for all subsequent calculations.
In reporting layer, it is a two-step process to correctly derive a semi-additive fact. Firstly, the logic for calculating a beginning value based on time context is achieved by simply aggregating all values until the time reference is provided. Subsequently, report logic references the starting value arrived in the previous step to derive the remaining measures in the report. Calculations such as starting value, measures, etc., can be delegated to the database, but the two-step process is still required albeit in the database layer.
Performance
From a run-time performance point of view, a full table scan is always required for reporting semi-additive measures. Due to this caveat, multiple aggregate tables are built at different levels of lower granularity based on analysis patterns. For example, the grain of fact table will be at each transaction level and aggregates can be built with lower grain at day and month levels.
Data Model
Modeling the fact table can be done by either keeping a single fact table wherein starting balances are embedded as part of the loading process. Alternatively, two fact tables with the same structure where one is loaded with opening balances, and another is updated with transactions. These two tables can be aligned through union in a view or materialized view for analysis. The second option only enables logical manage the data.
SAP BW is tightly coupled with the SAP ERP system. It has a Data Model provision to embed the aggregation type within a Key Figure and identify the time context with granularity in the Cube/ADSO. The data loading process also is split into opening balances versus transaction data. During reporting, the actual calculation is performed by the OLAP engine i.e., not database delegated.
SAP HANA provides pre-defined views that follow the logic described in this article, which is again calculated dynamically by Calc Engine and not the database delegated.