Level Based Hierarchy – Link Table

Link tables organize data in a better fashion, which in turn provides additional flexibility while reporting. Hierarchy data is separated from the main dimension table into its own table that holds only relationships between attributes. Reporting tools will utilize the link table whenever the data has to be presented in hierarchical fashion. Example below shows two different hierarchies that are built as per different user group needs, which is based off the same dimension data.

Hierarchy_Level_Multiple_1

This method provides an option to store multiple hierarchies. One can define a unique name as the key for each hierarchy variant, which will be available for the user to select during report execution time. Link table should have the unique name of hierarchy variant along with the Hierarchy_Level_Multiple_2Dimension table key (Product Code in this example) as a composite key.

Dimension data is not forced to conform with any relevant hierarchies (e.g. Product 4 is not assigned in Hierarchy 1). The link table is independent in terms of data loading logic i.e. ETL as well as data consistency and quality.  In scenarios where we encounter a hierarchy variant that does not have all dimension entries assigned, a outer join, either Left of Right based on how the model is defined between Dimension and Link tables will bring values that are not assigned to any hierarchy variant to reporting layer.

This approach allows multiple hierarchy variants of different depths. Link table can be defined with an arbitrary number of columns to accommodate large number of levels i.e. depth. Only constraint faced during reporting is a method to exclude columns that do not have data. Majority of reporting tools in the market today automatically detect null values are ignore them during run time.

Personal Note: It is very rare to encounter hierarchies of depth exceeding two digits. Majority of hierarchies in any industry or functional area are typically in single digits, especially in BI space. As always there are exceptions to this observation.

Level based hierarchies give a quick turn around time during implementation. Defining and maintaining hierarchies, typically is under purview of business teams. Link tables are easy to visualize by the user and data can be provided in a simple spreadsheet for loading. Any re-organization activity can be immediately captured and populated into the hierarchy tables.

Time travel can be implemented in multiple ways, either for a full hierarchy, or just the nodes. Basically, the validity periods are captured along with respective Hierarchy_Level_Multiple_3data points based on the approach adopted. During run time, the correct hierarchy is fetched based on user input (i.e. date) and data is rolled up as per the correct hierarchy.

ETL data flows are not complicated and overheads of maintaining a single large dimension table is reduced. Typically two independent data flows manage and update respective tables. If time travel is adopted, then both workflows have to manage Slowly Changing Dimension concept for respective targets.  One critical checkpoint to be aware of is the conflict of time ranges between Dimension data and Hierarchy data. Typically, the SCD values defined for Master data takes precedence and Hierarchy Link tables are updated accordingly.

Although this approach addresses several real time requirements, it is not a robust solution. Parent child hierarchies are slightly complicated to create and maintain, but addresses majority of issues encountered in Level Based hierarchy concept.