In Extract, Transform & Load (ETL) processes, there are two strategies adopted for loading data into the target data warehouse. Whenever an ETL job/workflow is executed, based on the methodology adopted, the following happens.
- Truncate & Load: The target table is deleted completely first, and all data records from source system is inserted afresh.
- Delta Load: Here, the source data records are first checked for changes i.e. New records inserted and Old records updated. Only records that have valid changes are updated in the target.
Truncate & Load is adopted predominantly in tables with dimensional data and to a minor extent summarized fact mini data marts. Features of this approach is as follows
- Simple ETL workflow: A truncate and load workflow is very simple in terms of technical complexity. Maintenance is easier due to this design and the entire workflow consists of just three steps
- Read from source into staging table
- Delete Target
- Insert from Staging into Target
- Record Size: Applicable only when a small number of records are in scope. Since the data is physically deleted and inserted again, the reports accessing this data may face issues until the actual data is committed. Large data sets will pose an issue in data availability front.
- Target Structure: Good approach to be adopted when target table structure i.e. columns change often. ETL workflows
- Data Volume Growth: If source data volume grows over a period of time, performance problems might arise. This is related to second point mentioned above.
- Data Availability: One of the key items to be considered while adopting this approach. Since ETL jobs are typically executed during off peak hours, this will not pose a major risk. At times, a temporary table approach is taken wherein the data records are first updated in the temporary target and the temporary table is then swapped with live target table.
- Scenario Based: Clear evaluation of requirements to be performed for applicability of this method for each scenario at hand. One must also consider future state of the data warehouse during scenario analysis.
If all ETL workflows are taken stock in a DW/BI environment, the number of Truncate & Load type of ETL jobs are minority when compared to Delta loads. Although the number of workflows are small in number, these workflows come with their own merits and can be seen in almost ALL DW/BI implementations.
NOTE: Another shortcut adopted in several instances is to DROP and CREATE the tables directly from a SELECT statement. This is much faster than than TRUNCATE command. Picking between DROP & CREATE versus TRUNCATE & INSERT is adopted based on scenario at hand though.
NOTE 2: SAP BW handles the Truncate and Load concept in couple of different ways through Process Chains.