In all ETL jobs one would have noticed a step “Delete Target Table Index(s)” just prior to writing data into target data mart or table after all the processing steps are completed.
As a matter of fact, it is one of the “ETL Best Practices” to be adopted while designing ETL data flows. This precursor step has a great performance incentive when adopted in data flows.
In a database, any INDEX created has to be maintained in sorted fashion in order for the INDEX to be effective. Whenever a record is inserted, subsequent to INSERT operation into target tables, the INDEX is re-organized with the new value which was inserted. In case when hundred records are inserted, the INDEX is re-organized or built a hundred times (Note: Depending on the database, index type and DBA actions, this process can be fine tuned). If the target table has multiple INDEXes, the problem is compounded because all indexes have to be adjusted.
In Data Warehousing environments, the quantum of records to be inserted or updated will be in several thousands on daily basis and at this magnitude every performance bottlenecks have to be removed. Dropping the INDEX(s) and loading data helps to a great extent in avoiding costly database operations.
The process while loading data adopted is in the following sequence
- DROP INDEX <index name(s)>
- INSERT INTO TABLE <records>
- CREATE INDEX <column(s)>