Successful adoption of BI projects heavily depends on Performance of the end product delivered to the final consumer. From user perspective, time taken to fetch or refresh data should be lightning fast. Generally accepted parameter for this aspect is “5 seconds or less“ for each turnaround, wherein the term turnaround refers to either a report loaded afresh or an action from user’s end that might refresh the data. Although this might seem to be a monumental task, developers have many tools under their arsenal to improve performance.
One of the fundamental concepts related to performance that every working associate in BI domain should be aware is the difference between SCAN and SEEK. The simple logic behind them is very simple
|SCAN||BAD||All rows in the table are accessed|
|SEEK||GOOD||Only selected rows are accessed|
Every data fetch request (i.e. SQL SELECT statement) is first analyzed by the database management system’s optimizer. Analysis results in a flow chart, referred to as explain plan in database parlance that has all the actions to be taken such as
- List of Tables to be accessed
- Records to be fetched
- How records are to be fetched (i.e. access method)
- Filtering criteria
- Order of JOIN actions to be performed
- Calculations and matching
- Intermediate materialization’s
SCAN and SEEK play critical role in the second step wherein the actual data is accessed from individual tables. Scan operation iterates over every record in the table and the time taken is directly proportional to volume of records in the table. Database optimizer picks the access method based on various conditions and some of them are
- Number of records
- Invalid or unusable filter criteria (selectivity)
- Non availability of index
- Stale database statistics
Seek operation on other hand fetches only the required records for further processing and thereby increasing the speed of query execution.
All relational database management systems split the data into small and manageable chunks called blocks, which are typically around 4kb in size. A record might span multiple blocks too and in that case SCAN operation has to access all blocks whereas a SEEK method will skim over unwanted blocks. This will result in smaller quantum of data transferred for further processing to the application, thereby reducing data transfer cost and time.