Granularity and Cardinality

http://tips4ufromsony.blogspot.in/

In a DW/BI implementation project, after the first round of Requirements gathering from users is complete, sketching the data model is started. Having a well constructed data model during early stages of the project is critical because the ETL, DBA and Reporting teams depend on it for their deliverables.

During this phase of design, the data analyst/architect identifies Strong & Weak entities in order to build dimension tables. In case of standardized OLTP systems such as Enterprise Resource Planning, Customer Relationship Management from well known vendors, this task is easier to accomplish due to the nature of common design adopted across multiple vendors. While building a data model, all user requirements must be met without many iterations heading back to the drawing board.

Cardinality and Granularity play an important role in defining and designing dimensions. These affect the outcome of Star or Snowflake schema adopted as well as other design aspects such as Dimension Conformity, Outrigger Dimensions and Degenerative Dimensions.

Cardinality: This refers to the number of unique items within a field in a table. For example, Employee is an entity for which Gender is a low cardinality item, whereas Tax Identification Number is a high cardinality item. Gender When looked from perspective of table as a whole, adding low or high cardinality fields will impact the size, but it is the analytic need that should dictate inclusion of an attribute. One can observe that aggregated queries are much faster while executed on low cardinality attributes.

One key functionality of cardinality is that it helps in defining and relationships i.e. One to One, One to Many etc across entities.

Granularity: Data grain refers to the amount of detail with respect to an entity. consider the Employee example seen previously, if the data model captures only Employee ID, Name, Department, Manager & Salary it is a dimension with low grain. If the dimension contains Employee ID, First Name, Middle Name, Last Name, Suffix, Salutation, Gender, Marital Status, No of Dependents, Designation, Department, Manager, Salary, Tax ID etc, is a high grain dimension.

In conclusion, both concepts are intertwined during implementation. Since both refer to quantum of data, it conveys an image of interrelated concepts. In reality, both are independent concepts and play different roles while designing data models.