This post gives a quick introduction to MDM in today’s environment.
MDM, or Master Data Management as it is referred to in industry parlance, is a set of tools, processes, and methodologies adopted to provide a single, unified version of master data across the organization.
Master Data represents the core data set that an organization captures and uses for everyday transactions. Any object or entity identified as critical to a company’s operations needs to be handled carefully, because organizations today depend on data for insights into performance and future direction. Without adequate controls around the data, problems creep in that make reporting an arduous task.
Data Governance & Data Quality
Data fragmentation was the first problem to become apparent: no single system showed the complete, 360-degree picture of a business event or transaction. As the number of transactional systems in a landscape increased, so did the number of versions of the same data. Each application maintained its own standards and nomenclature for master data, and users entering data into the systems were unaware of the impact beyond their scope of vision.
Even in scenarios where a single system was implemented organization-wide, different divisions and business units maintained different versions of the same data. Fragmentation creeps into the data slowly, and before anyone realizes it, the quality of the data becomes questionable. In complex environments, it becomes harder to trace back and correct the wrong data, and it falls to the BI system to accommodate such variants in its design.
It is apparent now that data quality was the primary reason for MDM adoption. Since the BI platform aims to provide a single source of truth for all data, the burden of conforming the data to standards falls on the ETL tools. Cleansing happens during every data load and can consume anywhere between 30% and 40% of the total time spent on loads. As data volumes and the number of applications grow, the task of cleaning and conforming data becomes increasingly cumbersome, and in several instances manual intervention might be required to troubleshoot data load issues. Because the source system still holds the bad data, the entire cleansing process has to be repeated on every load, which slowly builds overhead and leads to performance problems.
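To make the ETL burden concrete, here is a minimal sketch of the kind of conforming step a load job has to rerun on every batch. The lookup table, suffix rules, and field names are illustrative assumptions, not a real MDM rule set.

```python
# Illustrative cleansing/conforming step that an ETL job repeats on every
# load. The canonical mappings and suffix rules below are assumptions.

CANONICAL_CUSTOMERS = {
    "wal-mart": "Walmart Inc.",
    "walmart": "Walmart Inc.",
    "wal mart stores": "Walmart Inc.",
}

def conform_customer(raw_name: str) -> str:
    """Map a raw source-system customer name to its canonical master value."""
    key = raw_name.strip().lower().rstrip(".")
    # Strip common corporate suffixes before the lookup (illustrative rule).
    for suffix in (" inc", " corp", " ltd"):
        if key.endswith(suffix):
            key = key[: -len(suffix)]
    # Unknown names pass through unchanged for manual review downstream.
    return CANONICAL_CUSTOMERS.get(key.strip(), raw_name.strip())

def cleanse_batch(rows):
    """Apply conforming rules to every incoming row -- rerun on each load."""
    return [{**row, "customer": conform_customer(row["customer"])} for row in rows]
```

Even this toy version shows why the cost scales badly: the same rules execute against the full batch on every load, because the corrections never make it back to the source system.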
Data propagation is the process by which data is shared across multiple applications in a complex and diverse landscape. For example, consider a CRM system used by the sales team to capture customer orders from the field. Assume these orders are transferred from the CRM to an ERP system at the end of every day so that invoices can be created and accounts maintained. If the ERP system has “Walmart Inc.” and the CRM system supplies the name as “Wal-mart”, the ERP system assumes it is a new customer and processes the order accordingly. These issues later become evident in the BI system, when data from the CRM and ERP are consolidated.
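The duplicate-customer scenario above can be caught at the handoff with a similarity check. The sketch below uses Python’s standard-library `difflib` to flag an incoming CRM name that likely matches an existing ERP master record; the master list and threshold are assumptions for illustration.

```python
import difflib

# Hypothetical ERP master records; in practice these would come from the
# ERP's customer master table.
ERP_CUSTOMERS = ["Walmart Inc.", "Target Corp.", "Costco Wholesale"]

def normalize(name: str) -> str:
    """Lowercase and drop punctuation/whitespace before comparing."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def find_likely_match(crm_name: str, threshold: float = 0.8):
    """Return the closest ERP master record, or None if nothing is close enough."""
    best, best_score = None, 0.0
    for master in ERP_CUSTOMERS:
        score = difflib.SequenceMatcher(
            None, normalize(crm_name), normalize(master)
        ).ratio()
        if score > best_score:
            best, best_score = master, score
    return best if best_score >= threshold else None
```

With this check in the interface, “Wal-mart” would be routed to a match-review step instead of silently becoming a second customer record. An MDM hub generalizes exactly this idea across every interface in the landscape, instead of re-implementing it point-to-point.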
Synchronization & Time
Once bad data has been identified, the onus is on the source system to rectify the issue. Synchronizing the changes is a lengthy process, because users have to go back and fix historical information, which might be just a couple of records or a few hundred. Complexity increases when the data has already been propagated to other applications: more coordination is required, and until the fix lands everywhere, bad data remains part of the BI landscape.
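The historical-fix part of that process can be pictured as a simple backfill: once the source system corrects a master value, every historical record carrying the old value has to be restated. The row shape and field names here are illustrative assumptions.

```python
# Hypothetical backfill after a master-data correction: restate historical
# rows that still carry the old, incorrect customer value.

def backfill(history, corrections):
    """Rewrite historical rows whose customer value was later corrected.

    history:     list of dicts, each with a "customer" field (assumed shape)
    corrections: mapping of {old_value: corrected_value}
    Returns the number of rows restated.
    """
    fixed = 0
    for row in history:
        if row["customer"] in corrections:
            row["customer"] = corrections[row["customer"]]
            fixed += 1
    return fixed
```

In a real landscape the same correction must also be replayed into every downstream application that received the bad value, which is where the coordination cost comes from.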
NOTE: These are some of the most frequently encountered issues plaguing large and diverse application landscapes with decentralized management. Since this article looks at MDM from a BI perspective, some factors are not dealt with in detail.