Future of the “Data Warehouse”


In today’s rapidly changing environment, organizations have adopted a “Data First” strategy, relying on data to gain a competitive edge in the marketplace. People have warmed up to the idea of self-analysis, as evidenced by the fact that self-service BI is one of the top game changers for 2017 and beyond. This creates the perception that a structured, methodical implementation of a Data Warehouse solution is an unnecessary step towards achieving analytics agility.

The term Data Warehouse here refers to the activities of creating a data model, implementing all associated ETL logic to keep data up to date for analysis needs, and maintaining the system on a daily basis.

In this context, data modelling refers to both semantic modelling and physical modelling.

Proponents of this approach look for a completely ad-hoc analytical solution that involves minimal effort from IT teams and gives full independence to perform analysis on all available data. A popular term for this approach is Democratizing Data, wherein business users take ownership of the actual data and technology teams focus only on technical implementation tasks. This approach implies limited scope for the design and maintenance of structured reporting solutions such as formatted reports or dashboards.

Although this might make it sound as though the data warehouse is in its sunset years, structured implementations of reporting solutions will continue to play an important role in certain use cases. Business users will rely on purpose-built data warehouses for standard, periodically reported KPIs. One key factor for adopting purpose-built data marts is the need for high performance.

Research has shown that users will lose interest if they wait more than 3 seconds for data to load (the Pi factor).

It is almost impossible to bring data refresh latency down to three seconds or below if all the business processing logic has to be performed on every user action. A large section of the business user community are simply information consumers who predominantly slice and dice data for day-to-day operational needs. Structured physical data marts offer the best performance for this scenario.
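The performance argument can be illustrated with a minimal sketch (table and column names are hypothetical): an ETL step materializes an aggregate once, so user-facing queries read a small summary table instead of re-processing raw transactions on every interaction.

```python
import sqlite3

# In-memory database standing in for a warehouse environment.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Raw transactional data, as it might arrive from a source system.
cur.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2017-01-01", "East", 100.0),
        ("2017-01-01", "West", 250.0),
        ("2017-01-02", "East", 175.0),
    ],
)

# Nightly ETL step: materialize the aggregate once into a data mart,
# rather than computing it on every user action.
cur.execute(
    """
    CREATE TABLE daily_sales_mart AS
    SELECT sale_date, region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY sale_date, region
    """
)

# A dashboard query now hits the pre-computed mart directly.
rows = cur.execute(
    "SELECT total_amount FROM daily_sales_mart "
    "WHERE sale_date = '2017-01-01' AND region = 'West'"
).fetchall()
print(rows)  # [(250.0,)]
```

At real-world volumes, the mart query scans orders of magnitude fewer rows than re-aggregating the transaction table, which is what makes sub-three-second latency achievable.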

Semantic models will see increasing usage across various scenarios within the analyst user community. At times, semantic models can also be used for structured data mart implementation; this is simply a factor of the underlying physical data model. Semantic models provide the relationships and definitions of the underlying data models, thereby giving users a clean slate on which to build analysis scenarios.
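The idea of a semantic layer can be sketched as a simple mapping (all names here are hypothetical): business-friendly terms are translated into queries against the physical model, so analysts never need to know table or column names.

```python
# A toy semantic model: business terms mapped onto the physical schema.
SEMANTIC_MODEL = {
    "table": "sales",
    "dimensions": {"Region": "region", "Date": "sale_date"},
    "measures": {"Total Revenue": "SUM(amount)"},
}

def build_query(measure: str, dimension: str) -> str:
    """Translate business terms into SQL against the physical model."""
    col = SEMANTIC_MODEL["dimensions"][dimension]
    expr = SEMANTIC_MODEL["measures"][measure]
    return (
        f"SELECT {col}, {expr} AS value "
        f"FROM {SEMANTIC_MODEL['table']} GROUP BY {col}"
    )

print(build_query("Total Revenue", "Region"))
# SELECT region, SUM(amount) AS value FROM sales GROUP BY region
```

Because the mapping is independent of where the physical tables live, the same layer can sit over a materialized data mart or federate directly over source data.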

Data scientists, on the other hand, will have free rein to perform analysis on any data set, with direct access to raw data as well as to data marts and semantic metadata layers.

Physical Models
Physical tables created to conform with an OLAP schema. Data is materialized in all relevant tables and updated on a daily basis with the latest values from transaction systems.
Current: 80% of reporting requirements. Future: 30% of reporting requirements, focusing on standardized KPIs.

Mixed Semantic Models
Predominantly have an underlying OLAP model as well as traits of the original transactional system. At times, data federation is embedded.
Current: 10%–20%. Future: 20% of reporting, used by analysts and power users.

Direct Semantic Models
A semantic data layer built directly on source data with minimal customization; the semantic layer may hold the few customizations that are in force. At times, data federation is embedded.
Current: 10%, possibly even less. Future: up to 25% of reporting needs, for analysts and data scientists.

Direct Access to Raw Source Data
Current: negligible. Future: up to 25% of reporting needs, with predominant requirements from Analytics and Data Science teams.

BI developers will predominantly focus on the data integration space, with the number of front-end developers decreasing. Roles shed in this area will not be replaced; instead, more tech-savvy analysts with techno-functional skill sets will come into vogue.