Data Virtualization: Here it comes

What is it?
Data virtualization is a group of technologies that lets us work with data from different sources without physically copying it from one system to another. As a simple example, consider a scenario where data is available both as an Excel file and in a database system, as represented below. The user is unaware of the two disparate systems and can work with the data seamlessly.

[Figure: DataVirtualization_Example]

How does it work?
The mechanism is simple and can be visualized as a view that spans multiple systems.

In a typical database, a view is used to hide the complexity of the underlying data model. Where complex SQL queries are involved, as in the example represented here, the user simply requests data from the view and the database system takes care of the actual execution.
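
To make the idea of a view concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and data (customers, orders, customer_totals) are assumptions made up for illustration; they do not come from any particular system.

```python
import sqlite3

# In-memory database with two hypothetical tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0);

    -- The view hides the join and the aggregation from the user.
    CREATE VIEW customer_totals AS
        SELECT c.name AS name, SUM(o.amount) AS total
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name;
""")

# The user only asks the view for data; the database runs the complex query.
rows = conn.execute(
    "SELECT name, total FROM customer_totals ORDER BY name"
).fetchall()
print(rows)
```

The person querying customer_totals never sees the join or the GROUP BY; that is the complexity the view takes care of.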

What if the same scenario on a single database could be replicated across different database systems that run on different hardware platforms, are hosted on diverse operating systems, and follow different encoding methods? Data virtualization is the answer: the user (i.e. you) is not exposed to these complexities and can request the data in a simple fashion, as if all of it were available in a single consolidated frame.

[Figure: View_DV]
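
The same idea can be sketched across two sources. The toy example below is only an illustration under stated assumptions: a CSV text stands in for the Excel file, an in-memory SQLite database stands in for the database system, and sales_by_product is a hypothetical function playing the role of the virtual view. Real data virtualization products do far more (query pushdown, optimization, security), so treat this as a conceptual sketch rather than how any product is implemented.

```python
import csv
import io
import sqlite3

# Source 1: a spreadsheet-like file (CSV text stands in for an Excel sheet).
SALES_FILE = io.StringIO("product_id,units\n1,10\n2,4\n1,6\n")

# Source 2: a relational database holding product master data.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    INSERT INTO products VALUES (1, 'Widget', 2.5), (2, 'Gadget', 10.0);
""")


def sales_by_product():
    """Hypothetical 'virtual view': reads both sources at request time
    and joins them in memory, without storing a combined copy."""
    # Total the units per product from the file source.
    units = {}
    SALES_FILE.seek(0)
    for row in csv.DictReader(SALES_FILE):
        pid = int(row["product_id"])
        units[pid] = units.get(pid, 0) + int(row["units"])

    # Join against the database source on the product id.
    result = []
    query = "SELECT id, name, price FROM products ORDER BY id"
    for pid, name, price in db.execute(query):
        sold = units.get(pid, 0)
        result.append({"product": name, "units": sold, "revenue": sold * price})
    return result


# The caller sees one consolidated result and never touches either source.
report = sales_by_product()
print(report)
```

The caller works only with sales_by_product and does not need to know that one half of the data lives in a file and the other half in a database.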

Is it different from existing technologies?
Yes, to some extent. Database vendors have long provided drivers and connectors, so one-to-one communication with a single database was never a problem. When data had to be integrated from two disparate systems, an additional tool was required, and ETL tools filled this need. ETL tools offer more than fetching data from different sources, but that is a separate topic in itself.

Here is a different, conceptual look at this technology. Assume that you are traveling to Japan and Japanese is not your native language. The scenarios are as follows.

Before learning Japanese: You need an interpreter.
After learning Japanese: You can now travel across Tokyo without depending on anyone.

Path-breaking or routine automation?
This technology grew step by step. Its different pieces evolved independently to give customers a better hold on data management. Each piece addressed a specific concern or problem, and they have now matured enough to be integrated and embedded within a single database management system. We still have a long way to go in terms of maturity and adoption.

ETL = Dinosaur?
The simple answer is 'No'. Data virtualization might replace some basic, rudimentary data flows in the near future, but widespread adoption will take a sizable amount of time. ETL tools play a much wider role than just extracting data from different database systems. They cannot be displaced from the BI landscape for at least a decade, and I believe that replacing ETL entirely is impossible.

Is this reality now?
Yes. I have seen this in action in SAP HANA with its Smart Data Access (SDA) concept. Business Objects Data Federator has also been available in the market for quite some time to offer data virtualization.