Columnar Storage: Impact in BI (Part 3)

drdivaphd.wordpress.com

Link to the series Part 1, Part 2

heliocor.com Big Data / NoSQL: Columnar storage format has many success stories and finds its spot within BigData / NoSQL databases. There are different types of NoSQL databases, and one method is called “Column Family“. The exact technical implementation slightly varies from what was described previously. Conceptually, as the name suggests, when data is loaded into a “Column Family” database, it is stripped by columns and then distributed across the cluster. NoSQL databases can handle very large volumes of data and are used exclusively in big data analytical applications. Apache Cassandra is a good example of this type of database.

myiconfinder.comData Write: Writing data to a column store is highly inefficient, which is predominantly due to the layout of data. When compared with Row Store mechanism, an Insert or Update is just one operation, whereas in Column Store the number of operations is equal to the number of columns defined in the table. Different mechanisms have been implemented to overcome this and one popular method is to use Delta storage and Main storage. This gap is slowly reducing now with and now, Columnar databases are be even ACID compliant.

One great resource that goes in depth with actual analysis of performance of a data warehouse with Column database using classic Star Schema model is available in MIT website. Today, all database vendors such as Oracle, Microsoft, IBM, MySQL, SAP etc., provide a default option to choose the storage type while creating tables. Migrating from row to column based storage does not consume many resources. Entire project can be completed within a matter of weeks without slightest impact to existing reporting structure.

This is a clear indication that Column storage is ready for prime time usage and several organizations have successfully implemented large scale data warehouses in production environment.