From the introductory post on information, it is evident that “information” is a collection of semantically meaningful data. Data from the base layer is transformed, based on logic or rules, into an understandable format. We can look at this as the stage where disorganized data gets a structure. One important thing to notice here is that we need more data to describe data. If this statement leaves you in a tizzy, just be aware of this quirk; as we move on, the murkiness will slowly disappear.
The Data -> Information transition is the primary stage where data gets classified, collated, transformed and organized for further analysis. Without this process, everything looks like noise or static chatter. This transition typically cannot be achieved in a single step; instead, it is an iterative process where analysis and learning go hand in hand. A feedback mechanism is typically implemented to validate our understanding at each step and ensure that we move forward in the right direction.
When factors such as the source of the data, its format, the back-end application, the process logic or the data collection methodology are known, it is easier to derive the structure of the data. However, it is not uncommon to encounter scenarios where one has to analyze the data blindfolded. In these cases, the iterative approach is of immense help. The learning process starts by grouping together small pieces at a time, and the validation and feedback mechanism ensures that the initiative stays on the right track.
In the majority of scenarios, the amount of data available at the source is limited, and it may not be enough for analysis. To understand the available data completely, it becomes imperative to deduce and add more data while creating information. For example, suppose a hospital provides the patient admission date and discharge date in its source data. If one has to understand the length of a patient’s stay, the source data alone is incomplete. The length of stay has to be derived by finding the difference between the two dates that are available in the source data. This process of deducing data is one important characteristic of the “Information” phase.
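The hospital example can be sketched in a few lines of Python. This is only an illustration: the field names `admission_date` and `discharge_date`, the sample dates and the helper `length_of_stay` are assumptions made for the sketch, not part of any real hospital system.

```python
from datetime import date


def length_of_stay(admission: date, discharge: date) -> int:
    """Derive the stay in whole days from the two source dates."""
    if discharge < admission:
        # A discharge before admission signals a problem in the source data.
        raise ValueError("discharge date precedes admission date")
    return (discharge - admission).days


# Hypothetical source record: only the two dates are supplied by the source.
record = {
    "admission_date": date(2023, 3, 1),
    "discharge_date": date(2023, 3, 6),
}

# The derived value is computed from the source; it is not stored in it.
stay_days = length_of_stay(record["admission_date"], record["discharge_date"])
print(stay_days)  # 5
```

Counting a same-day admission and discharge as zero days is a choice made here for simplicity; a real analysis may count it as one day, depending on the rule the hospital uses.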
The derivation of additional values is not restricted: we can derive as many values as we need, as long as doing so does not impact the consistency and integrity of the source data.
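One way to keep the source data intact while deriving several values is to treat the source record as read-only and hold the derived values in a separate structure. The sketch below assumes a hypothetical `Admission` record and a hypothetical `derive` helper; the derived fields are chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: the source record cannot be modified
class Admission:
    patient_id: str
    admission_date: date
    discharge_date: date


def derive(source: Admission) -> dict:
    """Return derived values in a new dict, leaving the source untouched."""
    return {
        "length_of_stay_days": (source.discharge_date - source.admission_date).days,
        "admission_weekday": source.admission_date.isoweekday(),  # 1 = Monday
        "admission_month": source.admission_date.month,
    }


# Hypothetical sample record.
source = Admission("P-001", date(2023, 3, 1), date(2023, 3, 6))
derived = derive(source)
print(derived)
```

Because the derived values live outside the source record, any number of them can be added or recomputed later without touching the original dates.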