Most of the conversations I hear and articles I have seen about big data focus on business analysis and innovation. Know more about your customers and the products and services they are looking for, and you can positively impact revenue, margin and market share. These are great reasons to head down the big data path. However, to realize business value from big data, companies need strong information governance, and few people seem to be talking about this.
To enable analysis and innovation opportunities with big data, companies need to integrate information from multiple data sources. This data will include both structured and unstructured data, which means text processing and entity extraction capabilities will need to be part of the data integration infrastructure. It will also be important for companies to look for data integration solutions that are already integrated with and optimized for Hadoop and MapReduce, so IT staff do not have to be experts in these areas.
Incompatible standards and formats of data in different sources can prevent the integration of data and the more sophisticated analytics that create value from big data. According to the Aberdeen Group (Data Management for BI: Fueling the analytical engine with high-octane information), Best-in-Class companies take 12 days on average to integrate new data sources into their analytical systems, Industry Average companies take 60 days, and Laggards take 143 days. Part of the large gap between best and worst is due to differences in the use of data quality technology and processes. So companies will need data profiling, cleansing, and metadata and master data management capabilities to realize the full business value of big data.
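To make the format problem concrete, here is a minimal sketch of cleansing and profiling one field that arrives in different date formats from different sources. The format list, the field names and the sample records are all hypothetical, and a real data quality tool does far more than this; the point is only to show what "reconciling formats" looks like in practice.

```python
from datetime import datetime

# Hypothetical date formats used by different source systems (assumptions).
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    if raw is None:
        return None
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def profile(records, field):
    """Count how many values of `field` are usable after normalization."""
    total = len(records)
    valid = sum(1 for r in records if normalize_date(r.get(field)) is not None)
    return {"total": total, "valid": valid, "invalid": total - valid}


# Illustrative records merged from three imaginary sources.
records = [
    {"customer": "A", "signup": "2012-10-15"},
    {"customer": "B", "signup": "10/19/2012"},
    {"customer": "C", "signup": "15.10.2012"},
    {"customer": "D", "signup": "sometime in October"},
    {"customer": "E", "signup": None},
]

report = profile(records, "signup")
print(report)
```

A profile like this, run before integration, shows how much of a new source can be used as-is and how much needs cleansing first.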
While it might seem like keeping every bit of data is the best approach, the cost makes it impractical. According to a McKinsey Global Institute study (Big data: The next frontier for innovation, competition and productivity), the projected growth in global data generated per year is 40%. So companies will need to balance long-term retention with cost and legal requirements. This means companies are going to have to determine what primary data needs to be kept live, what can be moved to an online archive, and what their policies are around retention and disposal.
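As a rough illustration of both points, the sketch below compounds a 40% annual growth rate and sorts records into live, archive and dispose tiers by age. Applying the global 40% figure to a single company's storage is an assumption made only for the arithmetic, and the thresholds (one year live, seven years retained) and the `legal_hold` flag are hypothetical, not taken from any regulation or product.

```python
from datetime import date

# Hypothetical policy thresholds (assumptions, not from any regulation).
LIVE_DAYS = 365            # keep in the primary system for one year
RETENTION_DAYS = 7 * 365   # keep in the archive for up to seven years


def retention_tier(created, today, legal_hold=False):
    """Classify a record as 'live', 'archive' or 'dispose' based on its age."""
    age = (today - created).days
    if age <= LIVE_DAYS:
        return "live"
    if age <= RETENTION_DAYS or legal_hold:
        return "archive"  # a legal hold blocks disposal
    return "dispose"


def projected_volume(current_tb, years, annual_growth=0.40):
    """Compound the current volume by a fixed annual growth rate."""
    return current_tb * (1 + annual_growth) ** years


today = date(2012, 10, 15)
print(retention_tier(date(2012, 1, 1), today))
print(retention_tier(date(2009, 1, 1), today))
print(retention_tier(date(2000, 1, 1), today))
print(retention_tier(date(2000, 1, 1), today, legal_hold=True))
print(round(projected_volume(100, 5), 1))
```

At 40% a year, volume more than quintuples in five years (1.4 to the fifth power is about 5.38), which is why a "keep everything live" policy gets expensive quickly.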
Yes, big data is a big business opportunity, but the business value won’t be realized if the information isn’t governed. How can you use big data to develop the next generation of products and services if you can’t do text processing to perform market, brand and sentiment analysis? How can you use big data to build predictive models if you can’t cleanse data and reconcile formats and meaning between your data and external data? How can you improve your bottom line if the additional revenue you are generating from big data is being eaten up by your IT capital and operating expenditures to store and manage data?
Information governance in the context of big data is a topic that companies can’t afford to ignore. It can no longer be the elephant in the big data room that no one is talking about. If you’d like to learn more about information governance, SAP TechEd in Las Vegas, October 15th-19th, is offering a number of relevant sessions, including:
Hands On Workshops
Real World Implementations
If you can’t make it to SAP TechEd, you might be interested in listening to the Sapphire Now 2012 Information Governance Panel discussion with Bio-Rad, Colgate-Palmolive and Sysco Foods, or the Sapphire Now 2012 Big Data Panel discussion with Bayer, Valero and The Globe and Mail. These companies discuss how corporate decision making is impacted by both the volume and quality of their data, and how SAP has helped them implement information governance.