What is ‘big data’, and should you care? More than half of organizations reportedly have a strategy to leverage big data to improve their business.
Computerworld has published Strategy Guide: Big Data, sponsored by EMC, which does what I think is a very good job of explaining what big data is and why it is important, in simple, mostly non-technical language.
It also touches on a number of the issues organizations must face as they try to manage and protect all this data. Through business analytics, that data can be converted into useful information to run the business better: it lets you see what is happening now and, in some cases, anticipate what is likely to happen in the future.
The data comes not only from traditional sources, such as enterprise applications, but also from new sources, such as social media. The data can also be mined for risk and assurance purposes, such as:
- Risk monitoring
- Continuous monitoring
- Continuous auditing
- Fraud detection
- Information security risk assessment
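To make one of these uses concrete, a fraud-detection or continuous-monitoring pass often starts as a simple rules-based scan of transaction records. The sketch below is purely illustrative — the card IDs, amounts, and the daily-limit rule are hypothetical, not drawn from any real bank's controls:

```python
# Hypothetical example: flag cards whose total daily ATM withdrawals
# exceed a limit -- a minimal rules-based continuous-monitoring check.
from collections import defaultdict

DAILY_LIMIT = 1000  # illustrative threshold, not a real bank policy


def flag_suspicious(transactions):
    """transactions: iterable of (card_id, amount) pairs for one day.
    Returns the card_ids whose total withdrawals exceed DAILY_LIMIT."""
    totals = defaultdict(float)
    for card_id, amount in transactions:
        totals[card_id] += amount
    return sorted(card for card, total in totals.items()
                  if total > DAILY_LIMIT)


sample = [("A", 600), ("A", 500), ("B", 200), ("C", 1200)]
print(flag_suspicious(sample))  # ['A', 'C']
```

Real monitoring programs layer many such rules (and, increasingly, statistical models) over far larger volumes — which is exactly where the speed problem discussed next comes in.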
The piece missing from the Computerworld guide is the problem of speed. These analytics may need to run against billions of records every day, or even more often. How can you do that with traditional technology, where the analytics can take the better part of a day to run?
For example, I know of a California bank that wants to analyze all of its ATM transactions every day, but the report takes nine hours to run because there are 1.8 billion transactions to sift through.
There’s a retail grocery store company in Europe that wants to compare its inventory levels against sales for every item in every store, and adjust prices continuously to attract customers, optimize revenue, and avoid excess inventory. How can it do that with traditional technology?
What about the financial services institution that wants to monitor social media and other activity (within its corporate systems, for privacy reasons) to identify where there is a risk of losing key personnel? Again, we are talking about massive volumes of data.
A relatively new and exciting possibility is ‘in-memory’ computing. The general idea is that the data to be analyzed is not stored in a traditional data warehouse but held in memory.
Analysis in memory is much faster: a report that used to take nine hours can run in just a few seconds. Oracle has reported that reports can run up to 50,000 times faster, and SAP (my employer) has seen even faster responses. (Click on the links above for more information.) Questions you might want to consider asking include:
- Does the organization have a strategy to leverage big data? What are the goals and timelines? If not, why not? Do we understand the potential?
- Are we ready to manage the explosion of data? Do we have the necessary resources and expertise?
- Have we partnered with software vendors and consultants so we can acquire the right solutions to optimize the benefit at an acceptable cost and in good time?
- Can we protect and secure the data that is stored in or passes through our network?
- IDC points out that 80% of all the data created in the world, including data created by individuals on social media, is on corporate networks at some point in its life.
- Do we have a reasonable level of controls to manage the risks of incomplete or unreliable data? Are the appropriate security, risk, and assurance personnel involved to guide the big data initiatives?
- Has the potential to use this for continuous monitoring/auditing been considered?
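To make the in-memory idea described above concrete: the speedup comes from loading the data once and then answering each query with a scan of RAM, rather than re-reading it from disk every time. The toy sketch below uses generated data and plain Python structures; real in-memory platforms add columnar storage, compression, and parallel scans on top of this basic idea:

```python
# Toy sketch of in-memory analytics: load the data once, then each
# "report" is a scan of RAM rather than a fresh pass over disk.
# The data here is randomly generated for illustration only.
import random

random.seed(42)

# Load transactions into memory once (in practice: billions of rows).
# Each row is (atm_id, withdrawal_amount).
transactions = [(random.randrange(100), random.uniform(10, 500))
                for _ in range(100_000)]


def report_totals_by_atm(rows):
    """Aggregate total withdrawals per ATM -- a full scan, but of RAM."""
    totals = {}
    for atm_id, amount in rows:
        totals[atm_id] = totals.get(atm_id, 0.0) + amount
    return totals


totals = report_totals_by_atm(transactions)
print(len(totals))  # number of distinct ATMs seen in the data
```

Once the rows are resident in memory, repeated reports — by ATM, by hour, by customer segment — are just further scans of the same structure, which is why the nine-hour batch report can collapse to seconds.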
I welcome your thoughts and experiences.