ASUG held a webcast this week to introduce Big Data concepts. David Burdett of SAP was the speaker.
Quite honestly, I had been treating the topic as all “hype”, but thanks to participating in a Google Hangout with SAP’s Steve Lucas and Timo Elliott the other week, I’ve started paying attention to it.
Figure 1: Source: SAP
David reviewed Gartner’s definition that “Big data is high volume, velocity, variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”.
Figure 2: Source: SAP
Figure 2 reviews the drivers of data. Last week I listened to another ASUG webcast about how Stanford is using big data for genome analysis.
Figure 3: Source: SAP
Figure 3 covers the question of when and where to invest for the most business value. You need to look at governance and how decisions are made on this data. You also need to care about the quality of this data: how good is it? Is it good enough?
Figure 4: Source: SAP
Figure 4 shows how we got to this point. In the 2000s it was B2B and B2C. People generate more data, but it is only useful if you analyze it. Starting in the mid-2000s, mobile devices and smartphones became more popular, and this generated more data. Starting in 2010, social media made data more pervasive and added still more potentially useful information. Now, David says, we are in a Big Data world.
Figure 5: Source: SAP
Figure 5 shows that 90% of the data today was created in the last 2 years. David said that today we measure available data in zettabytes (a zettabyte is 1 trillion gigabytes), which translates into 57.5B 32 GB iPads.
Social media growth has climbed, with mobile phones increasing over 60% in the last 2 years; Facebook has 665M daily active users and Twitter has 228M monthly active users. With social media, customers are talking about your products.
The Internet of Things, with sensors, RFID and telematics, also contributes to the growth; David said that devices connected to the Internet are expected to grow to 25B by 2015 and 50B by 2020.
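As a quick sanity check on the storage figures quoted from the slide, the arithmetic can be sketched in a few lines of Python. This assumes decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes); the variable names are my own and not from the webcast.

```python
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes and 1 ZB = 10**21 bytes.
GB = 10**9
ZB = 10**21

gb_per_zb = ZB // GB              # gigabytes in one zettabyte
ipad_capacity_gb = 32             # iPad capacity quoted on the slide
ipads_quoted = 57_500_000_000     # 57.5B iPads quoted on the slide

# How many 32 GB iPads would hold exactly one zettabyte?
ipads_per_zb = gb_per_zb / ipad_capacity_gb

# How much data do 57.5B such iPads hold, expressed in zettabytes?
total_gb = ipads_quoted * ipad_capacity_gb
total_zb = total_gb * GB / ZB

print(f"GB per ZB: {gb_per_zb:,}")
print(f"32 GB iPads per ZB: {ipads_per_zb:,.0f}")
print(f"57.5B iPads hold about {total_zb:.2f} ZB")
```

Under these assumptions one zettabyte works out to about 31.25B 32 GB iPads, so the 57.5B figure appears to correspond to roughly 1.84 zettabytes in total rather than to a single zettabyte.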
Figure 6: Source: SAP
Figure 6 shows, on the right, the many types of data that are not currently in the SAP system.
David said that “HANA is a complete platform”.
You also need to bring in what is going on outside and pull things together, and this is where SAP is focusing with Big Data.
Figure 7: Source: SAP
Figure 7 is from IDC’s market assessment of SAP HANA.
David said that fact-finders look at data, analyze it and use it to make decisions.
Fumblers make decisions based on gut feel.
Figure 8: Source: SAP
Figure 8 shows how the 3 V’s are now 6 V’s, with additional V words.
Validity covers governance of the data on which decisions are being made.
The sixth one is Value: empowering end users by making the data available to them.
David said that if you focus on data and not intuition, you can be 20% more competitive and improve operating margin by 60%.
Figure 9: Source: SAP
Structured data is well defined, easy to understand, and stored in a structured database.
Unstructured data includes images; these can carry some structure, such as when and where they were taken, but it is hidden.
Semi-structured data includes email: you know the who, what, when and where, but the content is unstructured.
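To make the email example concrete, here is a minimal sketch using Python's standard `email` module. The message is made up for illustration; the point is only that the headers come out as named fields while the body stays free text.

```python
from email import message_from_string

# A made-up message, used purely for illustration.
raw = """\
From: alice@example.com
To: bob@example.com
Subject: Feedback on the new product
Date: Mon, 03 Jun 2013 10:15:00 +0000

I tried the new release last week. Setup was easy, but reporting felt slow.
"""

msg = message_from_string(raw)

# Structured part: named header fields answer who, what and when.
headers = {
    "who_from": msg["From"],
    "who_to": msg["To"],
    "what": msg["Subject"],
    "when": msg["Date"],
}

# Unstructured part: free text that needs further analysis to interpret.
body = msg.get_payload().strip()

print(headers)
print(body)
```

The headers could go straight into database columns, whereas the body would need something like text analysis before it can be queried.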
Figure 10: Source: SAP
Figure 10 compares SQL vs. NoSQL databases.
Figure 11: Source: SAP
Figure 11 reviews Hadoop; distributing data over servers (data nodes) is what allows Hadoop to scale.
It is open source software that works with any type of data and comprises multiple different technologies.
This is new technology that started 5 or 6 years ago, according to David.
The challenge is how to organize the data and how to bring it together.
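The idea of spreading data over nodes and then bringing the results back together can be sketched in plain Python. This is only an illustration of the map-and-reduce style of processing commonly associated with Hadoop; it runs in a single process, does not use Hadoop's API, and the records and function names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical input records; on a real cluster these would be blocks of large files.
records = [
    "big data is big",
    "data about data",
    "social data is growing",
    "sensor data is growing fast",
]

def split_across_nodes(items, node_count):
    """Assign records to nodes round-robin (a stand-in for block placement)."""
    return [items[i::node_count] for i in range(node_count)]

def map_on_node(node_records):
    """Each node counts words in its own records only."""
    counts = Counter()
    for line in node_records:
        counts.update(line.split())
    return counts

def combine(left, right):
    """Merge the partial counts from two nodes."""
    return left + right

nodes = split_across_nodes(records, node_count=2)
partials = [map_on_node(n) for n in nodes]   # would run in parallel on a cluster
totals = reduce(combine, partials, Counter())

print(totals.most_common(2))
```

Because each node only touches its own share of the records, adding nodes spreads the work further; the combine step is where the partial results are brought back together.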
Question & Answer
Q: Where do you see semantic analysis?
A: Semantic analysis is another approach. RDF and the semantic web are a way to describe data stored on the web; think of doing analysis on that semantic data.
Google Hangout with Steve Lucas & Timo Elliott: