Big Data 101: What is Big Data, explained with Sentiment Intelligence
A few weeks ago, at a customer event, a user approached me about Big Data and how difficult it was for him to understand. I came up with an approach that I think makes it very easy for anyone to explain or understand.
Sentiment Intelligence is a very interesting solution from SAP to capture, store and analyze social media data, and get insight from Big Data. Take a look at this YouTube video (under 4 minutes): https://www.youtube.com/watch?v=ERcy0YyHmts
Let’s double-click on Sentiment Intelligence and see why it is a Big Data solution.
Sentiment Intelligence can turn Tweets into insight by capturing, storing and analyzing them and categorizing each and every one as: Very Positive / Positive / Neutral / Negative / Very Negative
Capturing Big Data
How can Sentiment Intelligence capture Tweets? By reading from the Twitter service and capturing Tweets as they are created (in real time). You have most likely heard of Big Data’s “Three V’s” (https://en.wikipedia.org/wiki/Big_data), or even more V’s. Here you can see one side of the first V: “Volume”. It is easy to see Tweets as a high-volume data collection.
Here comes the second V: “Velocity”. Do you think Tweets are created fast? Transactions arriving at a very rapid pace are something many companies in many industries face. Imagine the transactions happening at a retail store chain, or the calls at a phone company.
Sentiment Intelligence then stores the Tweets in an SAP HANA database. Here comes “Volume” again. Being able to store this much information at such a fast pace is a Big Data problem or, in this case, a capability.
How to access or capture data and then store it (if needed) is an architecture challenge.
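To make the capture-and-store step more concrete, here is a minimal sketch in Python. It is not how Sentiment Intelligence or SAP HANA works internally: the `tweet_stream` generator is a hypothetical stand-in for a real-time feed, and an in-memory SQLite table stands in for the database. It only illustrates the idea of reading records as they arrive and persisting them in batches.

```python
import sqlite3
from itertools import islice


def tweet_stream(n):
    """Hypothetical stand-in for a real-time feed: yields (id, text) pairs."""
    samples = ["I love this phone", "Worst service ever", "Just landed in Madrid"]
    for i in range(n):
        yield i, samples[i % len(samples)]


def capture(stream, conn, batch_size=100):
    """Read the stream in batches and store each batch in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT)"
    )
    stored = 0
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:  # the stream is exhausted
            break
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany("INSERT INTO tweets (id, text) VALUES (?, ?)", batch)
        stored += len(batch)
    return stored


conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real database
stored = capture(tweet_stream(1000), conn)
row_count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(stored, row_count)
```

Writing in batches rather than one row at a time is one common way to keep up with a fast stream. A real feed never ends, so the loop would keep running instead of stopping after 1000 records.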
Processing Big Data
Every Tweet is a text of up to 140 characters. How can Sentiment Intelligence categorize this text as Very Positive, Positive, Neutral, Negative or Very Negative? It reads them!
Text analysis is the process of analyzing unstructured text, extracting relevant information and then transforming that information into structured information that can be leveraged in different ways. (http://scn.sap.com/docs/DOC-54094 )
Here is another buzzword closely associated with Big Data: unstructured data. Text is one example of unstructured data, as it has no predefined model or structure. Fields in a database table or tags in an XML file are structured: we know what they are or mean from their position in the underlying structure. The text of a Tweet does not have any such structure.
SAP HANA has Text Analysis functions that “read” the Tweets, looking for key words and phrases to capture the text’s sentiment. This technology has enough computing power to process every Tweet, categorize it and store this high-volume, high-velocity information in an SAP HANA database table. It takes a free-text string and turns it into a discrete value in a database table: it turns unstructured data into structured data. This is the key process in Big Data scenarios. Every scenario will require a very specific way to turn unstructured data into structured data.
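As an illustration of turning free text into a discrete value, here is a toy keyword-based sketch in Python. It is not SAP HANA’s Text Analysis, which is far more sophisticated. The word lists, the scoring rule and the `categorize` function are all assumptions made up for this example.

```python
import re

# Hypothetical toy lexicon; real text analysis uses much richer dictionaries and rules.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "slow", "terrible"}


def categorize(text):
    """Map a free-text string to one of five sentiment categories."""
    words = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive key words minus number of negative key words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score >= 2:
        return "Very Positive"
    if score == 1:
        return "Positive"
    if score == 0:
        return "Neutral"
    if score == -1:
        return "Negative"
    return "Very Negative"


tweets = [
    "I love this phone, great camera",
    "Delivery was slow",
    "Just landed in Madrid",
]
# Each row is now structured: (original text, discrete category).
rows = [(t, categorize(t)) for t in tweets]
for row in rows:
    print(row)
```

The output of `categorize` is a value from a fixed list, so it can be stored in a regular table column and counted, filtered or grouped like any other structured field.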
Again the “Volume” and “Velocity” V’s show up, but now the third V is also showing its face: “Variety”. This last V is all about how source data is stored or where it resides: a Tweet in a web service, text in a document, etc. All kinds of files or signals contain data that needs to be processed in order to get information from it.
Thanks to SAP HANA’s high computing power and Text Analysis functions, this scenario is possible.
Analyzing Big Data
Text is unstructured, and thus traditional OLAP functions do not apply, as they run on numeric values in a table’s numeric fields.
Now that structured data is available, traditional BI tools can be used to get the insights we are looking for. But once again the first and second V’s show up. Imagine reporting on and analyzing a very big (millions or billions of records) and constantly changing (more records coming in every second) database table. A lot of computing power is needed to read and summarize all this data. If you also need OLAP functionality, even more computing power is needed.
Column, bar and line charts are as useful as they have always been. New visualizations like Geo, Choropleth, Heat, Tree, Network, Tag Cloud, combinations and others are to be used for specific use cases. You can try these new visualizations with SAP BusinessObjects Lumira.
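Once each Tweet carries a discrete sentiment value, summarizing it is ordinary reporting. The sketch below uses Python with an in-memory SQLite table as a stand-in for the real database. The table name `tweet_sentiment` and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real database
conn.execute(
    "CREATE TABLE tweet_sentiment (id INTEGER PRIMARY KEY, sentiment TEXT)"
)
# Invented sample data: one already-categorized Tweet per row.
sample_rows = [
    (1, "Positive"), (2, "Negative"), (3, "Positive"),
    (4, "Neutral"), (5, "Very Positive"), (6, "Positive"),
]
conn.executemany("INSERT INTO tweet_sentiment VALUES (?, ?)", sample_rows)


def sentiment_counts(conn):
    """Count Tweets per sentiment category, most frequent first."""
    cur = conn.execute(
        "SELECT sentiment, COUNT(*) FROM tweet_sentiment "
        "GROUP BY sentiment ORDER BY COUNT(*) DESC, sentiment"
    )
    return cur.fetchall()


counts = sentiment_counts(conn)
for sentiment, total in counts:
    print(sentiment, total)
```

On six rows this is instant. On billions of rows that keep growing, the same query has to scan and summarize far more data, which is where the need for computing power comes from.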
Let’s see Gartner’s Big Data Definition: “… high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” Does it make more sense now?
I hope you enjoyed this post. Please let me know what you think.