Ideas are living, breathing beasts; they spread virally like a game of Chinese whispers as tweets speed into re-tweets, into blogs, articles and then eventually into the psyche of us, the people. In today’s world it doesn’t take long to become the accepted ‘norm’. This is essentially what has happened around the topic of Big Data. Now the first truth we need to accept is that Big Data isn’t a new problem; of course not, Big Data has been around since we invented number systems. It would have been pretty challenging, 5,000 years ago in Sumeria, to calculate the total expenditure of the Babylonian government on your abacus – that was a big data issue. To get a perspective on the Big Data discussion in more modern times I would recommend a brief scan through the Forbes article by @GilPres, “A Very Short History of Big Data”: http://onforb.es/1fRelUN

What interests me on the topic is how the emergence of the 3 V’s (Volume, Velocity and Variety) took hold as the accepted norm. This was a great discussion back in 2001 by industry analyst Doug Laney (@Doug_Laney) that started to really define and categorise the challenge of Big Data. Following this, many extensions to the 3 V’s started to take hold, adding dimensions such as Variability, Veracity and Complexity. There are many definitions out there of the V’s of Big Data, all of which give a similar perspective; I take the following very basic excerpts from our old friend Wikipedia (http://bit.ly/1vGCYNw):


Volume – The quantity of data that is generated … it is the size of the data which determines the value and potential of the data …

Variety – … the category to which Big Data belongs …

Velocity – … the speed of generation of data …

Variability – … this refers to the inconsistency which can be shown by the data at times …

Veracity – … the quality of the data being captured …

Complexity – … data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected and correlated …

When I look at these definitions I can’t help but feel we are missing something: the Vastness of the data. Volume only ever seems to focus on the number of records and the exponential growth of data – some evangelists talk of Moore’s Law for Big Data.

Roughly speaking, Moore’s Law says that every two years computer capacity doubles, capacity being speed, memory, etc. (http://bit.ly/18b44oa)
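As a back-of-the-envelope illustration (my own sketch, purely to show the arithmetic of that rule of thumb), the doubling can be written as capacity(t) = capacity(0) × 2^(t/2):

```python
def projected_capacity(initial_capacity: float, years: float) -> float:
    """Rule-of-thumb Moore's Law projection: capacity doubles every two years."""
    return initial_capacity * 2 ** (years / 2)

# e.g. a 1 TB system today would, by this rough rule, be ~32 TB in ten years
print(projected_capacity(1.0, 10))  # 32.0
```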

Variety tends to be used to explain different types of data source (structured, unstructured and social being the usual examples) and Variability the changing nature of the data, but where does the ever-growing amount of data associated with each individual record come into play? This is what I mean when I talk about Vastness. It shouldn’t come as a surprise to many of us who have been dealing with Vastness for years to make sense of complex problems, but it seems to have got lost in the noise as we focus on the more popular people at the party, namely Volume and Variety.

Let’s take a simple example relating to a customer’s propensity to purchase and individually targeted marketing. We have all been on the receiving end of these techniques, where companies attempt to interpret our buying behaviours to send us specific offers tailored to our likes and dislikes. Thank goodness, I say, as I was really getting tired of being offered 10c off coffee when, being a traditional English chap, I only ever drink tea. It makes sense that we apply some heuristics to the offers we make, otherwise we devalue our brand and alienate our customers. I get it!

To support these targeted offers our systems of record capture more and more information: it isn’t enough to capture basic information about a customer in our CRM system any more; we now want to bring into play whether they have a cat and whether they use their credit card to purchase takeaway food. This is where the question of Vastness arises – it isn’t about volume in the traditional ‘row-based’ sense but in a ‘column-based’ sense. To perform really detailed, accurate analysis we need to take into account the breadth of the data and the number of columns: thousands, tens of thousands or even hundreds of thousands of columns that describe the customer.
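To make that row-versus-column distinction concrete, here is a minimal sketch (the attribute names and the pandas approach are my own illustration, not from any particular product) showing Volume and Vastness measured separately on the same customer data:

```python
import pandas as pd

# Hypothetical, very wide customer records: a handful of rows (Volume)
# but potentially thousands of attribute columns (Vastness).
customers = pd.DataFrame([
    {"customer_id": 1, "owns_cat": True,  "uses_card_for_takeaway": True,  "drinks_tea": True},
    {"customer_id": 2, "owns_cat": False, "uses_card_for_takeaway": False, "drinks_tea": False},
])

volume = len(customers)          # number of records (rows)
vastness = customers.shape[1]    # number of attributes per record (columns)
print(f"Volume: {volume} rows, Vastness: {vastness} columns")

# A simple propensity rule over the wide attribute set: only send the
# coffee discount to customers who are not tea drinkers.
coffee_offer = customers[~customers["drinks_tea"]]
print(coffee_offer["customer_id"].tolist())  # [2]
```

In a real scenario the column count would run into the thousands, which is exactly where column-oriented storage and analysis start to matter.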

So, when we are thinking about Big Data, let’s not forget to understand the Vastness of the data too, and consider how we handle the complexity it brings in order to extract real value. As the market continues to focus on the popular aspects of Big Data, let’s not lose sight of the fact that being able to handle the Vastness could be the real differentiator for our business in this competitive world.