
Random Ramblings from a Developer – What is Big Data?

Some time back, I started a bit of a blogging journey with this post.  As I could have predicted at the time, it has taken me much longer to find space in my schedule to carry on with the initiative and get another part published but here we go…

What is Big Data?

This is at once a difficult, complex question and a very straightforward one.  I’m inclined to suggest a better question would be something like “Why is Big Data suddenly such a Big Thing?”  Of course, as many readers of this post will be all too aware, our industry is very clever at creating “The Next Big Thing®” that just happens to help sell the latest version of some platform, solution or system…

Etymology of Big Data

Ok, I admit – I only used this as a sub-title to get a nice big word like etymology into one of my ‘blog posts…  Seriously, for those who don’t know what this is please read this Wikipedia entry.

I wanted to try and get an understanding of where and when the term Big Data first crashed into our world – I have a personal recollection but wanted to get a wider view.  A quick Google (just what did we do before Google?) turns up a vast number of thoughts along these lines.  One of my favourites was this post, partly for the main content and (as usual with the internet) partly for the comments.  Interestingly, the author’s rough stab at Big Data hitting mainstream around 2012 isn’t too far from my thoughts (2011 is in my head for some reason.)  I recall it reaching my consciousness around that time also, mainly thanks to SAP’s HANA announcements and the increasing momentum the appliance was getting.  It is also relevant to note that we are talking about our current understanding & use of the term Big Data here, and also to recognise that it has been used in a few other ways prior to this point.

The important point here, though, is that we are talking about the history of the term Big Data, not that of the data itself.  Or put another way – we have been generating & collecting vast quantities of data for many years prior to every man and his dog rushing to get Big Data or Data Scientist on their CVs…  What changed?  Why are we suddenly using the term “Big Data” so much, in so many (often) vague ways?

How do you define Big?

Let’s break the term down and think just about the first word for a bit.  I think the word “Big” can be the most misleading aspect of this whole subject.  Having said that, I’m not sure I can think of a suitable alternative.  We often hear size isn’t everything and I believe this relates to Big Data more than many will have you believe.  As usual it comes down to perspective and how you want to measure and compare.  As someone quite famous once said, it is all relative.  We live in an age where data is generated through so many channels, at such an alarming rate, that we probably don’t know what is happening with it all.  Conversely, we also don’t know what other data we could or should be generating and therefore capturing.  Once we’ve generated and captured all of this data, what are we going to do with it?  What happens if we haven’t captured any data that we can actually use?  Ultimately, one of the intended benefits of our current obsession with Big Data and Data Scientists is how they enable us to actually focus down on a specific, highly tailored sub-set of the overall picture – our slice of the pie as it were.  If we don’t have the ingredients for our pie, we will never get a slice of it…

[Image: EthanJewett Unknown Unknowns.png]

Known Knowns

I spotted an interesting exchange on Twitter not too long ago where Ethan Jewett was (I think!) trying to make a point about how we capture data.  I didn’t manage to track the whole exchange (some observers might have suggested Ethan was having a drunken conversation with himself!) however I did take away some sense of agreement with this tweet.  It really piqued my interest about the whole Big Data thing (as well as helping me to finally make a bit more effort to complete this post.)

All of this got me thinking and it reminded me of an aspect of Quantum Mechanics that I thought was quite appropriate to our current Big Data world and especially Ethan’s comments.  I was idly wondering about how we cannot measure or capture all data and in fact, choosing to measure one aspect of a system could lead us to miss other, important measurements that we actually do need and would find useful.  I’m officially naming this “Jewett’s Data Uncertainty Principle” 🙂

My Slice of the Pie

The challenge for all of our ‘new’ data scientists is how they take all of the data and information available at their fingertips and turn it into something useful.  Just how do we capitalise on the sheer volume of information combined with processing power at our disposal?  At a UKISUG Conference a year or two ago a colleague was speaking to a senior customer representative, who had asked what HANA could do for them – the answer was “what do you want it to do for you?”  I have seen lots of Twitter traffic in recent weeks following a similar vein, where SAP users are struggling to understand what the actual use-cases for HANA can be.  That suggests they don’t understand what Big Data is and more importantly, what it can offer.

This is one of the key challenges with the current state of Big Data, IMHO.  We’ve reached a brave new world where almost anyone can access almost endless amounts of data; they can generate almost endless amounts of data; and then anyone can consume and mash all of this data up into all sorts of random results.  What is the point and where is the value in all of this data?  How do enterprises get value out of this data wrangling?  Are we creating roles for data scientists that are somewhat self-serving?

As a rather pointless example, I discovered LinkedIn InMaps recently and duly generated my network map…  Wow, doesn’t it look impressive with all of my connections there on one screen?


The problem is though, what does it do?  What’s the point?  What value does it create or add?  This is effectively my slice of the much larger LinkedIn data pie but it doesn’t really serve much purpose.  To make it useful, it needs something else added, some extra context.  As soon as you start talking about context in relation to data and information, things start getting interesting fast…

It’s all about Context

I’m pretty sure Vishal Sikka said something along these lines last year some time.  No doubt I have it as a favourite tweet, SCN bookmark, saved to My Pocket…  Ok, I know I’ve got it hidden somewhere anyway.  The point is, often just one element of data on its own is near meaningless but add another element, another dimension and suddenly it becomes valuable and of use.

As a real world example, here in the UK on our motorway network we have overhead gantry signs that display useful information.  Often, on a journey you will see a message such as “To junction 18 – 22 minutes” with the idea that you can then gauge roughly how well the traffic is moving.  However, there is a problem with this.  You are only getting one dimension or measurement.  It’s like a scalar value – it means something but isn’t easy to interpret in isolation.  Now, on some of the overhead signs we have, there is more space and instead you get “To junction 18 – 25 miles, 22 minutes”.  This extra dimension, which turns our data into a vector type value suddenly enables a better interpretation of the information represented.  In your head you can do a rough calculation to determine if the traffic is running at or below the speed limit (70mph in the UK – I base my calculations on 60mph though, which is a mile per minute.)  Now that is useful!
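The rough calculation you do in your head can be written down as a few lines of Python. This is only a sketch of the arithmetic: the function names are my own invention for illustration, and the 60mph "mile per minute" rule of thumb and the 70mph limit are the figures quoted above.

```python
def average_speed_mph(distance_miles: float, minutes: float) -> float:
    """Average speed implied by a sign showing distance and journey time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return distance_miles / (minutes / 60.0)


def beats_mile_per_minute(distance_miles: float, minutes: float) -> bool:
    """Rule of thumb: at 60mph you cover a mile per minute, so traffic is
    moving faster than 60mph if the minutes shown are fewer than the miles."""
    return minutes < distance_miles


# The two-dimensional sign: "To junction 18 - 25 miles, 22 minutes"
speed = average_speed_mph(25, 22)
print(f"Implied average speed: {speed:.1f} mph")
print("Faster than a mile a minute:", beats_mile_per_minute(25, 22))
print("Below the 70mph limit:", speed < 70)
```

With only the "22 minutes" value there is nothing to feed into the function, which is exactly the problem with the one-dimensional sign.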

The above example is a clear showcase of how bringing more than one source of data together (a constant distance between sign & junction with a dynamic source of data, the current motorway speed) delivers a compound piece of information that is useful to someone.  Let’s extrapolate this example out a bit though into what might happen in future…  What if the sat-nav systems in our cars could tap into this real-time data and perform calculations and decisions accordingly?  Would we see journey times being much more accurately estimated?  If we added in another dimension, such as weather or local events, which we know will impact traffic then we suddenly have a multi-dimensional source to base decisions on.  We are already seeing this sort of technology appearing – I should be taking delivery of a new Audi A6 in a couple of weeks.  Nothing out of the ordinary but it has an 8-speed automatic gearbox and on-line integration with Google maps – this combination allows the car to look ahead and determine if it is worth changing gear.  So, if you are approaching a T-junction in 4th gear, it won’t bother changing up to 5th as it knows you will be slowing again soon and hence it is more economical to hold the current gear ratio.  It might not make a massive difference but consider if each and every single car on the roads was able to do similar things and more by using multi-dimensional decisions?
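To make the multi-dimensional idea a little more concrete, here is a toy sketch of how a sat-nav might fold extra sources into the sign's estimate. Everything in it is an assumption for illustration: the class and function names are hypothetical, and the delay multipliers are made-up numbers, not real traffic data or any vendor's actual method.

```python
from dataclasses import dataclass


@dataclass
class SignReading:
    """The two values shown on the larger gantry signs."""
    distance_miles: float
    minutes: float


def adjusted_eta_minutes(reading: SignReading, delay_factors: dict) -> float:
    """Scale the sign's journey time by one multiplier per extra data source.

    A factor of 1.0 means "no effect"; 1.2 means "expect 20% longer".
    """
    eta = reading.minutes
    for factor in delay_factors.values():
        eta *= factor
    return eta


sign = SignReading(distance_miles=25, minutes=22)

# Hypothetical extra dimensions; the values are invented for this example.
factors = {"weather": 1.2, "local_event": 1.1}

print(f"Sign alone: {sign.minutes} minutes")
print(f"With extra sources: {adjusted_eta_minutes(sign, factors):.1f} minutes")
```

Simple multiplication is the crudest possible way to combine the sources, but it shows the shape of the thing: each extra dimension is small on its own, and the value comes from putting them together.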

Commercial Examples

I now regularly attend a JavaScript MeetUp in my hometown of Liverpool.  One of the last sessions was about D3.js and it led me to this website – Sea Level Research. This is another example of how bringing multiple sources of data together and applying some rules and logic can deliver tangible business benefit.  I suspect it could potentially deliver environmental benefits over the long term too.

This area of using multiple sources of data, often from completely unrelated areas, is how I see the Big Data movement moving forward and no doubt how those who have always been close to it have always understood it.  It requires a bit of a stretch in how you understand the word Big though, as you don’t necessarily end up with vast volumes of data but instead maybe vast sources of small, finite information.

SAP Users need to re-think how they are approaching their use of Big Data and indeed HANA.  If it is deployed to simply speed up BI, they have missed the point.  Whilst having your dunning run completed in minutes rather than days is great, where is the value add?  I’m not aware of anyone in the SAP world who is sat staring at their SAP system waiting for a dunning run to complete…  However, I suspect if a financial controller could begin to predict and take proactive, mitigating decisions early in the dunning process with customers based on multiple sources of information, some people will start getting excited.

The Answer?

Finally we get to the end and no doubt you wonder what I think Big Data is?  Well, I don’t imagine it would generate as much interest or excitement if it was called “Multi-Source, Multi-Dimensional, Intelligent, Decision-Making Data” would it? 😉

Image sources

Image Author Link License
BusinessReportAndGrowthGraph.jpg cuteimage
  • My favourite definition of Big Data, by George Dyson:

    "Big data is what happened when the cost of keeping information became less than the cost of throwing it away."

    Or more to the point, less than the cost of deciding to throw it away. Disk space is cheap, the cost of committees to decide whether or not a data set might be needed in the future is huge. And with the capability we have these days to analyse more and more data, we are discovering value in data that previously might have been seen as worthless. That makes deciding to throw data away even harder.

    And yes, there's a positive feedback loop here. The more value we find in large datasets the more data we decide to keep hold of. The more data we have, the more work we feel we need to do to find value in it. 🙂 This effect isn't going away any time soon...


  • Gareth,

    When I think of "Big data," what comes to my mind is big lawsuit, leading to big discovery, big bills, big headaches, and big settlement. Depending on the source of the data, putting a process into place for review and destruction of data deemed to be no longer needed can be time well spent.



  • Hi Gareth

    thanks for an interesting blog.  made me chuckle a couple of times, especially as i've just made the transition from a support analyst to a data analyst.



  • It's as you said: the challenge is to transform data into information and information into value. The 'big data' terminology itself shows that we're looking at the data -> information -> value chain from the wrong perspective...

    Experience shows that these are rather business/industry intensive conversations.

    Technology is a pre-req, but not really the driver here.

    Anyone doing big data as a technology only project is missing the point, and this includes the vast majority of the companies I've approached.

    The old paradigm of IT + business being together for the success of IT projects is even more key when discussing big data (which I'd rather call "big value").

    And then we go to ASUG's survey. I'd be willing to bet those 75% of 55% of the customers that haven't found a business case for HANA haven't really found one for big data in general. And I'd go even further and say that if they haven't found it, it's probably because they weren't seriously looking for it (or listening to their business teams' requests with an open mind).

    Best regards,


    • And I'd go even further and say that if they haven't found it, it's probably because they weren't seriously looking for it (or listening to their business teams' requests with an open mind).

      I'm not sure about this. Finding a use for big data requires a lot of out-of-the-box thinking from the business teams, and not every company has teams motivated enough to make this leap. Most are, like IT, trying to keep the lights on.

      I do agree that talking about Big Data from an IT perspective leads nowhere, but I don't take it for granted that the business side has already found a use for Big Data which is waiting to be tapped into.

      • I'm sorry if I've led to that conclusion. Let me rephrase it.

        It's only through open cooperation between IT, business and the technology vendors that companies are gonna be able to uncover the use cases that leverage innovation from a business model perspective (enabled by innovation on the technology perspective). No area has the answer by itself, and that's why it's key to have all parties involved from the beginning. Some of the most successful stories I've seen were uncovered by bringing together people from apparently unrelated areas who, at the end of the day, proved to be very valuable when working together.

    • Can't speak for every business, but from what I've seen most frequently business struggles with converting X (where X could be HANA, Big Data, etc.) into Profit or another tangible benefit, very much like the Underpants Gnomes with their business plan:


      • Hi Jelena,

        to be honest, that is the experience I usually have when the conversations are restricted to IT only. It's no demagogy: whenever we've been able to involve business from the beginning (without IT "blocking" the access to the business guys) we've been able to uncover business demands that were never discussed with IT because they deemed those cases too "crazy" or "unlikely" to be implemented. Those "crazy things" are usually what stand for the question mark in the cartoon you've shown.