The Big Data Visualization Conundrum
This glorious age of big data is creating incredible opportunities for businesses to glean deeper and faster insights for more accurate and timely decision-making, thereby leading to improved customer experience and greater innovation.
Concomitant with this are several challenges. Organisations are overwhelmed by the volume, variety, and velocity of the data pouring into and across their operations (do check out Doug Laney’s original research note on the 3Vs of big data). Businesses are barely able to store big data, let alone understand it or present it meaningfully. Traditional reporting-based BI tools are insufficient to unlock the value that big data represents, partly because they were never designed to analyse semi-structured or unstructured data in the first place.
Data visualization enables organisations to assimilate raw data and present it in a way that generates the most value. I’m proposing 3Cs that good data visualization should empower viewers with – coherence, context, and cognition. (Consequently, I hope that someday I’ll be as famous as Doug Laney! I also thought about correlation and causation, but there seems to be a raging debate regarding the relevance of those two.) Pairing big data with data visualization discovery tools empowers business users to be self-reliant and not depend on enterprise IT to mine data, perform ad-hoc analysis, or create one-off reports for them. Going forward, this democratisation of BI will serve real-time insights to business users directly, leveraging the growing abundance of mobile devices and bypassing the conventional batch-processed reporting route.
Pixelplots, a data visualization technique, are high-density multivariate landscapes of big data that empower the discovery of insights without any aggregation of data. Simple analytics (bar and pie charts) are easy to use (as long as one isn’t using a 3-dimensional pie chart, for example, or a format that is incongruous with the objective of the presentation) but present highly aggregated data, with a limited number of data values. Pixelplots do have a learning curve (just like Treemaps), but are invaluable when it comes to visualizing the big picture without forfeiting granularity – almost like a multi-focal lens. Their fundamental premise is to represent as many data objects as possible on an electronic display at the same time, by mapping each data object to a pixel. The number of pixels mapped is therefore the number of data objects being considered. Key attributes of a data object can be mapped to its corresponding pixel’s colour, or to its ordering along the horizontal and vertical axes.
There has been some academic interest in pixel-oriented visualization techniques in the past, but I have yet to hear about an actual implementation of a Pixelplot in any commercially available data visualization / BI discovery tool. The reason for my fervent interest in this is twofold. Firstly, being a data visualization buff, I am fascinated by how much the Pixelplot actually accomplishes by visualizing a huge set of data objects while simultaneously representing multiple attributes of each. Secondly, I believe that Pixelplots perfectly complement SAP HANA, and that they address big data’s “volume” problem more effectively than any other visualization technique in existence today. Keep in mind that almost all analytics on conventional dashboards aggregate, sample, or sort and selectively pick out the data they represent, and never represent the entire data set on a single screen.
Moreover, Pixelplots leverage the ever-increasing pixel densities of modern electronic displays. Apple’s “Retina displays”, for example, already pack up to 5 million pixels into a 15” laptop screen. On regular desktop displays, a Pixelplot measuring just 960 x 600 pixels can represent 576,000 unique data objects. Mobile device pixel densities are typically even higher than those of desktops, and by virtue of this, the Pixelplot is mobile-ready. I am hoping that you are sensing my excitement!
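The capacity arithmetic is easy to check. Here is a tiny sketch, assuming one data object per pixel; the function name is mine, and the 2880 x 1800 resolution is an assumption about which 15” Retina panel is meant.

```python
def pixelplot_capacity(width_px: int, height_px: int) -> int:
    """Number of data objects a plot can hold at one object per pixel."""
    return width_px * height_px


# The 960 x 600 Pixelplot mentioned in the text.
desktop_plot = pixelplot_capacity(960, 600)

# Assumed resolution for a 15-inch Retina laptop panel.
retina_panel = pixelplot_capacity(2880, 1800)

print(f"{desktop_plot:,} objects on the plot, {retina_panel:,} on the full panel")
```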
Visualizing Consumer Engagement
To understand Pixelplots better, let’s meet our primary user persona, Cari Smith. Cari is an Online Marketer with a consumer electronics company called Cool Electronics (fictitious). Do note that this use-case for the Pixelplot focusses on Marketing within CRM – with a different choice of KPIs, Pixelplots can be used in any industry or line of business.
The Consumer Life Cycle
Nate Elliot from Forrester authored this magnificent blog post on the “Marketing RaDaR”, where he presents a powerful alternative to Elias St. Elmo Lewis’ AIDA (Awareness – Interest – Desire – Action) funnel model, which has been used for years as a tool to structure an organisation’s sales. He proposes a model based on a four-stage consumer life cycle (rather than a funnel) – consumers first discover a product or service, then explore it in greater detail; next they buy the product or service, and after purchase they engage with the company from which they bought, as well as with other consumers. Based on my own interactions with Marketing Analysts (through user interviews while working on next-generation consumer engagement innovations powered by SAP HANA), this resonates perfectly with their mental model and their abstracted perception of their consumer base.
The Top Marketing KPIs
What are the KPIs that are of interest to Cari Smith (our primary persona, just clarifying as I’ve been bandying about several names in this post)? While there are several interesting articles talking about the most important marketing KPIs, Avinash Kaushik’s article lists out a ladder of marketing metrics, with Customer Lifetime Value at the very top. By definition, CLV is the amount of revenue or profit a consumer generates over his or her entire lifetime. To be truly insightful, CLV should not be merely historical (summing up revenue earned from a consumer to date), but predictive (projecting how much revenue can be realised from a consumer over their lifetime). As consumers become more digitally networked and businesses move towards a single system of record for all consumer data, another (orthogonal) metric that could add tremendous value is an aggregated social activity score, something like the Klout Score. Another important dimension could be the time spent – how long have consumers been in a certain life cycle stage?
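To make the historical versus predictive distinction concrete, here is a minimal sketch. The historical figure is simply the revenue to date. The predictive figure uses one common textbook-style projection (average revenue per period, decayed by a retention rate and a discount rate over a fixed horizon); that model, the function names, and all the parameters are my assumptions for illustration, not a method taken from any of the articles mentioned.

```python
from typing import Sequence


def historical_clv(revenue_per_period: Sequence[float]) -> float:
    """Revenue earned from one consumer to date."""
    return sum(revenue_per_period)


def predictive_clv(
    revenue_per_period: Sequence[float],
    retention_rate: float,
    discount_rate: float,
    horizon: int,
) -> float:
    """Revenue to date plus a projected future value.

    Assumed model: the consumer keeps spending their average revenue per
    period, survives each period with probability `retention_rate`, and
    future money is discounted by `discount_rate` per period.
    """
    if not revenue_per_period:
        return 0.0
    average = sum(revenue_per_period) / len(revenue_per_period)
    projected = sum(
        average * (retention_rate ** t) / ((1 + discount_rate) ** t)
        for t in range(1, horizon + 1)
    )
    return historical_clv(revenue_per_period) + projected
```

With a retention rate of 0 the projection vanishes and the predictive value collapses to the historical one, which is a quick sanity check on the model.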
Based on all that has been discussed above, here is (finally!) a mock-up of a Pixelplot:
- At the highest level, Cari sees how many consumers are in each of the four life cycle stages. Note that these life cycle stages are customizable – this could be substituted with stages of a customer loyalty program, for example, or need not be a progression at all (simple categories).
- Every pixel represents a unique consumer, and every consumer at any given point is in some stage of the life cycle.
- The colour of every pixel lies along a monochromatic (blue) range of shades, and represents Customer Lifetime Value. The darker the shade, the higher the CLV.
- The pixel x-ordering maps to the time spent by consumers in that particular life cycle stage. The farther they are to the right, the more time they have spent there.
- The pixel y-ordering maps to the social score of the consumers – the higher they are, the more social activity they’ve recorded.
- A compact Frequency Distribution of the Customer Lifetime Value (click on a bar in the stacked chart to filter) is available at the top, and the Conversion Rate from one life cycle stage to the other (over a default interval which can be altered) is displayed below the Pixelplot.
- This is a mock-up created using Adobe Illustrator – the Pixelplot may not look this “artistic” actually!
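The mapping described in the list above can be sketched in a few lines. This is only one possible layout, under my own assumptions: consumers in a stage are sorted by time spent and poured into columns from left to right, and each column is then stacked from the bottom up by social score. The `Consumer` fields, the two blue endpoints, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Consumer:
    # Hypothetical record; the fields mirror the KPIs in the mock-up.
    consumer_id: str
    clv: float            # Customer Lifetime Value -> pixel colour
    days_in_stage: int    # time spent in the stage -> x-ordering
    social_score: float   # social activity score -> y-ordering


LIGHT_BLUE = (222, 235, 247)  # assumed shade for the lowest CLV
DARK_BLUE = (8, 48, 107)      # assumed shade for the highest CLV


def clv_to_blue(clv: float, clv_min: float, clv_max: float) -> Tuple[int, ...]:
    """Interpolate between the two blues: the higher the CLV, the darker."""
    span = clv_max - clv_min
    t = 0.0 if span == 0 else (clv - clv_min) / span
    t = min(1.0, max(0.0, t))
    return tuple(round(lo + (hi - lo) * t) for lo, hi in zip(LIGHT_BLUE, DARK_BLUE))


def layout_stage(
    consumers: List[Consumer], width: int, height: int
) -> Dict[Tuple[int, int], Consumer]:
    """Assign each consumer of one life cycle stage to an (x, y) pixel.

    x grows to the right with time spent; y = 0 is the top row, so the
    most socially active consumer in a column ends up highest.
    """
    if len(consumers) > width * height:
        raise ValueError("more consumers than pixels in this panel")
    by_time = sorted(consumers, key=lambda c: c.days_in_stage)
    placed: Dict[Tuple[int, int], Consumer] = {}
    for x in range(width):
        column = by_time[x * height:(x + 1) * height]
        if not column:
            break
        column.sort(key=lambda c: c.social_score)
        for rank, consumer in enumerate(column):
            placed[(x, height - 1 - rank)] = consumer  # stack upwards
    return placed
```

Because x and y are orderings rather than absolute positions, no two consumers ever land on the same pixel, at the cost of the axes being ordinal rather than to scale.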
At a glance, Cari can view her entire consumer base and see how they are divided into life cycle stages – this is the big picture. She can instantly identify consumer clusters, for example, those who have a high lifetime value and are more engaged socially – now this is all about pattern recognition, a task we Homo sapiens naturally excel at (although the machines are catching up!). This kind of insight is very relevant for businesses to satisfy the digitally connected and socially networked consumers of today.
Here is an example of an insight that Cari might astutely glean from the Pixelplot:
“Aha! Here is a large group of consumers who have a high Customer Lifetime Value, are significantly engaged socially, and have been in the Explore phase for a while. I should create a Facebook or Twitter promotion to get them to buy!”
Focus & Context
We talked about the big picture, but the USP of the Pixelplot is that it visualizes data at the atomic level (sans aggregation), in this case, value-by-value, at the individual consumer level. Cari can click on an individual pixel to go into the details of specific consumers. Since pixels are fast becoming invisible on modern displays, and clicking a single one would test anybody’s psychomotor coordination, I am proposing a focus+context interaction that converts the cursor into a zoomed-in matrix of 9 x 9 pixels. Our old friend Tom Whitman (from the TechEd demo we ran in 2013) makes a reappearance in the screen below. This ability to instantly drill down to the atomic level helps Cari plan and run 1:1 marketing campaigns, or simply understand who some of her typical consumers are:
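As for the 9 x 9 lens itself, here is a minimal sketch of how the zoomed-in matrix could be cut out of the plot around the cursor. The function name and the clamping behaviour at the plot edges are my assumptions; a real implementation would then draw each returned pixel at a larger size.

```python
from typing import List, Tuple


def lens_window(
    cursor_x: int, cursor_y: int, plot_width: int, plot_height: int, size: int = 9
) -> List[List[Tuple[int, int]]]:
    """Return rows of (x, y) plot coordinates covered by the lens.

    The window is centred on the cursor where possible and shifted
    inwards near the edges so that it always stays inside the plot.
    """
    half = size // 2
    left = min(max(cursor_x - half, 0), max(plot_width - size, 0))
    top = min(max(cursor_y - half, 0), max(plot_height - size, 0))
    right = min(left + size, plot_width)
    bottom = min(top + size, plot_height)
    return [[(x, y) for x in range(left, right)] for y in range(top, bottom)]


# Example: the lens around the middle of a 960 x 600 plot.
window = lens_window(480, 300, 960, 600)
```

Each cell of `window` identifies one consumer’s pixel, so clicking an enlarged cell can open that consumer’s details without requiring pixel-perfect aim.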
Filtering the data in the Pixelplot enables Cari to “thin” information effectively. She can choose what KPIs she wants to visualize, and also restrict the data set based on other attributes (demographics, channels, or loyalty). Altering the filters instantly reveals how many consumers match the filtered criteria:
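The filtering step boils down to keeping only the consumers whose attributes match every active filter and reporting the count. A minimal sketch follows; the attribute names and the three sample records are hypothetical and exist only to show the shape of the data.

```python
from typing import Any, Dict, List


def matches(consumer: Dict[str, Any], criteria: Dict[str, Any]) -> bool:
    """True when the consumer satisfies every criterion.

    A criterion is either a single value (equality) or a set of
    accepted values (membership).
    """
    for attribute, accepted in criteria.items():
        value = consumer.get(attribute)
        if isinstance(accepted, (set, frozenset)):
            if value not in accepted:
                return False
        elif value != accepted:
            return False
    return True


def apply_filters(
    consumers: List[Dict[str, Any]], criteria: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Consumers that remain visible once the filters are applied."""
    return [c for c in consumers if matches(c, criteria)]


# Hypothetical sample records.
consumers = [
    {"id": "c1", "channel": "web", "loyalty": "gold"},
    {"id": "c2", "channel": "store", "loyalty": "gold"},
    {"id": "c3", "channel": "web", "loyalty": "silver"},
]

visible = apply_filters(consumers, {"channel": "web", "loyalty": {"gold", "silver"}})
print(f"{len(visible)} of {len(consumers)} consumers match")
```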
The idea of the Pixelplot isn’t unique, but using it to demarcate market segments through direct manipulation potentially is! Using algorithms like support vector machines, we could automatically discover consumer segments and visualize them graphically, layered atop the Pixelplot. Alternatively, Cari could draw her own segments based on the insights she derives from the Pixelplot, using either a pointing device or a stylus. Do note that this approach is very different from the traditional (rule-based) methods used to define segments – hence the term “visual segmentation”. Cari sees her entire consumer base on one screen and is also able to identify patterns that either emerge automatically (based on the orthogonal metrics that are simultaneously visualized, like in the example above), or are arrived at by slicing and dicing (using the filters we talked about). For every segment (either suggested or defined), we could surface additional details through microcharts, all for better decision making. Cari could edit segments, or add a title/description for those that she wishes to retain. These are still early days – I am confident that the Pixelplot lends itself to several other exciting possibilities!
This was an introduction to Pixelplots and how they could be applied to visualize big consumer data, conduct analysis through slicing and dicing, and define market segments visually (and directly!). As I pointed out earlier, this was just one specific illustration, and there is a lot more that can be done with them:
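For the hand-drawn variant of visual segmentation, the core operation is deciding which pixels fall inside the outline Cari has drawn. A minimal sketch, assuming the outline arrives as a closed polygon in plot coordinates and using the standard ray-casting test; the function names and the `placed` mapping from pixel to consumer id are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segment_members(
    placed: Dict[Tuple[int, int], str], polygon: Sequence[Point]
) -> List[str]:
    """Consumer ids whose pixel centre lies inside the drawn outline."""
    return [
        consumer_id
        for (px, py), consumer_id in placed.items()
        if point_in_polygon(px + 0.5, py + 0.5, polygon)
    ]
```

Testing pixel centres rather than corners avoids most ambiguity for pixels that sit exactly on the drawn outline.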
- By incorporating panning and zooming, the Pixelplot can leverage the much-loved design principle of progressive disclosure – zooming in reveals additional levels of detail about consumers progressively, akin to how online map applications (Google Maps, for example) work.
- Using linking and brushing, selecting a certain consumer can reveal others who have similar behaviours and attributes – enabling a “look-alike” discovery of target consumers.
A parting note – Pixelplots are not easy to implement, as pixels need to be ordered along the horizontal and vertical axes simultaneously (they are not positioned absolutely, as there might be instances of overlapping), which needs a robust rendering algorithm to work efficiently behind the scenes. Also, there might be performance issues at the UI layer when rendering such a vast data set. There is a workaround to this that I can think of – populate the pixels that are likely to correspond to higher / more important values first. In the example above, this would mean that the darkest blue pixels (the consumers with the highest Customer Lifetime Value) appear first, followed subsequently by the lighter shades.
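The workaround of drawing the most important pixels first can be sketched as a simple batching generator. I am assuming each pixel is an (x, y, clv) tuple and that the UI draws one batch per frame; both the representation and the function name are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Pixel = Tuple[int, int, float]  # (x, y, Customer Lifetime Value)


def render_batches(pixels: Sequence[Pixel], batch_size: int) -> Iterator[List[Pixel]]:
    """Yield pixels in batches, highest CLV (darkest blue) first."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ordered = sorted(pixels, key=lambda p: p[2], reverse=True)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Example: three pixels drawn two at a time.
for batch in render_batches([(0, 0, 10.0), (1, 0, 90.0), (2, 0, 50.0)], 2):
    print(batch)
```

The sort costs O(n log n) up front, but it lets the viewer see the highest-value consumers while the rest of the plot is still filling in.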
Thanks for reading, and do share your feedback and comments. I’d love to hear from you!