Big Data Myths…BUSTED

Because Big Data is such a “big” and diverse topic, plenty of assumptions, misunderstandings, and confusion surround the concept. Through discussions with colleagues, I’ve noticed a few recurring themes, uncertainties, and generalizations. I’ve also heard plenty of claims, some of which are true and some of which lack validity. After hearing from industry experts, I’ve compiled three Big Data myths, which we will bust to uncover some of the Big Data facts.

Myth One: Big Data is really “BIG.”

Saying that big data is really big implies that big data means a lot of information, more than most companies collect. In that sense, the myth is not true at all. At the latest Hadoop conference, big data was defined as any data that cannot fit into Excel, which in reality is the amount of data most companies collect and store, especially given the rise of social media and the volumes of data collected from those sources.
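
As a rough illustration of the “cannot fit into Excel” definition, here is a minimal Python sketch. It assumes the worksheet limits of current Excel versions (1,048,576 rows by 16,384 columns); the function name `fits_in_excel` and the sample figures are hypothetical, chosen only for illustration.

```python
# Worksheet limits for modern Excel (.xlsx) files.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_excel(row_count: int, col_count: int) -> bool:
    """Return True if a table of this shape fits on one Excel worksheet."""
    return row_count <= EXCEL_MAX_ROWS and col_count <= EXCEL_MAX_COLS


if __name__ == "__main__":
    # Hypothetical example: a year of social media mentions, one row each.
    mentions_per_day = 5_000
    rows = mentions_per_day * 365
    print(fits_in_excel(rows, col_count=12))  # prints False
```

By this yardstick, even a modest stream of daily records outgrows a single worksheet within a year.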

“Social media releases floods of data that hold massive promises for business – both in terms of branding and opening up new channels to market, in which large populations of consumers are speaking. They’re [consumers] communicating who they are and what they do and do not like. Such a relentless flow of data provides extraordinary levels of feedback and an unrivaled chance for companies to listen to the voices of their customers and target audience(s), garner intelligence, and participate in a collaborative dialogue to boost competitive advantage and new opportunity,” explains Upasna Gautam of MagicLogix.com. [Stay tuned for another post about the link between social media and big data.] Most businesses use social media outlets for this type of insight, and the amount of data they collect and need to analyze is considered “big.”

So it isn’t just enterprises that are struggling to manage big data. The general rule: if you have multiple spreadsheets with data, you have big data on your hands. As Margaret Dawson, VP of Marketing for Symform, points out, “SMBs also struggle to keep up with skyrocketing data volumes. In fact, a recent data and backup trends survey of SMBs found that respondents average one terabyte to over 500 TBs (1 TB = 1000 GB) of data, with most forecasting data growth of 10-40% over the next year.” Wow.
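
To put the survey numbers quoted above in perspective, here is a small Python sketch that applies the 10-40% growth forecast to the low and high ends of the reported range. The function name `projected_tb` is hypothetical, and the sketch assumes the forecast rate simply compounds once per year.

```python
def projected_tb(current_tb: float, growth_rate: float, years: int = 1) -> float:
    """Project data volume in TB, compounding growth_rate once per year."""
    return current_tb * (1 + growth_rate) ** years


if __name__ == "__main__":
    # Survey range: 1 TB to 500 TB, with forecast growth of 10% to 40%.
    for current in (1, 500):
        low = projected_tb(current, 0.10)
        high = projected_tb(current, 0.40)
        print(f"{current} TB today -> {low:.1f} to {high:.1f} TB in a year")
```

At the top of the range, 40% growth means roughly 200 TB of new data in a single year.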

Myth Two: Big Data makes BETTER analytics.

Is bigger really better? Sometimes, but with big data, bigger is often simply bigger, and quantity does not always equal quality; the gap occurs in the analysis and translation of the data. It’s [almost] comparable to giving someone a hammer, nails, wood, and all the tools they need to build a house but no blueprint: they don’t know what to do or where to start. This means we have to be sure we are collecting the data that helps solve our most prominent problems (the ones that relate back to our KPIs and bottom lines) and following the right blueprint to get what we need.

Erin Bartolo, Data Science Program Manager at the School of Information Studies at Syracuse University, agrees and provides a strategy to ensure bigger data turns into something meaningful. “Entertain your inner skeptic by questioning everything from what data are meaningful to how you project your own biases on findings. Without objective, analytical skills, analytics merely backs up our own biases with data,” she advises. She explains that the “whys” and “hows” need to be infused into the data analysis to really find value. She says, “…increasing one’s awareness of data and appreciation of its objectivity reveals insights whether the data is stored in an Excel spreadsheet or in a massive data warehouse.”

So in this case, size doesn’t really matter, unless you need the size of the data to answer the questions that relate back to your ultimate goals.

Myth Three: You need a team of Hadoop engineers and Analytics platforms to be on premise to work with Big Data.

While merging data collected from various sources and analyzing it is quite a challenge (and Hadoop professionals could be an advantage), other solutions and platforms can help transform unstructured data into structured data and merge it with business intelligence (BI) tools.

Werner Hopf, CEO of Dolphin, believes, “There are compelling [software] solutions to help companies meet those goals, achieve significant savings and performance improvements, and lay the foundation for leveraging SAP HANA – the vehicle for truly maximizing the potential benefits of big data – in the future.” These options can be cost-effective and easily navigated by intelligent, but not expert, users.

The idea that analytics platforms must be housed on premises is busted by Keith Metcalfe, Vice President of Sales and Marketing at WCI Consulting, who adds, “Integrating and cleansing data to a targeted place for reporting is a core concept behind any enterprise approach to analytics/business intelligence, and there is no technical reason why that target cannot reside in a hosted/SaaS environment. Cloud platforms and analytics tools are great applications for hosted (e.g. Amazon Web Services) or SaaS analytics platforms (e.g. SAP BusinessObjects BI OnDemand). Having said this, core to the topic of SaaS and hosted environments is that an organization sees value in replacing IT infrastructure, as this is where the financial return justifies the cost of investing in such environments.”

Two last pieces of big data management advice come from Hopf: “From a data management perspective, making the most of the big data opportunity requires the adoption of two key strategies: 1) augmenting data archiving capabilities with nearline storage; and 2) re-architecting the business warehouse (BW) data model for lean, flexible, organized “views” of information that serve up agile reporting without increasing administrative overhead.” Do this, add the human aspect, and solve big business problems using your big data.

If you have questions about these myths, feel free to reach out to one of the Top Big Data Twitter Influencers.

  • http://www.facebook.com/jkobielus James Kobielus

    Very thought-provoking, Jen. There are plenty of Big Data
    myths floating around. We simply need to deflate them with facts. Let me add a
    few additional thoughts re each myth you discussed:

     

    Myth #1: Big Data is really “BIG.” I
    like to think of Big Data as, at heart, “massively scalable
    analytics.” In other words, it’s more about having headroom to scale your
    analytics into the petabytes, into real-time streaming, and into
    multistructured data territories. I
    also like to think of the “scalable” part of it in terms of
    thresholds beyond the usual: beyond low-terabyte volumes, beyond batch ingest
    and delivery, and beyond unistructured relational data varieties. It’s not
    about “big” in any absolute way, but “how scalable does your
    analytics platform need to be now and over the next several years?” What
    companies of all sizes are starting to realize is that they’ll need to scale
    to one or more of the “Vs” sooner than they realize. Are you doing
    the architectural planning, and do you have the right platform(s), to provision
    more storage, memory, processing, and bandwidth rapidly and cost-effectively when
    the Vs start to bear down on you?

     

    Myth #2: Big Data makes BETTER analytics. No,
    of course not. But, as I’ve said elsewhere, Big
    Data enables the new paradigm of “whole-population analytics.” This
    involves having the entire population of analytic data to drill into, rather
    than just the traditional capacity-constrained samples/subsets. Being able to capture,
    aggregate, mine, model, manipulate, search, query, and visualize the entire population
    of any data set can give you fresh insights. For example, having a 360-degree
    deep-historical customer view, including rich real-time behavioral data,
    enables you to do more powerful micro-segmentation, fine-grained target
    marketing, nuanced customer experience optimization, and agile next best action.

     

    Myth #3: You need a team of Hadoop engineers
    and Analytics platforms to be on premise to work with Big Data. On-premises?
    That’s not always necessary or prudent. One of
    the exciting things about the Big Data revolution is the growing range of
    outsourced, hosted, and multitenant cloud/SaaS offerings. Likewise, a growing
    range of consulting and professional services are helping users to bootstrap
    their internal competencies. You don’t need to do it all in-house. You can bring
    in the best and brightest data scientists to help on mission-critical Big Data projects
    that involve Hadoop, NoSQL, MPP EDW, graph databases, and other platforms.
     

    • Jen Cohen

       Hi James. Thanks so much for your added insights. I appreciate your commentary and the supportive information to further inform readers, especially about data providing that 360-degree snapshot. Great stuff!
