Does Clinical Data Qualify as “Big Data”?

I was at an analyst conference last week where I met a couple of analysts (no pun intended :-) ) focused on Life Sciences who felt that “Big Data” is a tough sell in Life Sciences, except for genomic data. That made me think. I had always associated “Big Data” with data sets whose size runs into petabytes and zettabytes. What I have learned since then is that the characteristics of Big Data do not start and end with size.

An article on the Mike 2.0 blog by Mr. Robert Hillard, a Deloitte Principal and an author, titled “It’s time for a new definition of big data”, explains why Big Data does not simply mean “datasets that grow so large that they become awkward to work with using on-hand database management tools”, as defined by Wikipedia. He goes on to illustrate three different ways in which data could be considered “Big Data”. For more, please read the blog.

One quality he describes that is of interest to me is “the number of independent data sources, each with the potential to interact”. Why is it of interest to me? I think Clinical Data, in the larger context of Research & Development, Commercialization and Post Marketing Surveillance, definitely fits this definition. In one of my previous posts, titled “Can Clinical Data Integration on the Cloud be a reality?”, I explained the diversity of clinical data in the R&D context. Now imagine including other data sources such as longitudinal data (EMR/EHR, claims, etc.), social media, pharmacovigilance and so on: the complexity increases exponentially. Initiatives like the Observational Medical Outcomes Partnership (OMOP) have already shown that there is value in looking at data other than the data collected through the controlled clinical trial process. The same applies to initiatives under way at various sponsors and other organizations to make meaningful use of data from social media and other sources. You might be interested in my other post, titled “Social Media, Literature Search, Sponsor Websites – A Safety Source Data Integration Approach”, to learn more about such approaches that are being actively pursued by some sponsors.

All in all, I think the complexity involved in making sense of disparate data sets from multiple sources, and in analyzing them to draw meaningful conclusions and ensure that the benefits of medicinal products outweigh their risks, definitely qualifies Clinical Data as “Big Data”. Having said that, do I think organizations will pursue this any time soon? My answer would be NO. Why? The industry is still warming up to the idea. Also, Life Sciences organizations are very conservative, especially when dealing with Clinical Data, which is considered Intellectual Property and comes with all the compliance and regulatory requirements of the domain, so it is going to be a long time before Big Data is adopted. An article titled “How to Be Ready for Big Data” by Mr. Thor Olavsrud on the CIO.com website outlines the current readiness and the roadmap for adoption by industry in general.

The next couple of years will see the evolution of tools and technology surrounding “Big Data”, which should help organizations evolve their strategies and, in turn, result in an uptick in adoption.

As always, your feedback and comments are welcome.