What difference can Big Data make in biotech? Part I: opportunity marries challenge
Pharma companies need strong partners for R&D
Data and scientific insights are the key to innovation for pharma companies. The need to increase R&D productivity has grown due to the patent cliff, intense global competition, and other reasons. As scientific advances progress rapidly, opportunities are certainly there – but currently paired with challenges.
Complex collaboration networks in R&D and personalized medicine
To bring innovations to market more quickly and to understand earlier whether an R&D project should be stopped or continued, the pharma industry collaborates closely with external partners such as biotech companies and Contract Research Organizations from all over the world. From a practical perspective, this means that pharma companies need to be able to analyze data from a large number of partners, and from even more possible data sources – most of which come in different formats. This becomes a particular challenge when historical data needs to be connected with the most recent studies. If this hurdle cannot be overcome, potentially important new scientific results remain unexploited.
Massive amounts of data result not only from the complex collaboration networks in R&D, but also from the advances in personalized medicine. Because personalized medicine analyzes genomic profiles and other individual characteristics to develop tailored therapies, genomics, transcriptomics and proteomics offer huge opportunities for biotech companies. But these opportunities come with challenges, even from a pure data perspective.
Owing to data protection laws, data can become available too late, or it can become less meaningful for investigating new scientific questions. One example: if you want to find out whether and how traits like gender, region, nutrition or genetic predispositions affect an illness, and which therapy would have the highest efficacy in which case, you simply need a certain set of information about patients. But data must not be presented in a way that allows conclusions to be drawn about individual patients. The operative word is “can”: it is not permitted that an unauthorized person could even theoretically work out which patient is behind the information – and this can happen quickly if, for example, you examine persons with a relatively rare illness in a small, defined region.
This requires sophisticated setups that protect information from unauthorized access, and even then some problems remain tough to tackle.
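To make the re-identification risk more concrete, here is a minimal sketch in Python. It uses a k-anonymity-style check, which is a common technique but not one this post prescribes: it counts how many records share the same combination of traits and reports combinations that occur fewer than k times. The function name `small_groups`, the field names, the threshold `k=3`, and the records themselves are all invented for illustration.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return trait combinations that are shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Invented example records (assumption: no real patient data is used here).
records = [
    {"region": "A", "gender": "f", "diagnosis": "common"},
    {"region": "A", "gender": "f", "diagnosis": "common"},
    {"region": "A", "gender": "f", "diagnosis": "common"},
    {"region": "B", "gender": "m", "diagnosis": "rare"},
]

# k=3 is an arbitrary illustrative threshold, not a legal requirement.
risky = small_groups(records, ["region", "gender", "diagnosis"], k=3)
print(risky)
```

In this toy data set, the single patient with the rare diagnosis in region B forms a group of one, which is exactly the situation described above: anyone who sees the record could, in principle, work out who it is.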
Varying data quality
Data quality can vary extremely for various reasons. The biggest quality gaps stem from the age of the data, from poor data maintenance, and from incorrect values that arise, for example, from measuring errors. Keeping the whole original data set instead of using pre-aggregated data makes it easier to find and correct such inconsistencies.
In the future, we can expect data quality to keep improving. But even if the amount of useful data will not be the bottleneck for R&D in Life Sciences, the continuously growing amount of data can still be quite challenging.
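A small sketch in Python shows why the original values matter. The measurements are invented, with 370.0 standing in for a misplaced decimal point, and the helper `flag_outliers` with its `factor` of 3.0 is an illustrative assumption rather than a recommended method. A pre-aggregated mean only shows that something is off, while the raw values let you pinpoint the faulty entry.

```python
from statistics import mean, median

# Invented measurements; 370.0 is assumed to be a mistyped 37.0.
raw = [36.8, 37.1, 36.9, 370.0, 37.0]


def flag_outliers(values, factor=3.0):
    """Flag values whose distance from the median exceeds
    factor times the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [v for v in values if mad > 0 and abs(v - med) > factor * mad]


# The aggregate alone is distorted and gives no hint which entry is wrong.
print(mean(raw))

# With the raw values, the suspicious entry can be located and corrected.
print(flag_outliers(raw))
```

If only the mean had been stored, the error could not be traced back to a single measurement, let alone corrected.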
Growing data volumes
Genome sequencing is becoming more and more affordable, and even proteomes can be analyzed more and more quickly. Consequently, new correlations can be found faster – for example the effect of specific therapies on specific genome mutations, which in turn offers enormous opportunities to explore the complex interrelationships of human metabolism. Genomes, transcriptomes, proteomes, phenotypes – the amount of data for personalized medicine is growing at a breathtaking pace.
Hospitals have great potential to provide a vast amount of high-quality insights about the causes of diseases and about clinical studies. In addition, data can be generated directly from patients through wearable medical devices or other mobile devices like smartphones. With growing acceptance on the patient side, data volumes could explode. To take full advantage of the scientific power of this data, some hurdles have to be overcome. Companies and hospitals need support to capture the data systematically, not only to comply with data protection laws but also from a technical point of view. In most cases, data is generated in different departments within clinics, and these departments mostly work in silos. For example, when data from internal medicine, oncology, and rehab are all needed to better understand a patient's situation, data capture alone can be cumbersome.
Biotech companies that deliver faster and more precise results for pharma R&D will be the winners of tomorrow. As outlined above, this is also a Big Data play, and it can be approached in various ways, which we will describe in the next blog post, “Big Data strategies for biotech companies”.
This blog was written jointly by Emanuel Ziegler and me. I would like to thank Emanuel for all his great insights and support! The content of this blog was first published in a shorter version in German on goingpublic.de