It is hard to pin down a perfect definition for big data. The phrase vaguely summarizes the idea of all the data available on the internet. Actionable data as we witness it today is hardly a palpable, static entity: it changes relentlessly in volume, variety, veracity and value, at a mind-numbing velocity. In synchrony, the technologies and methods used to handle and store data are also changing drastically. Scientists and intellectuals have classified this huge amount of data, as well as the tools and techniques employed for collecting, storing, cleaning, and analyzing it, as “big data”.
Over the last decade, we have seen big data mature from a buzzword into a necessity. The new decade will be characterized by large-scale adoption of big data analytics across every industry vertical, as well as the burgeoning popularity of big data analytics courses around the world.
Through the rearview mirror – Analytics since the 1600s
Unlike its nomenclature, the practice of analyzing data is quite old; its roots can be traced back to 1663 AD. While studying the bubonic plague, John Graunt documented his encounter with an overwhelming amount of information, arguably the most primordial documented encounter with sizable data. In the early 1800s, the collection and analysis of data became part of statistical science, and statisticians became the first generation of data analysts.
The story of the punch cards
In the 1880s, the Census Bureau of the United States estimated that collecting and analyzing the data for the population census would take up to 8 years; by then, the delay would have rendered the results practically useless. It was further estimated that the following census might take an unacceptable ten years. Thankfully, in 1881, Herman Hollerith, a young man working at the bureau, invented the tabulating machine, reducing that 8-year effort to a mere three months. He designed his machine around punch card technology, which was cutting edge by the standards of that time. His machine used punch cards similar to those that determined the weaving patterns of mechanical looms.
The gradual development of big data analysis is punctuated by interesting stories and solved problems.
Cracking the Enigma
In the late 1930s and early 1940s, the German Enigma was the elephant in the room. This ingenious encryption system not only kept the Germans a step ahead of their rivals in military operations; it also played a key role in the early German successes of the Polish and French campaigns in the first half of the Second World War. The German war machine was utterly unpredictable and pulled off seemingly impossible feats, keeping the Allies completely in the dark. Every delay in cracking the code meant severely detrimental results for the world. The codebreaking effort spurred by Enigma and related German ciphers led, in 1943, to the world's first programmable electronic data processing device, the Colossus (which was in fact used against the Lorenz cipher rather than Enigma itself). It was followed in 1945 by John von Neumann's design report for the Electronic Discrete Variable Automatic Computer (EDVAC).
Von Neumann's report discussed the storage of programs for the first time in documented history, conceiving the idea of the modern-day computer.
These very small but significant steps were fundamental in the development of today’s idea of big data.
Data driven paradigm shifts
By 2020, 53% of companies had adopted or invested in big data analysis infrastructure, up from 17% in 2015. The influence of large players like Facebook, Amazon, and Google, along with sheer market demand, has pushed enterprise owners of all sizes toward a data-centric environment. Let us look at some key areas where the effects of big data analytics are especially prominent.
The internet is a major source of entertainment, followed by television, and both mediums are used to reach the masses. Thanks to the massive incorporation of big data-driven strategies, modern-day entertainment is personalized, well targeted and consumer-centric. Targeted marketing and entertainment have undoubtedly undergone a revolution in precision and rate of success. But the benefits range far beyond the spheres of commerce and entertainment.
Genetics introduced the notion of being unique in terms of traits and phenotypes. Though most of our genome shares a common set of sequences, we humans are unique because of polymorphisms, and sometimes mutations, in our genome. These differences in traits and phenotypes can manifest as differences in susceptibility to diseases and in response to certain medicines. Susceptibility to tuberculosis is one instance: studies estimate that nearly 23% of the world's population carries a latent TB infection, and people who have knowingly or unknowingly encountered the TB bacterium carry a biomarker for tuberculosis without necessarily showing symptoms. This uniqueness is a blessing, but for drug development, accounting for it is a burden. Once humans became adept at sequencing the genome, it became possible to design precise drugs for populations carrying similar genotypes.
The only remaining difficulty was the size of the human population and the number of variations present. Today, thanks to impressive big data capabilities, even this obstacle is within striking range. Drugs are now designed with the variations and polymorphisms of a geographic region in mind, and sometimes with ethnicity and gender in mind as well.
Diagnosis of pathology
Thanks to big data, our health diagnostic systems have become less error-prone. Histological studies and image-based investigation results can be interpreted more efficiently, and doctors can concentrate more on prescribing and caregiving than on diagnostic decision making. Machine learning based image and pattern recognition models are far easier to train and deploy thanks to the availability of big data resources. Detecting metabolic abnormalities and physiological irregularities, such as hypothyroidism or broken bones, can now proceed without human assistance using these data-driven models and tools.
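To make the idea concrete, here is a deliberately tiny sketch of what "learning a diagnostic rule from data" means at its simplest: given labelled lab readings, pick the cut-off value that best separates normal from abnormal cases. The marker values and labels below are invented placeholders, not real clinical data, and real systems are of course far more sophisticated.

```python
# Toy illustration: learn a screening threshold from labelled readings.
# The "marker" values and verdicts below are hypothetical placeholders.

def best_threshold(readings):
    """readings: list of (value, is_abnormal) pairs.
    Returns the cut-off that misclassifies the fewest samples
    when every value above it is flagged as abnormal."""
    candidates = sorted(v for v, _ in readings)

    def errors(t):
        # Count samples where "value > t" disagrees with the label.
        return sum((v > t) != label for v, label in readings)

    return min(candidates, key=errors)

# Hypothetical labelled data: (marker level, clinician's verdict)
data = [(1.2, False), (2.0, False), (2.8, False),
        (5.5, True), (6.1, True), (7.4, True)]

cutoff = best_threshold(data)

def flag(value):
    # The entire "model" is one learned number.
    return value > cutoff
```

The point of the sketch is that more (and more representative) labelled data directly improves the learned cut-off, which is exactly why big data resources make such models easier to train.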
More efficient public services
The incorporation of big data analytics has changed public services significantly. Consider disaster management: it is now possible to predict where and when the next calamity will strike, which affords more time and better preparation. For example, historical radar data can reveal patterns in tornado movement, and damage data can be analysed to guide precautions. In recent years, most of the major storms that reached the US east coast were correctly predicted, and entire counties and towns were evacuated to minimize casualties.
The obvious reference to the pandemic
A very enlightening example is the ongoing pandemic. The COVID-19 pandemic was a very unpleasant surprise, and new strains of the virus are predicted to surface within the next few years. Based on the data now available, a better plan for the next wave can be drawn up and executed, including more effective lockdowns and tighter personal protection norms.
Farming and nutrition
Better-planned nutrition can enhance a population's ability to fight disease, and big data reveals what a population needs in a given situation. For example, if the local soil lacks sufficient iodine and iron, the population living on it can suffer from hypothyroidism and even lowered RBC counts. Big data in the form of soil data can help mitigate this problem by guiding the addition of the deficient elements to the regional diet.
Farmers are using this same soil data, along with weather data, to plan better harvests. Crop selection has never been easier, and with analytics on soil diseases and weather available, choosing the fertilizers and treatments needed to maintain a crop has become simpler too.
Big data is used mostly in the marketing and targeted advertising sectors, and rapid development in business and data analytics has driven heavy adoption across businesses and tech industries as well. Unfortunately, the utility of big data in public services and health care remains largely underrated and demands a spotlight as the pandemic continues to devour livelihoods. This larger picture of big data analytics should be emphasized right from the beginning of big data training.
The skeptical question
A humanist skeptic might ask, “Who will end up losing their jobs because of this massive utilization of big data?” To date, big data driven tools cannot be left entirely without supervision, nor can the decisions made by artificially intelligent tools be trusted blindly. If a self-supervised recruitment tool treats previous trends and unwanted biases as instructional information, irreversible mistakes are inevitable. For example, if past data reveals a recruitment bias towards male candidates, a big data driven recruitment tool might stop considering women as potential candidates. Big data may have proven to be a superior means of eradicating human error, but at this stage of development it cannot be left without human assistance. Arguably, even after reaching its full potential, AI and machine learning tooling must not replace human labour, for the human brain may well continue to hold that bold little title of the most versatile machine known to us.
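The recruitment example above can be sketched in a few lines. This is a toy model, with invented data, showing the mechanism rather than any real hiring system: a naive scorer that ranks candidates by the historical hire rate of their group will simply reproduce whatever bias that history contains.

```python
# Toy illustration of bias leakage: a naive "model" that scores a
# candidate by the historical hire rate of their group reproduces
# the bias in its training data. All data below is invented.

from collections import defaultdict

def hire_rate_by_group(history):
    """history: list of (group, was_hired) pairs from past decisions."""
    hired = defaultdict(int)
    seen = defaultdict(int)
    for group, was_hired in history:
        seen[group] += 1
        hired[group] += was_hired
    return {g: hired[g] / seen[g] for g in seen}

# Biased past decisions: equally many applicants per group,
# but very unequal outcomes.
past = ([("male", 1)] * 8 + [("male", 0)] * 2
        + [("female", 1)] * 2 + [("female", 0)] * 8)

rates = hire_rate_by_group(past)
# The model now scores female candidates far below male candidates
# for otherwise identical records, purely because the history did.
```

Nothing in the code is malicious; the skew comes entirely from the training data, which is exactly why such tools need human oversight.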