In this installment of the series we will discuss the applicability of Big Data in the government sector, the key challenges, the attributes of the data sets, Big Data applications in the public services space, and a set of imperatives the government executive must carry out for India to leapfrog on its development agenda using technology enablers.
In this blog post we will examine the role of Big Data in the government sector. We will specifically examine how Big Data differs between the public and private sectors and how the challenges differ across them. But before that, let’s start by defining the term Big Data.
Defining Big Data
Big Data is a general term for the massive amounts of data being collected from all types of data sources. A common characteristic of this data is that it is too large, raw and unstructured for analysis or management through conventional data management techniques. The operative point in this definition is that Big Data cannot be handled or analyzed using conventional techniques. So Big Data is not always petabyte scale; even a dataset of a couple of GB can constitute Big Data if it meets our definition of complexity of analysis and data management.
Big Data is often associated with the three V’s of volume, variety and velocity. We will discuss these attributes in the following sections.
Big Data in Business Vs Government
The primary missions of Big Data projects in business and in government are not in conflict with each other, yet they reflect different goals and values. Let’s examine some of these in this section:
In business, the objectives of the projects are aligned to the goals of the business or corporation, i.e. to improve profitability, create and deliver differentiated goods and services, and sustain the competitive edge that the organization enjoys.
In the government sector, on the other hand, the goals are to maintain peace and tranquility, achieve sustainable development, and deliver general welfare and development.
Business use of Big Data also differs on account of the limited number of stakeholders and the shorter time horizon over which they expect results.
Government projects typically involve a significantly higher number of stakeholders and operate over a longer time horizon, given the strategic nature of these projects.
Business use of Big Data is limited in scope and is often restricted by citizen privacy laws and other such statutes. Most business-oriented projects are internally focused unless they relate to the customer relationship management domain.
Government projects are more often governed by different sets of legal statutes, depending on the goals of the project. For example, while Big Data projects in the utilities space may be limited in their scope of customer listening, the Homeland Security department may have sweeping powers to collect, store and analyze citizens’ data.
Data Set attributes compared
Big Data differentiates itself from normal data on account of the three V’s, short for volume, variety and velocity.
- Volume is the primary attribute of Big Data. Big Data projects are typically at least terabyte scale and in most cases can reach petabyte scale.
- Velocity is the second attribute of Big Data and deals with the speed at which the data arrives or gets produced vis-à-vis the speed at which it is processed and analyzed. The speed element will become more and more important as Internet of Things applications start getting deployed in the country to monitor and manage the infrastructure and other assets of India.
- The third attribute of Big Data is variety. Variety is about the different forms of data that are produced or consumed within a single application. For example, an epidemiological/disease surveillance application could end up using structured and unstructured data on hospital admissions, lab information management systems, weather data etc. to predict the possibility of an infectious disease outbreak in a certain part of the country.
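To make the velocity point concrete, here is a minimal sketch of what happens when data arrives faster than it can be processed. The function name and the rates used are illustrative assumptions, not figures from any real deployment.

```python
def backlog_over_time(arrivals_per_sec, processed_per_sec, seconds):
    """Return the count of unprocessed records at the end of each second.

    Assumes constant arrival and processing rates (a simplification;
    real sensor feeds are bursty).
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        # The backlog grows by the gap between arrivals and processing,
        # and can never drop below zero.
        backlog = max(0, backlog + arrivals_per_sec - processed_per_sec)
        history.append(backlog)
    return history


# Hypothetical rates: sensors emit 1,200 records/s, the pipeline handles 1,000/s.
falling_behind = backlog_over_time(1200, 1000, 3)
keeping_up = backlog_over_time(800, 1000, 3)
print(falling_behind)
print(keeping_up)
```

When the arrival rate exceeds the processing rate, the backlog grows without bound, which is why velocity has to be designed for and not just volume.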
In addition to the three V’s of Big Data, there are additional factors that become relevant in the context of government Big Data projects. These factors are described below:
- Data silos/fragmentation is the first attribute we see in the government sector. For example, implementation of e-Governance platforms of different vintages at Federal (Central), State and Local Government levels leads to data silos. The problem is further exacerbated by the existence of separate departments at each of these levels. Moreover, data silos come with their respective stakeholders and data formats. Getting all of the stakeholders working together is therefore the first problem, and getting the data into a standard data interchange format is the second; both are specific to the government sector.
- The second differentiator of government Big Data projects concerns the collection, storage and analysis of citizen data. Some of the questions likely to emerge in the days to come are around:
  - Personal data confidentiality
  - Use by one government agency of data provided to another (cross usage of data)
  - Data jurisdiction at Federal, State and Local level
Proper utilization of data for reaching meaningful inferences will require legal frameworks for cross usage of data with adequate personal safeguards.
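The data interchange problem described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two silos, their field names, the date formats and the sample record are all made up, and a real interchange standard would cover far more than three fields.

```python
from datetime import datetime

# Hypothetical mappings: silo-specific field name -> common field name.
STATE_MAPPING = {"res_name": "name", "dob": "date_of_birth", "dist": "district"}
CENTRAL_MAPPING = {"FullName": "name", "BirthDate": "date_of_birth", "District": "district"}


def to_common(record, mapping, date_format):
    """Convert one silo's record into the shared interchange schema."""
    # Keep only mapped fields and rename them to the common names.
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    # Normalize dates to ISO 8601 so records from both silos are comparable.
    parsed = datetime.strptime(common["date_of_birth"], date_format)
    common["date_of_birth"] = parsed.date().isoformat()
    # Normalize spacing and capitalization of names.
    common["name"] = common["name"].strip().title()
    return common


# Made-up records describing the same fictional resident in two silos.
state_record = {"res_name": "asha rao ", "dob": "05/11/1984", "dist": "Pune", "ward_no": 12}
central_record = {"FullName": "Asha Rao", "BirthDate": "1984-11-05", "District": "Pune"}

from_state = to_common(state_record, STATE_MAPPING, "%d/%m/%Y")
from_central = to_common(central_record, CENTRAL_MAPPING, "%Y-%m-%d")
print(from_state == from_central)
```

Once both records are in the common format they can be matched directly. The harder parts in practice are agreeing on the common schema across stakeholders and deciding, under the legal framework, which fields may be shared at all.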
To summarize, Big Data projects in the government vertical share the three V’s with the business sector. Additionally, government projects need to deal with data silos and their stakeholders, and with the legal statutes governing citizen-related data. For Big Data programs to succeed, the executive needs to take steps to break data silos, set up control towers to ensure proper data management, and put in place a legislative framework to address privacy concerns.
Big data applications
Big data applications in the government sector can be broadly classified into the following types:
- Scientific payloads: These include meteorological, geospatial, genomic and proteomic datasets. The main focus of these applications is to further scientific understanding of the resources available to the nation. These applications also further the understanding of science itself.
- Horizon scanning: Horizon scanning is a euphemism for surveillance applications operated by the internal and external security apparatus of the government. Most horizon scanning applications and programs are not in the public domain. They involve mining datasets from the web, wiretaps, call detail records and video surveillance.
- Public service applications: A good example of a public service application that takes the form of a public data platform is the Aadhaar program, which provides all residents of India with a unique identity for government/public services.
- Data streaming/real-time applications: Real-time applications constitute the fourth category of Big Data applications in the government space. This category of applications will start gaining prominence with the rollout of the hundred smart city initiatives across the country.
Structural, technical and legal imperatives
For India to fully exploit the benefits of Big Data, the country will necessarily have to implement and roll out a series of imperatives. These imperatives can be broadly divided into three major categories.
- Structural imperatives: Firstly, for India to leapfrog on the development path, we will have to create a clearinghouse of data at the national level. This clearinghouse will define common data management guidelines and also operate as a national-level interchange for fast transfer of relevant data from one data silo to another. Secondly, the government will have to create fiscal incentives for Indian system integrators to devote capacity to creating, operating and managing these Big Data applications. Thirdly, we in India need to create a manufacturing base for hardware, software and sensors to make the exploitation of these new-age applications scalable and cost-effective. Dependence on global manufacturers could lead to higher costs and thus lower levels of adoption of, and focus on, the programs.
- Technical imperatives: The technical imperatives for successful government Big Data programs involve the following measures. First and foremost, we need to build people capacity in Big Data and e-Governance programs. Secondly, we need guidelines around the selection, deployment and development of technology frameworks and architecture to support these programs over the medium to long term. Thirdly, the government needs to collaborate with public and private sector stakeholders on the development of exa-scale technologies.
- Legal imperatives: The government needs to put in place suitable statutes and programs to address concerns around data privacy and how citizens’ data will be handled in the context of these Big Data applications. The design and implementation of many programs require data from different silos to be brought together. The existence of a formal policy/statute will allow for a secure exchange of the data and thus enable more efficient program management and delivery.
As Big Data applications and technologies become a mainstay in government and business, we will see widespread adoption and deployment of these frameworks.
Big Data applications have different objectives in government vis-à-vis business and therefore also have a different set of characteristics and challenges. For Big Data applications to become successful and viable, the government will have to introduce a set of changes that are structural (executive), technical and legal in nature.
As demonstrated by the Aadhaar card initiative and the IMD working with the NRSC during the Orissa cyclone a couple of years ago, these applications have both long-term and short-term impacts that far outweigh the cost of implementation, operation and management.
For India to leapfrog on its development goals, we will have to adopt and exploit Big Data programs in a really big way. This will help in improving policy decisions, program delivery and program management.
In the next blog post I will discuss the use of ICT to meet the health care objectives that India has set itself to achieve by the year 2030.