Our series so far has explored Public Sector applications of predictive analytics and machine learning to enable data-driven policy and practice. One could argue that characterization of these technologies as emerging belies the fact that governments have been using non-linear computational models since the 1950s. Furthermore, the statistical modelling techniques on which predictive analytics and machine learning are based have been understood since the early 19th Century. Why then are we only now seeing these techniques being applied by leading Public Sector agencies like the State of Indiana and Queensland’s Office of State Revenue (OSR)? The answer lies not in the maturity of the computational models, but in the preparedness of the big data platforms and the ability to interrogate massive data sets in real time.
Data-rich but information-poor
In its September 2017 report to the President of the United States, the Commission on Evidence-based Policymaking states: “…the American people want a government that solves problems. This requires that decision-makers have good information to guide their choices about how current programs and policies are working and how they can be improved”. This is precisely the motivation for data-driven government. But the Commission goes on to observe: “…while collecting taxes, determining eligibility for government benefits, engaging in economic development and running programs, government necessarily collects a considerable amount of information. In 2017, the American public will spend nearly 12 billion hours responding to more than 100 billion individual requests for information from the Federal government. Even though the direct costs of collecting these data are funded by taxpayers, these data are not generally available for producing evidence”. And this is exactly the challenge that needs to be overcome…
The United States is certainly not alone in its desire for evidence-based policymaking, or in the challenges it faces in realizing this vision. All modern governments have rich stores of customer and case data, but most government agencies struggle to convert this data into meaningful information and actionable insights. The reasons for this include:
- Government data holdings are often siloed within and across agencies, and can be difficult to access – let alone share;
- Data quality is often found to be inconsistent across the silos, hindering efforts to integrate systems and consolidate data assets;
- The sheer amount of data – sometimes referred to as the fog of big data – can make it difficult to identify pivotal events and emerging trends;
- Analytical processing can impact the performance of operational systems, while the alternative data warehousing approach typically introduces reporting lag; and
- Regulatory constraints and cultural resistance further impede agencies trying to unlock the information held in government data stores.
These problems have been decades in the making, and are therefore not easy or quick to solve. But with the advent of real-time computing, Public Sector agencies now have a viable platform for working with big data at the point of service. This capability is key to overcoming the abovementioned challenges, and thereby enabling data-driven policy and practice.
Overcoming data access challenges
With their Management and Performance Hub (MPH) up and running on a real-time computing platform, the State of Indiana is today an exemplar of open data. But that wasn’t always the case – many agencies were understandably nervous about providing access to their customer data and operational systems. They wanted assurance that their data would be maintained securely and used appropriately. The MPH team addressed these concerns by establishing Memorandums of Understanding (MOUs) that brought the agencies into the process. This was made possible through the framework created by an Executive Order issued by Governor Mike Pence. The Executive Order served a similar function to the EU’s Data Protection Directive, in that it outlined the requirements for securely accessing and sharing agency data.
While the MPH team opted for a centralized data governance model, another viable approach is to leverage near-real-time analytical technologies across distributed data platforms. In either case, this problem can only be partially addressed with technology, and in some cases government regulations prevent sharing of data between (and even within) agencies. But the MPH experience demonstrates that it is possible to overcome data access challenges through a combination of real-time computing, cross-agency collaboration and Executive-level sponsorship.
Addressing data quality issues
Terms like unreliable, incomplete, duplicated and obsolete are often used to describe government data assets, and it’s not uncommon for data quality issues to be cited as a significant inhibitor to business analytics and systems modernization initiatives. In Australia, this challenge is magnified by the absence of a whole-of-government identifier, which hampers matching of citizen records across data sets. One might therefore assume that Queensland OSR must have spent months cleansing their data in preparation for their machine learning prototype. Their experience, however, was that predictive algorithms can be applied to imperfect data with decent results. Elizabeth Goli, OSR’s Commissioner, explains: “…despite the use of only three internal data sources and the current challenges we have with data quality, the machine learning solution was still able to predict with 71% accuracy the taxpayers that would end up defaulting on their tax payment. What this tells us is that you don’t need to wait for your data to be 100% perfect to apply machine learning”.
Although data cleansing will undoubtedly improve the accuracy of predictions, Ms. Goli observes: “…the tool itself will actually become a key enabler in improving the quality of data”. This is due to the machine’s ability to interrogate massive data sets to establish probable linkages, and its ability to autonomously improve the accuracy of its predictions over time. So, while 71% is a good start, OSR expects to improve prediction accuracy to over 90%, through the combination of increasing data quality and refinement of the predictive model.
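To make the idea of establishing probable linkages without a shared identifier more concrete, the following is a minimal sketch only. It is not OSR's method, which the source does not describe. It swaps in simple string-similarity matching from Python's standard library, and every record, field name, weight and threshold shown is a hypothetical assumption chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two field values."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Illustrative weights (an assumption): name counts most, date of birth least.
FIELD_WEIGHTS = {"name": 0.5, "address": 0.3, "dob": 0.2}


def link_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of per-field similarities for a pair of records."""
    return sum(
        weight * similarity(rec_a[field], rec_b[field])
        for field, weight in FIELD_WEIGHTS.items()
    )


def probable_links(source_a: list, source_b: list, threshold: float = 0.85) -> list:
    """Compare every pair of records and keep those scoring at or above the threshold."""
    links = []
    for a in source_a:
        for b in source_b:
            score = link_score(a, b)
            if score >= threshold:
                links.append((a["id"], b["id"], round(score, 2)))
    return links


# Hypothetical records from two unconnected data sets (made up for this sketch).
tax_records = [
    {"id": "T1", "name": "Jane A. Citizen", "address": "12 Example St, Brisbane", "dob": "1980-02-14"},
    {"id": "T2", "name": "John Smith", "address": "5 Sample Rd, Cairns", "dob": "1975-07-01"},
]
licence_records = [
    {"id": "L9", "name": "jane a citizen", "address": "12 Example Street, Brisbane", "dob": "1980-02-14"},
    {"id": "L7", "name": "Maria Garcia", "address": "88 Demo Ave, Townsville", "dob": "1990-11-30"},
]

links = probable_links(tax_records, licence_records)
for a_id, b_id, score in links:
    print(f"{a_id} probably matches {b_id} (score {score})")
```

In this toy data, only the two records for the same person clear the threshold, despite differences in punctuation, capitalization and address abbreviation. Comparing every pair is workable only for small data sets; at government scale the comparison would need to be restricted to plausible candidates, which is outside the scope of this sketch.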
Seeing through the fog of big data
During his tenure as Chief Financial Officer for the State of Indiana, Chris Atkins observed that “…very few governments view data as a strategic asset. It’s usually not managed nearly so well as the government’s money. But it’s just as important for complex problem solving”. Perhaps part of the reason is that government data – unlike public funding – is abundant (for example, just the infant mortality use case required analysis of 9 billion rows of data). Such vast amounts of data can make it difficult to derive information and insights, simply due to the impracticality of traditional disk I/O at such a scale. This is where in-memory data platforms come to the fore, enabling massive data sets to be interrogated within a timeframe that is acceptable for business purposes and workable for predictive analytics scenarios.
Another benefit of real-time computing is the ability to apply analytics directly to operational systems, enabling users to work with the most up-to-date version of data and to refine their data models dynamically. Mr. Atkins articulates the value of this capability to the business: “…real-time data access lets you know with a high degree of certainty that your view of the issues is current, and that the decisions you’re making with regard to policy and planning will be best calibrated to address the problems. Without real-time data, you’re managing the problems of yesterday – not today or tomorrow”.
Tackling performance concerns
For nearly half a century, the status quo has been that operational data is extracted, transformed and loaded into data warehouses, to which analytical tools are applied and business reports are generated. ETL processes are typically run in batch overnight (often not even every night), resulting in business decisions being made based on yesterday’s data (in a best-case scenario). The fundamental reasons for this are that transactional databases are not designed for reporting, and system performance can be impacted by analytical processes. Mr. Atkins describes how this issue manifested during initiation of the MPH project: “…the agencies’ first concern was that access to data could not interfere with their operations. After all, we didn’t want to shutdown citizen services!”
But real-time computing is challenging the status quo by enabling analytical processes to be applied to transactional databases, without impacting the performance of operational systems. Ms. Goli describes the potential of this capability to transform government service delivery: “…machine learning provided the ability to crunch large amounts of data and achieve real-time insight on that data. Visualization through the journey map and risk ratings brought these insights to the forefront, allowing frontline staff to easily consume them and embed them in their day-to-day business processes”.
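The reporting lag introduced by batch ETL can be illustrated with a minimal sketch. It uses an in-memory SQLite database purely as a stand-in; the table names, function names and figures are hypothetical assumptions, not taken from the Indiana or OSR systems. It shows only the staleness of the warehouse copy, not the performance impact of running analytics against a real operational database.

```python
import sqlite3

# One in-memory database stands in for both systems (an assumption for brevity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, taxpayer TEXT, amount REAL)")
conn.execute("CREATE TABLE warehouse_totals (taxpayer TEXT PRIMARY KEY, total REAL)")


def record_payment(taxpayer: str, amount: float) -> None:
    """Operational system: record a transaction as it happens."""
    conn.execute("INSERT INTO payments (taxpayer, amount) VALUES (?, ?)", (taxpayer, amount))


def run_nightly_etl() -> None:
    """Batch job: extract payments, aggregate them, and reload the warehouse table."""
    conn.execute("DELETE FROM warehouse_totals")
    conn.execute(
        "INSERT INTO warehouse_totals (taxpayer, total) "
        "SELECT taxpayer, SUM(amount) FROM payments GROUP BY taxpayer"
    )


def warehouse_total(taxpayer: str) -> float:
    """Report from the warehouse copy, which is only as fresh as the last ETL run."""
    row = conn.execute(
        "SELECT total FROM warehouse_totals WHERE taxpayer = ?", (taxpayer,)
    ).fetchone()
    return row[0] if row else 0.0


def live_total(taxpayer: str) -> float:
    """Report computed directly against the operational table."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM payments WHERE taxpayer = ?", (taxpayer,)
    ).fetchone()
    return row[0]


record_payment("A", 100.0)
run_nightly_etl()
record_payment("A", 50.0)  # arrives after the batch has finished

stale = warehouse_total("A")  # reflects the state as of the last ETL run
fresh = live_total("A")       # reflects every recorded transaction

print(f"warehouse view: {stale}, live view: {fresh}")
```

The warehouse figure omits the payment recorded after the batch ran, which is the "yesterday's data" problem described above. The live query sees it immediately, but on a conventional transactional database that kind of query competes with operational workloads, which is the concern the agencies raised.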
Handling cultural resistance
IDC predicts that by 2019, 15% of government transactions (such as tax collection, welfare disbursement, and immigration control) will have embedded analytics. But still there is cultural resistance to new ways of working with machines. This is largely due to the perception, born out of the Industrial Revolution, that machines will replace people’s jobs. However, the McKinsey Global Institute argues that while 36% of healthcare and social assistance jobs will be subject to some degree of automation, less than 5% can be fully automated. In most cases automation will take over specific tasks, rather than replacing entire jobs, with about 60% of all occupations having at least 30% of constituent activities that could be automated.
Ms. Goli explains that in OSR’s experience, automation has the potential to enhance the working experience: “…with the introduction of advances in technology, such as machine learning, people are naturally scared that the machines will ultimately replace their jobs. However, what our prototype showed our staff was that this technology enriches, rather than replaces their jobs. Specifically, our staff can see how machine learning will take a lot of the frustration out of their jobs by enabling them to deal with customers holistically and help them to improve the customer experience”.
This series has examined contemporary applications of big data analytics within the context of Public Sector, and explored the opportunity for emerging technologies to extend and enhance current analytical techniques to deliver better social and economic outcomes. The Melbourne Institute’s study into Intergenerational Disadvantage demonstrated that governments already have rich data assets that can be leveraged to provide valuable insights for policymakers. And case studies from the State of Indiana and Queensland’s Office of State Revenue illustrated the potential of predictive analytics and machine learning to transform government service delivery.
The experience of these early adopters suggests that while the computational models might be sufficiently mature to support predictive analytics and machine learning techniques, the challenges lie in preparing the underlying big data platforms and overcoming regulatory constraints. This article has explored the extent to which real-time computing techniques can be leveraged to alleviate some of the problems associated with data access, data quality, data fog, performance and cultural resistance. Commentary from Mr. Atkins and Ms. Goli indicates that neither Indiana nor OSR had a fully formed strategy, integrated systems or well-prepared data at the outset. They started by establishing real-time platforms that enabled them to develop data-driven capability and demonstrate the value of evidence-based decision-making. It appears that their journeys of exploration have offered as much insight to their respective businesses as have the technologies themselves.
The potential for emerging technologies to enable data-driven policy and practice is well articulated by the Commission on Evidence-based Policymaking: “…the Commission envisions a future in which rigorous evidence is created efficiently, as a routine part of government operations, and used to construct effective public policy. Advances in technology and statistical methodology, coupled with a modern legal framework and a commitment to transparency, make it possible to do this while simultaneously providing stronger protections for the privacy and confidentiality of the people, businesses, and organizations from which the government collects information. Addressing barriers to the use of already collected data is a path to unlocking important insights for addressing society’s greatest challenges”.
 Giorgi & Kjeldsen (https://www.amazon.com/Traces-Emergence-Nonlinear-Programming-Giorgio/dp/3034804385).
 Commission on Evidence-based Policymaking (https://www.cep.gov/content/dam/cep/report/cep-final-report.pdf).
 Executive Order 14-06 (https://www.in.gov/governorhistory/mikepence/files/Executive_Order_14-06.pdf).
 The European Parliament and the Council of the European Union (http://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:31995L0046).
 IDC (https://www.sap.com/documents/2017/05/083593b6-ba7c-0010-82c7-eda71af511fa.html).
 McKinsey & Company (https://www.mckinsey.com/~/media/McKinsey/Global%20Themes/Digital%20Disruption/Harnessing%20automation%20for%20a%20future%20that%20works/MGI-A-future-that-works_Full-report.ashx).