The NSA’s Data Quality Problem
Even basic data quality problems can have big consequences. The National Security Agency learned this to its cost recently when the US applied for leaker Edward Snowden’s extradition from Hong Kong.
The extradition paperwork suffered from two classic data quality problems:
- Incorrect data — Snowden’s middle name was incorrectly noted as “James” instead of “Joseph”;
- Missing data — the passport number was not included.
These errors meant the order “did not fully comply with the requirements” of the Hong Kong authorities, giving them an excuse to turn down the request and avoid an unwanted diplomatic confrontation.
Snowden took the chance to slip away to the international transit lounge of a Moscow airport. And US diplomatic embarrassment didn’t end there, as more poor information resulted in Bolivian President Evo Morales’ plane being grounded because various European states had been assured that Snowden was aboard.
The delicious irony of the NSA, whose massive databases on millions of Americans were revealed by Snowden, being unable to correctly spell the name of its own employee went largely unremarked.
Why? Probably because such errors are so commonplace that they are taken for granted. The NSA presumably has the right name in its systems, but the information was warped at some point as a rushed official manually transferred it onto the extradition document.
Equivalent situations happen every day in companies around the globe. Data quality problems are ubiquitous, and can have far-reaching consequences that aren’t always foreseen.
The moral of the story is that poor data quality is never a technical problem. It is always the result, directly or indirectly, of human error.
Thankfully, new tools are available that can help business people and IT organizations collaborate to fix data quality problems – using analytics to improve analytics.
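To make the two error types concrete, here is a minimal sketch of the kind of automated check such tools perform: flagging required fields that are missing and values that disagree with a trusted source. The field names, reference record, and rules below are hypothetical, purely for illustration, and are not drawn from any real system.

```python
# Illustrative sketch only: a minimal record-validation check for the two
# error types described above. Field names, the reference record, and the
# rules are hypothetical examples.

REQUIRED_FIELDS = ["surname", "given_names", "passport_number"]

# A trusted reference record (e.g. from a master data source).
reference_record = {"surname": "Snowden", "given_names": "Edward Joseph"}


def check_record(record, reference):
    """Return a list of data quality issues found in `record`."""
    issues = []

    # Missing data: a required field is absent or empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing value for '{field}'")

    # Incorrect data: a supplied value disagrees with the trusted source.
    for field, expected in reference.items():
        supplied = record.get(field)
        if supplied and supplied != expected:
            issues.append(f"'{field}' is '{supplied}', expected '{expected}'")

    return issues


if __name__ == "__main__":
    # Wrong middle name, no passport number.
    paperwork = {"surname": "Snowden", "given_names": "Edward James"}
    for issue in check_record(paperwork, reference_record):
        print(issue)
```

Checks like these are trivial to run automatically; the hard part, as the story above shows, is making sure they happen before the data leaves human hands.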
[This post originally appeared on my Business Analytics Blog. Follow me on Twitter to hear more!]
As a quality consultant I see this all the time. I swear people think I'm lying to them when I tell them the computer does exactly what you tell it to do.
In essence there are no computer bugs (ok... maybe, but they are rare). What we normally see is poor attention to detail in defining the program requirements: usually it's missing certain combinations/permutations of data, not understanding the actual problem, or users doing something totally unanticipated.
A distant second to this is an actual, true programming error: an incorrect loop, or a failure to clear variables at the right time.
But... computers do exactly what we tell them to do. They don't think on their own and try to come up with devious and nefarious schemes to ruin our days (and nights!), even though I often think they do.
Usually it's the human somewhere along the process that screws things up, whether it be a transcription error, a design issue, or a quality control issue (typo, data quality, or testing failure).
I talk to non-data friends who worry about all of the data the government gathers. As a data person, I always point out that gathering the data is Step 1. You still have to make sense of it, and it has to be fit for use. Those things are HARD. Much harder than retrieving a large amount of data. They take time, consensus, and work. I found this quote, and love it:
All information looks like noise until you break the code. –Neal Stephenson
Great blog, Timo!