Who here remembers Minority Report? In the movie, the government establishes a unit called Pre-Crime. Pre-Crime's job is to determine who is about to commit a crime and arrest the potential criminal before any harm is done. Sounds a lot like the new television series, "Person of Interest." I've been meaning to write about that one for a while, as the data management challenges are rampant. And then New York announced that it is actually trying to do it. Wow.
Here's a short rundown, from the newspaper article on Slashdot:
The Domain Awareness System will draw data from 911 calls, previous crime reports, license-plate readers, law-enforcement databases, environmental sensors, and roughly 3,000 closed-circuit cameras. It will rely on the New York City Wireless Network (NYCWiN), a high-speed wireless broadband infrastructure that allows city agencies to rapidly transmit data, and used for everything from emergency response to reading meters.
I love this snippet because it highlights the approach many organizations are taking to big data:
Challenge: How do I analyze large volumes of data from many different sources, sources that were never intended to work together and that are defined and maintained by organizations that never intended to work together (utility meters and license-plate readers, anyone?)
Solution: Make it possible to move (and store) those large amounts of data.
Storage and movement are definitely required. No issues there. But New York is going to be most unimpressed with its return on investment if it ignores these key requirements:
- Comprehensive information modeling across multiple sources: Again, these disparate sources were never designed to come together in one great analytics whoosh. How does the Customer information map to the Tenant information (utility meters)? And how does that compare to the license plate owner (corporate information in the DMV) and the driver of the vehicle?
- Metadata definitions: Which metadata will the closed-circuit cameras log? Which elements will be searchable? How will that metadata map to other systems' metadata (license-plate readers, for example)?
- Ownership: Ooooo… love this one. The owner of the "citizen" entity cannot effect change in any of the subscribing systems. The utility company, for example, is not going to change its definition of customer/tenant to conform with previous crime reports. That means the data owner must instead understand every data element, exactly how each will be transformed (and verify the accuracy of that transformation), and how the elements from all of the different sources are being aggregated. Big job. You should be catching a whiff of information governance here…
- Quality: First, let's make the very risky assumption that each subscribing agency is cleaning its own data. (Let's hope there is an SLA in place to ensure that.) The written crime report could say 205 Avenue of the Americas, and the license plate could be registered to 205 Sixth Avenue. Both are correct. Should those roll up to the same citizen? (Yes, but you would not know that by looking at the record.) Are L. Eric Johnson, Lawrence E. Jonson, Larry Johnson, and Eric Johnson the same person? Can we rely on their address for verification? (Not with this wide a swath of historical data and how often people move.)
What a massive coordination job! Let me throw one last fly in the soup: collection methods and human incentives. What is the incentive of the 911 operator? To determine the scope of the emergency and dispatch help. Operators are not rewarded for capturing the full legal name of a caller or victim. Nor are they rewarded for capturing exactly where the caller lives; what they need is the location of the scene.
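At its core, the Quality problem is entity resolution. Here is a minimal, rule-based sketch of how records like the ones above might be flagged as candidate matches. It assumes a hypothetical street-alias table and a hypothetical nickname table (neither comes from any real agency system) and an arbitrary surname-similarity cutoff. It normalizes addresses, scores surnames with Python's standard-library `difflib.SequenceMatcher`, and requires given names to agree, allowing initials and nicknames. It is an illustration of the problem, not a production matcher.

```python
from __future__ import annotations

import difflib
import re

# HYPOTHETICAL alias table: variant street names -> one canonical form.
# Sixth Avenue in Manhattan is officially named Avenue of the Americas.
STREET_ALIASES = {
    "avenue of the americas": "6th avenue",
    "sixth avenue": "6th avenue",
}

# HYPOTHETICAL nickname table: nickname -> formal given name.
NICKNAMES = {"larry": "lawrence"}

# ASSUMED cutoff for surname similarity; a real system would tune this.
SURNAME_THRESHOLD = 0.85


def normalize_address(addr: str) -> str:
    """Lowercase, collapse whitespace, and rewrite known street aliases."""
    text = re.sub(r"\s+", " ", addr.strip().lower())
    for alias, canonical in STREET_ALIASES.items():
        text = text.replace(alias, canonical)
    return text


def parse_name(full_name: str) -> tuple[str, list[str]]:
    """Split a name into (surname, given-name tokens), ignoring periods."""
    tokens = full_name.replace(".", "").lower().split()
    return tokens[-1], tokens[:-1]


def given_names_compatible(a: list[str], b: list[str]) -> bool:
    """True if any given-name token in a agrees with one in b.

    Nicknames are expanded first, and a single-letter initial agrees
    with any name that starts with the same letter.
    """
    for x in (NICKNAMES.get(t, t) for t in a):
        for y in (NICKNAMES.get(t, t) for t in b):
            if x == y:
                return True
            if (len(x) == 1 or len(y) == 1) and x[0] == y[0]:
                return True
    return False


def same_person_candidate(name_a: str, name_b: str) -> bool:
    """Flag two names as a candidate match for downstream review."""
    surname_a, given_a = parse_name(name_a)
    surname_b, given_b = parse_name(name_b)
    score = difflib.SequenceMatcher(None, surname_a, surname_b).ratio()
    return score >= SURNAME_THRESHOLD and given_names_compatible(given_a, given_b)


if __name__ == "__main__":
    print(normalize_address("205 Avenue of the Americas"))
    print(normalize_address("205 Sixth Avenue"))
    names = ["L. Eric Johnson", "Lawrence E. Jonson", "Larry Johnson", "Eric Johnson"]
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(f"{a!r} vs {b!r}: {same_person_candidate(a, b)}")
```

Both addresses normalize to `205 6th avenue`, but only because someone maintained the alias table. Of the six name pairs, only "Larry Johnson" vs. "Eric Johnson" comes back `False`: those two link only through "L. Eric Johnson", which is why entity-resolution systems usually cluster matches transitively, and also why a single bad bridging record can merge two different people. The sketch deliberately never uses the address to confirm identity, for the reason given above.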
Now. As your company attacks big predictive analytics projects, what is its plan for these challenges? Without addressing the full breadth of the challenges listed above, your big data challenge will remain a challenge. You may end up with a report, but that report cannot spur real insight unless classic enterprise information management (EIM) best practices are followed.