Can SAP HANA Crunch U.S. Unemployment Numbers?
When it was recently announced that the U.S. unemployment rate had fallen to 7.8% after the economy added 114,000 workers in September and August, some very public figures cried foul and vented on Twitter. Mad Money’s Jim Cramer also took to Twitter, but for a slightly different reason. Here’s his famous tweet suggesting that SAP should crunch the numbers in question:
“Just give the payroll calc job to $SAP or $TIBX and we can get them daily!”
Can SAP HANA accurately crunch the U.S. unemployment figures via national payroll data? Daily?! I checked in with a few of SAP’s HANA experts to find out.
“We can really only speculate, as nobody collects this data today,” said David Hull, Senior Manager, Technology & Innovation Platform Marketing at SAP Labs. “Hence the controversy over the recent numbers – they’re not based on hard data, and so they’re likely accurate within a certain margin of error.”
To clarify, Jim Cramer is suggesting that companies send their payroll data to someone like SAP, in addition to what they are currently required to report to the IRS. “That way, not only is the jobs report based on hard data, it also can be reported daily instead of only quarterly,” said Hull.
To determine how much data this is, Hull said you would need to visualize and estimate the data set, which might consist of an irreversible hash of a person’s social security number (to anonymize it, yet still report on an individualized basis), how many hours per week they worked, who they worked for (represented by a hash of the company’s employer ID or social security number), zip code, age, etc.
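As a rough illustration of the record Hull describes, here is a minimal Python sketch. Everything named in it is hypothetical: `PayrollRecord`, `pseudonymize`, `SECRET_KEY` and the field names are invented for this example and do not come from SAP or from Hull. One assumption is worth flagging: Hull only says "irreversible hash," while the sketch uses a keyed hash (HMAC-SHA-256). There are at most a billion nine-digit numbers, so an unkeyed hash of a social security number could be reversed simply by hashing every candidate.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical secret held by whoever operates the database (an assumption,
# not something the article specifies).
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest (HMAC) of an SSN or employer ID."""
    digits = identifier.replace("-", "").strip()
    return hmac.new(key, digits.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class PayrollRecord:
    """One row per person, per payroll period (field names are illustrative)."""
    person_hash: str      # keyed hash of the social security number
    employer_hash: str    # keyed hash of the employer ID
    hours_per_week: float
    zip_code: str
    age: int


def make_record(ssn: str, employer_id: str, hours_per_week: float,
                zip_code: str, age: int) -> PayrollRecord:
    """Build a record that never stores the raw identifiers."""
    return PayrollRecord(
        person_hash=pseudonymize(ssn),
        employer_hash=pseudonymize(employer_id),
        hours_per_week=hours_per_week,
        zip_code=zip_code,
        age=age,
    )


# Made-up identifiers, used only to exercise the sketch.
record = make_record("000-12-3456", "00-0000001", 40.0, "94304", 35)
```

Because the same person always maps to the same digest, records can still be joined across payroll periods without the raw number ever being stored.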
With around 200 million people in the database (including active workforce and educational institutions reporting data), Hull estimates around 1KB of data per person, per payroll period.
“Unless my math is wrong, that’s about 10TB of data per year,” said Hull. “Let’s say you want to size for three years of data, that’s 7.5TB compressed, which would require a 16-node cluster with 16TB of DRAM. We have partners that ship these today.”
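Hull's back-of-the-envelope math can be reproduced in a few lines of Python. Two inputs below are assumptions inferred from his figures rather than stated by him: a weekly payroll period (52 per year, which is what makes 200 million records of about 1KB come to roughly 10TB a year) and a 4:1 compression ratio (which is what turns about 30TB of raw data into about 7.5TB). The function name `estimate_sizing` is hypothetical.

```python
TB = 10**12  # decimal terabyte, in bytes


def estimate_sizing(people=200_000_000,
                    bytes_per_record=1_000,   # "around 1KB" per person, per period
                    periods_per_year=52,      # assumption: weekly payroll
                    years=3,
                    compression_ratio=4.0,    # assumption: implied by 30TB -> 7.5TB
                    nodes=16,
                    dram_tb_per_node=1.0):    # 16TB of DRAM across 16 nodes
    """Return raw and compressed data volumes plus cluster DRAM, in TB."""
    raw_tb_per_year = people * bytes_per_record * periods_per_year / TB
    raw_tb_total = raw_tb_per_year * years
    compressed_tb = raw_tb_total / compression_ratio
    cluster_dram_tb = nodes * dram_tb_per_node
    return {
        "raw_tb_per_year": raw_tb_per_year,
        "raw_tb_total": raw_tb_total,
        "compressed_tb": compressed_tb,
        "cluster_dram_tb": cluster_dram_tb,
        "dram_to_data_ratio": cluster_dram_tb / compressed_tb,
    }


if __name__ == "__main__":
    for name, value in estimate_sizing().items():
        print(f"{name}: {value:.2f}")
```

With these inputs the raw volume is 10.4TB per year and 7.8TB compressed over three years, in line with Hull's rounded 10TB and 7.5TB. The 16TB cluster leaves roughly twice the compressed data size in memory. Switching `periods_per_year` to 26 (biweekly payroll) would halve every data figure.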
Of course, getting all U.S. businesses to send SAP their payroll data would be a project that would take some time to complete. Is there a near-term pilot that would simulate what Jim Cramer is asking for? Joe King, Chancellor, SAP HANA Academy, thinks there is, and outlines the following steps:
- Immediately load all appropriate historical labor statistics data into an SAP HANA data mart on the external-facing http://ExperienceSAPHANA.com site
- Build out a complete set of analytics on top of that data
- Seek out paycheck-issuing companies and have them send us their historical data
- Determine whether those paycheck companies’ numbers are indicative of a change in the labor statistics as reported today
“We can then meet Jack Welch’s challenge of knowing when the announced labor statistics are not supported by other metrics,” said King. “We could then do real-time analysis.”
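The last step of King's pilot, checking whether paycheck data moves with the reported statistics, could be prototyped as a simple correlation of month-over-month changes. The Python sketch below is only one possible approach, not King's method. The function names are hypothetical, and the two series are synthetic placeholder values invented for illustration, not real labor data.

```python
from math import sqrt


def month_over_month_changes(levels):
    """Turn a series of monthly levels into a series of monthly changes."""
    return [later - earlier for earlier, later in zip(levels, levels[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Synthetic placeholder values (NOT real data): monthly headcounts from a
# hypothetical paycheck provider and from the official report.
provider_headcount = [1000, 1010, 1025, 1020, 1040, 1055]
official_headcount = [5000, 5040, 5110, 5095, 5180, 5250]

r = pearson(month_over_month_changes(provider_headcount),
            month_over_month_changes(official_headcount))
print(f"correlation of monthly changes: {r:.3f}")
```

A correlation near 1 would suggest the provider's numbers track the official series. A month where the two diverge sharply is the kind of signal King describes, where the announced statistics are not supported by other metrics.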
I'm hoping you're just using this as an abstract example of data volume vs. SAP HANA abilities. Not sure the US employees would be very fond of sending their sensitive personal information to a private company. Also, no disrespect, but if we can't trust IRS, who says we can trust SAP? 😕
Jelena, yes, I think more than anything this is a "what if" scenario. Even if it became reality I don't think this would be a situation where SAP "owns" the data...more of how IRS and other agencies can benefit from SAP HANA. Also, I think it's pretty cool that Jim Cramer understands at a base level what SAP can do.