This article originally appeared on VentureBeat.com
The trend for Big Data isn’t showing any signs of slowing down. Here in Silicon Valley, we’re amid a second California gold rush, but this time, we’re turning nuggets of data, not gold, into dollars.
Big Data is not easy to define. It broadly refers to the idea that a company can mine its data and unlock potentially valuable insights. One profession in particular is poised to capitalize on the trend. This elite and emergent group, known as “data scientists,” claims to hold the keys to unlock the value in a treasure trove of data. As such, companies are pursuing data scientists, even though it’s never been clear what they actually do.
If these elusive data scientists are sitting on top of a hidden goldmine, it’s about time we bypassed the marketing jargon and set some clear ground rules. What is a data scientist? Where can we find them? And why should we hire them in the first place?
What is a data scientist?
“I like to think of a data scientist as the interdisciplinary athlete of the modern organization,” Mike Driscoll, founder of Big Data company Metamarkets, told VentureBeat.
According to Driscoll, data scientists must be able to straddle both the business and technical sides of an organization. They must be able to take a large data set, model it, and ultimately tell stories from the data, which is usually the hardest piece.
Driscoll’s company, Metamarkets, functions like a digital data scientist. For a customer like the Financial Times, Driscoll said the software can answer questions from the data like: “Why are customers canceling their memberships?” and “How are users moving through the site?”
It’s not just about sorting through a large store of data to uncover gems. “What makes a data scientist unique is his ability to use technology and hacker skills to solve actual real-world problems,” said Geoff Domoracki, founder of Dataweek. Researchers today use Big Data to find trends within terabytes of human genetic data or, in one famous case, to develop innovative performance metrics for baseball players.
Knowing the right questions to ask sets a data scientist apart from the pack. They possess a more extensive skill set than a typical database administrator (DBA) or developer.
“A DBA can’t build you a Google self-driving car or develop an algorithm to automatically translate Spanish into English,” explained Jeremy Howard, president and chief data scientist at Kaggle.
Dr. Hadley Wickham cited an example of a business problem only a fellow data scientist could solve. A data scientist at Progressive Car Insurance noted that in the run-up to Halloween, droves of people were searching for how to make a Flo costume. Flo is the pitchwoman in Progressive’s commercials. The company set up a page dedicated to dressing up as Flo, which led to a huge spike in traffic and an increase in sales.
Why hire a data scientist?
I met Dr. Wickham, data scientist in residence at Metamarkets, for a coffee to discuss the evolution of the data scientist. At Rice University, he is responsible for readying the next generation of statisticians for this job. He voiced a major concern: the majority flock to tech companies like Facebook and LinkedIn, leaving less “sexy” companies with a smaller talent pool to draw from.
Wickham said there are core skills that every data scientist must have. They should be able to navigate at least one programming language (JavaScript, Python, or R) and, as he puts it, “to ask the questions you didn’t even know you had.” Data scientists should be skilled communicators and programmers and have a penchant for statistics.
Matt Pasienski, a data scientist at video analytics company Ooyala, likens his job to finding and removing blind spots within his organization. He recommends becoming proficient at processing data sets with Hadoop, Cassandra, Storm, MySQL, and a host of visualization tools. Pasienski said the hardest part is understanding the business and marketing side: how customers are using your product and how to tap new sources of revenue.
How to hire a data scientist
Dr. Wickham admitted that he has recently encountered a deluge of analysts, engineers, and statisticians proclaiming themselves to be data scientists. It’s no wonder they’re jumping on the bandwagon. A recent report by consulting firm McKinsey & Company forecasts a major talent shortage.
“We project a need for 1.5 million additional managers and analysts in the United States who can ask the right questions and consume the results of the analysis of Big Data effectively,” the report states.
If an organization can’t afford to hire a data scientist, which is often the case for startups and small businesses, software like Tableau, QlikView, and Metamarkets offers a way to glean insights from data for less than the price of hiring an engineer.
As an alternative — or supplement — to the software, organizations can promote talent from within. Driscoll, himself a data scientist, said most of the engineers on his team have been trained on the job. “Data scientists today aren’t hired as much as they’re made,” he said. Note that business and marketing analysts, as well as developers and database analysts, have the potential to become data scientists.
When hiring, Driscoll recommends drawing from the upper echelons of American universities, where there’s a glut of PhD students, and offering the requisite training. Big Data seminars, increasingly common at large companies, are often filled to the rafters.
“In this new field called data science, we need to create a language and lingo we can all use,” Driscoll said. “Simply put, we are the ones with the skills to tell stories about data.”