Have you read Gartner’s research note on Defining and Differentiating the Role of the Data Scientist? These days, you can hardly read anything about EIM without bumping into the term “data scientist.” And @DataLovers tweeted just today that data scientists are the new rock stars.
Here’s my issue. Before you can be a Data Scientist…conducting experiments with the data and unearthing new correlations between weather, what I had for lunch, and what I’ll need at the grocery store for dinner… Before any of that, you have to know what data you have.
And if you can trust it. And where it came from.
To do these things, you need to dig and dust, much like an archaeologist. In truth, you are uncovering the artifacts of how your company functions. However, you are also becoming a data anthropologist. (Look at you: two careers in less than a minute!) You are discovering how the data is used and interpreted within the culture of your company. Here is an interesting take on this from a 2010 CIO blog: http://blogs.cio.com/brian_hopkins/13695/the_anthropology_of_data
But three years later, we are still embracing the volume and variety of big data. Our executives push us to think about the value of solving big data problems. Yet, somehow, the anthropology and archaeology are forgotten.
All of the new Big Data Sources that you covet—10 years of transactional data, LinkedIn data, tweets, weather data for the last decade—none of it has relevance or context without linking to the master data of your enterprise. LinkedIn data matters when it is about your customers, your prospects, or potential markets for your products. To make those linkages, you need to understand and trust your master data.
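To make the linkage idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the record layouts, the field names, the sample values, and the choice of a normalized email address as the match key. Real master data matching is usually far more involved (fuzzy names, addresses, survivorship rules), so treat this as a picture of the principle, not a recipe.

```python
# Hypothetical master data: customer records keyed by a governed ID.
master_customers = [
    {"customer_id": "C001", "name": "Acme Corp", "email": "info@acme.example"},
    {"customer_id": "C002", "name": "Globex", "email": "sales@globex.example"},
]

# Hypothetical external feed (for example, social or third-party data).
external_records = [
    {"email": "INFO@acme.example ", "mentions": 12},
    {"email": "sales@globex.example", "mentions": 3},
    {"email": "unknown@nowhere.example", "mentions": 7},
]


def normalize(email):
    """Trim whitespace and lowercase so trivially different keys still match."""
    return email.strip().lower()


def link_to_master(external, master):
    """Split external records into those that link to master data and those that do not."""
    index = {normalize(m["email"]): m["customer_id"] for m in master}
    linked, unlinked = [], []
    for rec in external:
        customer_id = index.get(normalize(rec["email"]))
        if customer_id is None:
            unlinked.append(rec)  # no context: we cannot say whose data this is
        else:
            linked.append({**rec, "customer_id": customer_id})
    return linked, unlinked


linked, unlinked = link_to_master(external_records, master_customers)
match_rate = len(linked) / len(external_records)
print(f"Linked {len(linked)} of {len(external_records)} records ({match_rate:.0%})")
```

The unlinked records are the interesting part: they are either noise or a sign that your master data is incomplete, and you can only tell which if you trust the master data in the first place.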
My advice? Talk to your Information Governance organization. Use exploration and discovery tools like SAP Information Steward. And leave some breadcrumbs for the next Data Scientist by documenting what you find. (And once you clean and relate the data properly with Data Services and Data Quality, establish procedures to make sure it is created and maintained appropriately in the future with SAP Master Data Governance.)