Easter Bunnies, Unicorns, and Data Scientists with Spare Time
Imagine what a “typical data scientist” looks like in your mind right now (or think about a real, live one that you already know).
(In reality, many types of people do data science work, so anywhere you see “data scientist”, you can also sub in “data analyst” or even “citizen data scientist” – the thesis of this article applies to these other personas as well).
Did you picture a nerdy person in a lab coat with quadratic equations floating over their head? Did you envision an uber-cool data genius wearing socks with sandals who clearly traded in their social skills for being able to write their own predictive algorithms by hand? Or did the image of an incredibly overworked person who is being bombarded with requests from all directions to crunch data, find some insights, and pull the proverbial rabbit out of their hat pop into your mind?
Regardless of the stereotype you conjured in your mind, these images share a very interesting characteristic: most companies that employ data scientists don’t really know what to do with them.
But wait – aren’t data scientists those famed super-genius secret weapons everyone is trying to hire to make sense of the ever-increasing mountain of data being created every day? It sounds really glamorous, but you really should pity those poor data scientists – the expectation is that if you throw those mountains of data at them, they will somehow spit out magical pearls of wisdom to justify their high salaries and diva-like attitude towards data analysis.
You might as well ask them to look for the Easter Bunny in the data while you are at it. They have roughly the same chance of finding that pesky egg-hider. You think the Easter Bunny doesn’t exist? Well a data scientist that can find insights in random and arbitrarily large mountains of data doesn’t exist either.
Did you notice the words “business” and “problem” haven’t even been used in this article yet? 😉
Focusing On the Business Problem Gives Scope and Context
The lack of a defined business problem is the leading cause of failed data science projects (and teams, and maybe some relationships). Data scientists are, by nature, results-oriented people, and searching a haystack for a needle that might not even exist can be a frustrating and often unfruitful exercise. What we are effectively asking the data scientist to do is to define the business problem themselves and then solve it on their own. In most organizations, the data scientist is not the closest to the problems of the business – which is why many data science projects are called “science projects”.
This is where the “business user” or “business analyst” personas come in. They specify problems or business questions that enable a data scientist to select the right data, augment it, and create an analysis that provides that all-important insight required to make better business decisions. This gives the data scientist more time to focus on actual problem solving, but it does not actually create free time for them.
A unicorn is a mythical beast with a large, pointed, spiraling horn projecting from its forehead. The horn was believed to purify poisoned water and even heal sickness. This creature excites the imagination and solves problems regular humans can’t fix on their own. Does this sound familiar yet?
You know what else about Unicorns? They don’t actually exist, no matter how much we’d like to believe that they do.
Now think back to the data scientist you imagined: you likely didn’t picture someone with their feet up on their desk, writing emails, and surfing Facebook all day. Just like Unicorns, a data scientist who has lots of time on their hands doesn’t exist either. (Incidentally, if you do have a data scientist that fits this profile, you need to fire them – they aren’t real data scientists!).
So is all hope lost? To find Unicorns, I think so. But there is hope to give data scientists more time to solve real problems. Most of you know where this is headed – predictive analytics using automated techniques can drastically reduce the time and effort to solve real business problems. The trick is to do it in a way that still satisfies the tests and validations a data scientist needs (and would do themselves if they were manually creating models).
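To make the idea concrete, here is a minimal, illustrative sketch (not SAP code – the function and model names are invented for this example) of what “automated” means in practice: fit several candidate models, score each one on held-out data, and keep the winner. These are the same validation checks a data scientist would run by hand, just performed by the tool instead.

```python
# Illustrative sketch of automated model selection with holdout validation.
# All names here are hypothetical; real tools try far more model families.

def mean_model(train_x, train_y):
    """Baseline candidate: always predict the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean

def linear_model(train_x, train_y):
    """Candidate: simple least-squares line y = a*x + b."""
    n = len(train_x)
    mx = sum(train_x) / n
    my = sum(train_y) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    var = sum((x - mx) ** 2 for x in train_x)
    a = cov / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def auto_select(train, holdout, candidates):
    """Fit every candidate on train, score on holdout, return the best."""
    def mse(model, data):
        return sum((model(x) - y) ** 2 for x, y in data) / len(data)
    best = None
    for name, fit in candidates:
        model = fit([x for x, _ in train], [y for _, y in train])
        err = mse(model, holdout)
        if best is None or err < best[2]:
            best = (name, model, err)
    return best

# Roughly linear toy data: the tool, not the human, picks the model.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)]
holdout = [(5, 10.1), (6, 11.8)]
name, model, err = auto_select(
    train, holdout, [("mean", mean_model), ("linear", linear_model)]
)
print(name)  # the linear fit wins on this data
```

The point of the sketch is the shape of the workflow, not the two toy models: automation scales this loop across many algorithms and validation schemes without the hand-coding effort.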
SAP BusinessObjects Predictive Analytics does exactly this and more – and the new version 3.0 makes predictive workflows like creating segmented time series models even easier. Learn more about the new release here: The Future of Predictive Analytics is Here with… | SCN
So Why Isn’t Everyone Using Automated Predictive Analytics?
A customer I visited recently said:
“There is no silver bullet. There is no tool, algorithm, or solution that is perfect for every problem, every time – but automation can solve a large majority of business problems faster than doing them by hand. This would enable us to solve more problems in the same time without sacrificing accuracy or robustness”.
Definitely a smart customer.
But the problem is that automated predictive is a completely new way of solving age-old problems. The value of faster model creation and automatic lifecycle management are things data scientists really want and need, but the IT and business buyers of a company usually have other criteria that sometimes make no sense:
Customer: “How many algorithms do you have?”
Predictive Expert: “It’s not about the sheer number, it is about how we can create robust and accurate models for you.”
Customer: “Okay, so not many. How much does it cost?”
Predictive Expert: “Predictive is all about giving you an ROI – so let’s see if we can find a use case that would give you a high return.”
Customer: “Uh huh. So expensive….”
I even had a customer once tell me, “Our data scientists like their existing tools and do not want anything to make their lives easier!” Really???
The truth is that many data scientists prefer their existing tools because they are already very productive with them. Without a compelling reason to suffer another learning curve, it is time consuming and frustrating to adopt a new way of doing what they might incorrectly think is the same thing as before.
Ask any user who has switched from Windows to the Apple Mac – very few people switch “for the fun of it”, and most face some kind of (often painful) learning curve before mastering the OS X interface and its applications. Go back to them months later, once they are proficient, and very likely they will tell you the “new way” is simpler, faster, and more productive, but they had to “unlearn” how things were done in Windows in order to learn how to do things in a more intuitive way with OS X.
The New Normal
Almost every emerging technology starts with raw or rudimentary tools until there is enough adoption for the need to make them better and easier to use. We also know that at some point, “easier to use” evolves into “mostly automated”. For example, many IT tasks started out as manually coded scripts before deployment solutions were created, and now entire landscapes can be deployed with the single click of a button. We see coding giving way to automation in the Cloud world (with things like Cloud Foundry), the Big Data world (data wrangling on Hadoop and Spark), and we are of course seeing automated predictive algorithms take over from time consuming hand coding (“R” anyone?).
Knowing that the natural evolution of all technology results in automation, some of our smarter customers are focusing on the automated destination instead of the manual pit stop on the way. They are not only gaining all of the benefits of automated algorithms, faster model creation, and full predictive lifecycle management, but are also avoiding much of the cost and effort of implementing manual predictive solutions and then later moving to an automated platform. There will always be a place for manual coding too – all the more reason to automate everything else, so those highly paid data scientists are working on high-value projects.
Speaking of time: a good data scientist will never have spare time no matter how much we automate the process for them, because they should be solving more problems than they otherwise could (and remember to fire them if they don’t). So a good data scientist with spare time, just like the Easter Bunny and Unicorns, simply doesn’t exist – unless of course you have a perfect business where you enjoy infinite revenue at zero cost.
Now Santa Claus – I’m pretty sure he definitely exists. Just walk into any shopping mall during Christmas and you will see… 😉
I agree with your analogy about switching from Windows to Macs. I'd like to see SAP put more resources into lowering the learning curve for data professionals coming to PA from other tools.
Hi Grace, thanks for your feedback. Could you elaborate on which resources are lacking from your perspective and how you would like information to be provided to help with the learning curve?
Thanks & regards,
Like switching operating systems, starting fresh in SAP requires learning a new workflow with new tools. I'd like to see tutorials that followed something like the CRISP-DM process including data exploration and data cleaning. I've also been surprised by how little explanation/documentation exists within PA. There are many options available with little guidance to the user of what they mean or how they change the analysis. On each screen, it's usually not intuitive where to start or what steps to follow.
Hi Grace, thanks for the feedback.
There is certainly room for improvement to provide more usage-based documentation and tutorials, this is something that we acknowledge.
There are initiatives along these lines cooking these days in our engineering/knowledge management teams, although it is a bit early to share the details.
Until we get to that point (and even after), I would suggest posting questions through our SCN community when you need specific help.
Our product management team is also eager to share their knowledge and provides dedicated how-to materials on a regular basis – this is another area where customer demand helps drive future content.
Regarding the in-app documentation, the new Predictive Factory introduces the concept of In-App help, where the user gets help directly in the context of the application, displayed on top of the existing user interface. This is an important direction for our future generation of products.
Thanks for the feedback - PA 3.0 is new and we are working on improving both the printed and online instructions to help a user progress. Some of these improvements will not show up until PA 3.1, but there are a few things we are trying to do before that release as well. Antoine's suggestion of the SCN Tutorials is a good one, but also do not hesitate to create a question in the forum if you get stuck - our PMs, Devs, and others are quite active in SCN.