Predictive Smackdown: Automated Algorithms vs The Data Scientist
Oxford Dictionaries defines smackdown as “A bitter contest or confrontation”. I didn’t realize that the word “smackdown” originated from the world of “entertainment wrestling” and isn’t even 30 years old. But this is the word that comes to mind whenever I talk to someone with a data science background about the topic of SAP Predictive Analytics’ automated machine learning algorithms.
Data scientists have a healthy amount of skepticism whenever we say we have a technology that can automate something that typically requires so much training, practice, and experience. Can you blame them?
Automatic Vs Manual Transmissions
I’m not a data scientist, and statistically speaking, there’s a pretty good chance you aren’t one either. There is an analogy that fits well here (even if it is a little boring) and that most of us can relate to: automobile transmissions.
A manual transmission car relies on the human driver to engage the clutch properly and shift to the correct gear at the correct time. This co-ordination requires practice, and some people are not comfortable with it even after days or weeks of training. Others (like me) prefer a manual transmission even though it requires more work, because it provides better feedback (ability to adapt), more control (flexibility), and in most cases better fuel economy (more efficient operation).
An automatic transmission uses a computer to measure various metrics (speed, RPM, throttle) and operates the clutch and gearbox on behalf of the driver. Some prefer this because it works automatically without their intervention, training, or experience.
Can you mess up with an automatic transmission? Yup, although it is much harder to stall the vehicle or “bunny hop” the car by being in the wrong gear. You can spot a person who knows how to drive a manual transmission by the extra special pain they feel when they hear someone “grind the gears” in a car.
Data Science – The Manual Transmission of Predictive
Ask a data scientist about predictive analysis and many times you will get either an extremely simple explanation (they are dumbing it down for you) or a highly theoretical one (they want to make sure you know it is complex). Now that I’ve ticked off all the data scientists, let me cover by saying “both answers are right” 😛 .
Predictive modelling is pretty complex. The “secret sauce” is not so much in which algorithm is used as in the intelligence the data scientist puts into the modelling process. As humans, we have a semantic understanding of the data that can improve the predictive model and ultimately its predictive power.
For example, if we take all viewers in a movie theater, you would likely want to group people based on something like age or their relationship to the others in their party (e.g. parent, sibling, spouse), and then apply a different analysis to each group based on its common traits. You wouldn’t want to apply the same heuristics to siblings watching a movie as you would to a couple on a date, would you?
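To make the movie theater idea a bit more concrete, here is a minimal Python sketch of "group first, then analyze each group". Everything in it is made up for illustration: the viewer records, the relationship labels, and the snack spend figures are hypothetical, and a simple average stands in for whatever analysis you would really run on each group. It has nothing to do with how SAP Predictive Analytics works internally.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (relationship within the party, age, snack spend in dollars).
viewers = [
    ("couple", 27, 18.0),
    ("couple", 31, 22.0),
    ("siblings", 12, 6.0),
    ("siblings", 15, 8.0),
    ("parent_child", 40, 14.0),
    ("parent_child", 8, 10.0),
]


def segment_by_relationship(records):
    """Group the viewer records by the relationship label (first field)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    return dict(groups)


def average_spend(records):
    """Stand-in 'analysis': the average snack spend of a set of viewers."""
    return mean(spend for _, _, spend in records)


segments = segment_by_relationship(viewers)
per_segment = {name: average_spend(rows) for name, rows in segments.items()}
overall = average_spend(viewers)

print("Overall average spend:", overall)
for name, value in per_segment.items():
    print(f"  {name}: {value}")
```

The single overall average hides the fact that the groups behave quite differently, which is exactly the kind of semantic knowledge a human brings to the modelling process.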
A good predictive model can take days, weeks, or even longer to build. The kicker is that what matters in the end is the effectiveness of the model, not how long it took to create. A data scientist will do a lot of analysis and iterate through a number of predictive algorithms, models, and variables before settling on the final model.
The “data science profession” is probably one of the most subjective occupations in the world – how do you know if you have a great data scientist who is extremely creative or a lazy one who follows a very formulaic process that could be taught to anyone? Unfortunately, it is not immediately obvious, any more than it is obvious that someone reached their destination by driving in second gear the whole time. You’ll eventually figure it out when the car arrives stinking of burning oil.
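The "iterate and pick the most effective model" loop can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the data points are invented, the two candidate "models" (always predict the average, or fit a straight line) are deliberately simple, and mean absolute error on held-out data is just one of many ways to score effectiveness. A real data scientist would try far more candidates and variables.

```python
from statistics import mean

# Hypothetical data: (input, outcome) pairs, split into training and holdout sets.
train = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1)]
holdout = [(6, 13.0), (7, 15.1)]


def fit_mean(data):
    """Candidate 1: always predict the average outcome seen in training."""
    average = mean(y for _, y in data)
    return lambda x: average


def fit_line(data):
    """Candidate 2: ordinary least-squares straight line through the data."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_abs_error(model, data):
    """Score a fitted model on data it has not seen (lower is better)."""
    return mean(abs(model(x) - y) for x, y in data)


candidates = {"mean_only": fit_mean, "straight_line": fit_line}
scores = {
    name: mean_abs_error(fit(train), holdout) for name, fit in candidates.items()
}
best = min(scores, key=scores.get)

print("Holdout error per candidate:", scores)
print("Most effective model:", best)
```

Note that the score is computed on the holdout set, not the training set. How long each candidate took to build never enters the comparison, only how well it predicts.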
Automated Analytics – The Automatic Transmission of Predictive
Automatic transmission cars are very easy to drive because you simply need to understand the concepts of “Drive” and “Reverse” gears and away you go. The automated predictive algorithms in SAP Predictive Analytics are definitely more complicated than that, but aim to provide the same level of ease – you need to understand the concepts of clustering and time series but do not actually have to know how they work. This is what makes Automated Analytics so approachable to people without a data science background.
Data scientists typically scoff at these automated capabilities because they do not have the same level of visibility and control they are used to. We all tend to be a bit suspicious of “magic black boxes” because we usually don’t have any way of determining how effective they are. In a car, the RPM gauge and the sound of the car are the only indicators that gear shifting is working correctly. For the majority of drivers, this is enough to operate the car and get to their destination.
Automated Analytics generates reams and reams of analysis to help data scientists understand the performance of the algorithms on a specific dataset, much like an uber-set of gauges. However, in keeping with the nature of “automatic”, there are some limits on how many parameters a data scientist can configure. Just like an automatic transmission car, you either like it, hate it, or tolerate it because it gives you what you want in the end.
Expert Analytics – The Semi-Automatic Transmission of Predictive
SAP Predictive Analytics also includes Expert Analytics, which is designed for data scientists to take advantage of any predictive technology they wish, including our automated predictive algorithms, the open source predictive language R, the SAP HANA Predictive Analysis Library (PAL), and the SAP HANA Automated Predictive Library (APL). In Expert Analytics, the user is not tied to any one predictive technology or algorithm. In fact, they can create multiple algorithm chains in parallel and use the new Model Comparison feature in PA 2.2 to have the system advise which predictive model is the best one to use.
A question I get a lot about Expert Analytics is how we position it against other predictive analysis tools from competitors. That topic is out of scope for this post, but consider the purpose of a semi-automatic transmission: to give the driver the control and fun of gear shifting when they want it, while eliminating the less desirable requirements of a manual transmission, such as using the clutch at the right time. Expert Analytics is about getting you to your destination in the most efficient way, whether you are letting the system do the shifting or you want to step in and be more prescriptive about what happens when.
Smackdown Winner: You
I was with a customer last week who brought two data scientists and three data analysts to an all-day analytics workshop. These meetings are usually a challenge because we have data analysts who want to do more predictive analysis but we also have data scientists who tend to be perfectionists around process (since this is the best way to control the quality of analysis). Presenting Automated Analytics to the data analysts is always well received because it brings a new capability to them that does not require a PhD in Math or some intensely technical statistical training. This is usually the point where the data scientists say that what they do cannot be automated and lots of arms get folded.
However, in this case the data scientists quickly understood the value of others in the organization doing their own analysis for some of (what they consider to be) the simpler tasks, so that they could focus on the higher value projects where complex modelling is required. One of them said the coolest (and in my opinion the most humble) thing I’ve heard a data scientist say:
“The business user knows more about the semantics of the data than I ever will. They can sometimes better understand how the data should be used because they are solving a specific business question. So while I can create complicated predictive models, they may not be as efficient as simpler models that have more business meaning in them”.
The strategy of including auto-nodes in Expert Analytics is to give data scientists yet another tool in the range of technologies they can use. So (some) data scientists will recognize the value of using automated algorithms alongside their traditional techniques. They will likely also want to encourage data analysts to use Automated Analytics, because the analysts can then better solve their own problems, freeing up the data scientist to focus on the more hardcore predictive problems that require hand-crafted models.
Take SAP Predictive Analytics For a Test Drive
SAP Predictive Analytics includes both Automated Analytics and Expert Analytics in a single package so regardless of whether you are a business user or a data scientist, there’s something in there for you. You can download a free trial of SAP Predictive Analytics here: SAP Predictive Analytics Trial Download
For more information, be sure to check the SAP BusinessObjects Predictive Analytics page regularly.