Machine Learning Thursdays: Is Citizen Data Science Real?
We hear a lot these days about the “Citizen Data Scientist.” Everyone wants to use data science and machine learning to understand their business and automate tasks to improve efficiency. But we have a shortage of people with data science skills, so much so that salaries are high for properly qualified people. To chief data officers, it’s an attractive proposition to take people from within their business who understand data and have a strong mathematical background and convert them to data scientists through self-study and online courses.
We have a new generation of visual composition tools (such as Expert Analytics in SAP Predictive Analytics) that let a business user visually compose pipelines of algorithms, drop selectively into languages such as R and Python to solve more complex problems, and get impressive visualizations back that help them understand the business through statistics.
Challenges for Citizen Data Scientists
But there are challenges with this approach—it’s not simply a matter of choosing the best algorithm:
1) It’s very easy for a non-professional to misinterpret the results of a predictive model and make decisions based on poor results. It’s very difficult for a manager to recognize this until it is too late.
2) There are many skills that they need to master to maximize model accuracy:
- They need to understand feature engineering to extract useful insights from the data by deriving variables.
- The mechanisms needed vary across data types. Date/Time is very different from ordinal and continuous variables.
- They need to capture how these variables change over time.
- They also need to master complex techniques to make sure the data can be handled by the chosen algorithm and that missing values are correctly dealt with.
- They need to understand how to deploy the model into production.
3) They also need to deploy these models to production to generate the needed ROI. They need to understand how to keep the models up to date on an ongoing basis, and how to make sure they are accurate, not just on training data but also on validation, test, and new data.
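To make the feature engineering and missing-value points above concrete, here is a minimal Python sketch. The order records, the field names (ordered_at, amount), and the helper functions are hypothetical and exist only for illustration; median imputation is one common choice for missing values, not the only correct one.

```python
from datetime import datetime
from statistics import median


def derive_date_features(ts: datetime) -> dict:
    """Turn a raw timestamp into numeric features an algorithm can use."""
    return {
        "month": ts.month,
        "day_of_week": ts.weekday(),           # Monday = 0, Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),  # 1 for Saturday or Sunday
    }


def impute_median(values):
    """Replace missing values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sample data: one order has a missing amount.
orders = [
    {"ordered_at": datetime(2017, 3, 4, 10, 30), "amount": 120.0},
    {"ordered_at": datetime(2017, 3, 6, 9, 0), "amount": None},
    {"ordered_at": datetime(2017, 3, 7, 16, 45), "amount": 80.0},
]

amounts = impute_median([o["amount"] for o in orders])
rows = [
    {**derive_date_features(o["ordered_at"]), "amount": a}
    for o, a in zip(orders, amounts)
]
```

Even this small example involves judgment calls (which date parts matter, how to fill a gap) that a newcomer can easily get wrong without noticing.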
Automation Makes It Easier
With automation throughout the predictive lifecycle, it’s possible to avoid or simplify these challenges.
- You can train people to use automated predictive tooling to get a good model quickly and enforce best practice for model accuracy and robustness.
- You can give them clear guidance on how models perform and enable them to deploy successfully into a wide variety of environments.
- In parallel, they can hone their skills using a pipeline editor to experiment with other approaches while enforcing the same standards of model debriefing.
Most importantly, this reduces the risk of making a bad decision through an inadvertent but costly error. And the cost of entry to successfully using and deploying predictive analytics is lowered, making it much easier to scale.
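As a rough illustration of the kind of check automated tooling might enforce, the Python sketch below compares accuracy on training data with accuracy on held-out data and flags a large gap. The function names, the sample predictions, and the 0.05 threshold are assumptions made for this example; they do not describe how any particular product works.

```python
def accuracy(predictions, actuals):
    """Share of predictions that match the actual outcomes."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


def robustness_check(train_acc, holdout_acc, max_gap=0.05):
    """Flag a model whose held-out accuracy falls well below training accuracy."""
    gap = train_acc - holdout_acc
    return {"gap": gap, "ok": gap <= max_gap}


# Hypothetical predictions: perfect on training data, weaker on unseen data.
train_acc = accuracy([1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
                     [1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
holdout_acc = accuracy([1, 0, 1, 1, 0],
                       [1, 1, 0, 1, 0])

report = robustness_check(train_acc, holdout_acc)
```

A model that looks perfect on training data but drops sharply on held-out data is exactly the inadvertent error a guardrail like this is meant to surface before anyone acts on the results.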
Don’t get me wrong, you still need training to be able to take advantage of this. You need to know how to ask the question and how to maximize the results.
Even Easier Insights with SAP Analytics Cloud
Finally, techniques like Smart Data Discovery in SAP Analytics Cloud enable business users to use advanced analytics for business exploration without needing to use any algorithms directly. This can be deployed to normal business users. The interface is set up to give them a simple way to frame the question. The insights are displayed in ways that help the user understand what they can and can’t infer from the data.
Is Citizen Data Science Real?
So, to answer the original question: Yes, Citizen Data Science is real, but we should think about the best way to enable people with different skill sets to successfully use data science in their business. This trend will only grow as automation techniques and helper tools advance and continue to lower the barrier to entry for data science and predictive analytics.
To learn more about this subject, see:
- All the Thursday series posts for more on machine learning, predictive analytics, and artificial intelligence.
- The predictive Forrester Wave report and the predictive analytics TDWI paper, Machine Learning for Business: Eight Best Practices for Getting Started.
- The IDC paper, The Value of Analytics in Digital Transformation