A Look at the Daily Life of a Data Scientist

former_member45511 · ‎03-15-2018

As a data scientist, I can definitely say we live in promising times. That predictive analytics applications are worthwhile and bring measurable added value no longer needs to be discussed. Everyone talks about machine learning, predictive use cases, data science, data mining, and advanced analytics, just to mention a few buzzwords.

Experts like data scientists are in increasing demand: companies now require software to be intelligent and to support machine learning, and they are motivated to invest in such use cases.

But there are a few pain points that we data scientists face in our daily lives as we strive to not only implement use cases, but also to transform them into business value. Let’s take a look, and I’ll share some suggestions.

Pain Point One: Data Scientists and the Business

The first problem is that we data scientists sometimes get distracted by talking about the cool mathematical stuff we can do with the data. For some of us, the most fun is to take a long swim in the data lake and see where the data takes us, or to implement algorithms and find the model that best fits the use case.

But we need to understand, and always remember, the business’s needs, which means we must work with the department’s advisor. We bring the knowledge of algorithms, statistics, and data preparation to the table, while they bring the understanding of the collected functional data, processes, and business needs.

Pain Point Two: Data Scientists and IT

Another important stakeholder in a data science project is IT, since they have to take care of the models and algorithms we build during the project or proof of concept (PoC) once we deploy them and put them into production. Usually no single tool is best for all data science projects, since requirements and projects are far too diverse. One important pillar of data science is, of course, the open source world, with languages like Python and R. (I have to admit I am a big fan of R; I can’t even do the housework schedule for me and my husband without it.)

During a data science project or a PoC, the main focus is to show that we can find patterns in the data and that we can find the perfect model for the use case (or at least a very good one) with our favorite tool. The problem? Sometimes we feel so comfy in our math and algorithm world that we tend to forget the company’s IT landscape.

So, what can happen at the end of a PoC is that we generate awesome results that make all stakeholders happy, but we can’t put the model into production because it doesn’t fit into the existing IT architecture. And once a model is in production, we have to make sure that it stays up to date and still fits the current data and the business.

Usually model accuracy decreases over time, so it is important to monitor performance changes and retrain the model from time to time. (Some companies employ data scientists only for this purpose and I feel sorry for them, since this is one of the least exciting jobs for a data scientist.)
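To make the monitoring idea a bit more concrete, here is a minimal Python sketch of such a check. It is only an illustration under my own assumptions, not how any particular product does it: the names `DriftReport` and `check_for_drift`, the choice of mean absolute error as the metric, and the 20 percent tolerance are all hypothetical. The idea is simply to compare the error on recent data with the error measured when the model was trained, and to flag the model for retraining once the gap gets too large.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class DriftReport:
    """Result of comparing recent model error with the training-time error."""
    baseline_error: float
    recent_error: float
    needs_retraining: bool


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def check_for_drift(
    baseline_error: float,
    actual: Sequence[float],
    predicted: Sequence[float],
    tolerance: float = 0.2,  # hypothetical threshold: allow 20% degradation
) -> DriftReport:
    """Flag the model for retraining if recent error exceeds the baseline by more than `tolerance`."""
    recent_error = mean_absolute_error(actual, predicted)
    needs_retraining = recent_error > baseline_error * (1 + tolerance)
    return DriftReport(baseline_error, recent_error, needs_retraining)


# Example with made-up numbers: the model had an error of 10.0 at training time.
report = check_for_drift(
    baseline_error=10.0,
    actual=[100, 120, 130],
    predicted=[90, 135, 110],
)
print(report)
```

In practice, a check like this would run on a schedule against freshly collected data, and the metric and threshold would be agreed with the business, since they know how much error is still acceptable.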

Pain Point Three: Data Scientists and Increasing Demands

Once business users or managers see the benefit of machine learning and predictive use cases, and we’ve proven their feasibility in a successful project, they come up with more and more ideas. This means that even if companies had a sufficient number of data scientists for their current needs (which usually isn’t the case), their demand for data scientists keeps growing over time.

This leads to the problem of how to cover the demand for prediction models and ensure that they are always up-to-date.

One solution? Instead of hiring more and more data scientists, companies can implement automated applications that create and maintain recurring, everyday forecasts.

Even Gartner says that more than 40 percent of data science tasks will be automated by 2020.

https://www.youtube.com/watch?v=D7pt2ijmhvE

Saving Resources by Automating with SAP Predictive Analytics

SAP Predictive Analytics has an automated mechanism that allows users to quickly generate forecasting models as well as maintain and update those models.

Once a data scientist has created an analytic dataset, a trained analyst or departmental employee can easily create a forecasting model. These models can cover classification, regression, time series, clustering, recommendation, or link analysis.

This automation can come in handy when the demands on and expectations of data science teams are growing. Not all problems require a tailor-made algorithm. Benefits:

For the easy (and as some data scientists might say, “boring”) problems, automation leads to quicker results at lower costs.

In addition, automation frees time for the interesting projects or mathematical questions that need more attention than a plain time series forecast or a churn analysis.

Furthermore, it enables business analysts to do these forecasts on their own. This not only gives the data scientists more free time but it also helps to close the resource gap and brings departments with domain knowledge and departments with data science experience closer together.

Pain Point Four: Creating Easily Consumable Results

Another problem that we data scientists experience usually appears at the end of a project: the visualization of results. It is quite surprising, but most end users find that R code or an R visualization on its own is not a sufficient way to consume the predicted results every day.

The quickest way to the brain is visualization, which means that data scientists are usually expected to provide a nice dashboard or interface at the end of such a project. For me, that is not the most interesting part; it’s just something that has to be done. Solutions?

Visualization Tools

Here again, a standardized tool can help. With the combination of predictive analytics and SAP Analytics Cloud, we can write the results into a backend system such as SAP HANA or SAP BW/4HANA, and the business analyst can consume them directly in the visualization tool.

They can use SAP Analytics Cloud’s self-service tool to create their own dashboard or to enrich existing dashboards with predictive results, so that end users can lean back in their office chairs and consume the results in a nice and user-friendly dashboard.

Direct Integration

Another option that helps the end user consume the predicted result is through direct integration into the solutions.

Here, the tool SAP Predictive Analytics integrator can help. This nice integrator gives you the option to publish a model that has been created with SAP Predictive Analytics to a solution like SAP S/4HANA. In this scenario, the data scientist or business analyst doesn’t have to care about the visualization anymore.

They can just build the model, publish it to SAP S/4HANA, and let the user take care of the rest. Did that get you interested?

Pain Point Five: Beautiful Models Nobody Sees

The very last problem is that sometimes you build advanced models…and nobody uses them.

With these suggestions, chances are higher that your models will find their way into the business processes and become available to end users.

Learn More

Read the white paper, Machine Learning Automation: Beyond Algorithms

Watch our video, SAP Predictive Analytics: Value of Automation Whiteboard

Visit our product page.