
Introduction


This blog post provides an easy reference and sample code for some of the functionality typically required when dealing with time series data. For a detailed background on some of the real-world problems in this area, refer to part 1 of this blog post by rafael.pacheco:

Anomaly Forecast of Sensor Data in Energy Intensive Industries – Part I: The Machine Learning and Be...

Overall Architecture


Here is a high-level overview of the parts of the SAP Business Technology Platform that will be discussed for the anomaly prediction of sensor data:


High Level Overview of Technology Components


 

 

The blog post will cover the typical ML model lifecycle in the context of this use case:


 

ML Model Lifecycle



Dealing with Sensor Data


In today's automated industrial production scenarios there are tens or hundreds of sensors constantly generating new data, which can provide vital information on both current and predicted operational metrics. This data can also be analyzed historically for retrospective root-cause analysis. In this blog post we will focus on how to extract information from this data in order to predict anomalies in the near future, on the order of minutes, so that corrective action can be taken and the production process can continue with minimal disruption while maintaining quality.

Challenges with Sensor Data


Sensor data from a manufacturing process typically has a few characteristics which need to be dealt with before the data is fit for use in a machine learning algorithm. The data needs to be cleaned of duplicates and checked for consistency; we will not describe these steps here but assume that the available data has been persisted after such checks. The sensor readings come in at uneven time intervals, and the many sensors are not necessarily synchronized, as data is recorded only when a value changes.

Here I will describe the functionality provided natively by SAP HANA for processing the data, using SAP HANA SQL directly or via the SAP HANA Python Client API (hana_ml).

The functionality covered here works for both the cloud and on-premise releases of SAP HANA and SAP Data Intelligence.
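
All the Python snippets below assume an open hana_ml connection. Here is a minimal sketch of establishing one; the host, port, and credentials are placeholders for your own SAP HANA instance:

from hana_ml.dataframe import ConnectionContext

# Placeholder connection details; replace with your SAP HANA instance
conn = ConnectionContext(address='<hana-host>', port=443,
                         user='<user>', password='<password>',
                         encrypt=True)
print(conn.hana_version())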

Turning Sensor Data into Time Series Data


Here is a sample of data coming from 40+ different sensors, which has been persisted in an SAP HANA database table called HISTORIAN.


To analyze the different signals recorded at uneven time points, the first step is to harmonize the data at a defined time interval. The choice of time interval depends on the data and the underlying frequency at which it is generated. SAP HANA Time Series provides easy functionality for data harmonization without having to deal with the inherent nuances of the time dimension.

For the above case, let's say we decide to analyze the data at equidistant 30-second intervals. This can be done with the SERIES_ROUND function:
SELECT SERIES_ROUND(DATETIME, 'INTERVAL 30 SECOND') AS TS, * FROM HISTORIAN

This will create equally spaced data:



Smoothing Sensor Data


The next step is to detect abnormal or anomalous events in the sensor data. However, in many cases it is not desirable to react to every abnormal value from the sensors, but only to consistently abnormal values, to ensure the analysis is not impacted by spikes which could come from data quality issues.

In our example, the target signal whose anomalous behaviour we want to analyze is the signal P_STEAM_SUPPLY, provided by TAG38 in the data above. The raw data looks like this:


Let's say we want to smooth the data over a rolling window of 5 minutes by calculating a rolling average:
SELECT TS,
       WEIGHTED_AVG(AVG_PRESSURE) OVER (ORDER BY TS ROWS BETWEEN 10 PRECEDING AND CURRENT ROW) AS MOVING_AVG_5MINUTES
FROM (SELECT SERIES_ROUND(DATETIME, 'INTERVAL 30 SECOND') AS TS,
             AVG(VALUE) AS AVG_PRESSURE
      FROM HISTORIAN
      WHERE SIGNAL = 'P_STEAM_SUPPLY'
      GROUP BY SERIES_ROUND(DATETIME, 'INTERVAL 30 SECOND'))




Pivoting Sensor Data


Now, in order to create a predictive model which can predict whether or not the target signal drops below a certain value, we need to pivot the data so that the different signals become potential features for the ML model. For this we can use the pivoting functionality provided by the Python hana_ml package, detailed in the blog post Pivoting Data with SAP HANA:
import hana_ml.dataframe as hd

# Average the readings per signal in 30-second buckets, so the pivot can use AVG_VALUE
sql_cmd = """SELECT SERIES_ROUND(DATETIME, 'INTERVAL 30 SECOND') AS TS, SIGNAL, AVG(VALUE) AS AVG_VALUE
             FROM HISTORIAN GROUP BY SERIES_ROUND(DATETIME, 'INTERVAL 30 SECOND'), SIGNAL"""
ts_data = hd.DataFrame(conn, sql_cmd)
print(ts_data.head(5).collect())

ts_pivot = ts_data.pivot_table(index='TS', columns='SIGNAL', values='AVG_VALUE')
ts_pivot.head(5).collect()
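
Since the individual signals report at different times, the pivoted frame will typically contain NULLs for intervals in which a signal did not change. As a minimal sketch, incomplete rows can simply be dropped with the hana_ml dropna method; in practice a domain-appropriate imputation may be preferable:

# Drop intervals where any signal column is missing (simplest option)
ts_clean = ts_pivot.dropna()
print(ts_clean.count())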

The pivoted data will look like this:


 

ML Model Development


Now that we have the data in the desired format, we can move on to building a predictive model using the native ML capabilities provided by SAP HANA.

The ML capabilities of HANA cover a wide range of algorithms which are available via SQL or Python APIs. For this example we will use the Auto-ML capabilities provided by the Automated Predictive Library (APL) functionality, in particular the auto-classification model.

Let's say we have data over a historic period, including data collected prior to events where the pressure dropped beyond a threshold. Together with this data we also collect process data providing information on operating condition metrics, for example the number of batches, recipes, and completion percentages across different consumption lines. The dataset is labeled with whether or not there will be a pressure drop after 5, 10, 15, and 20 minutes, as described in the results section of part 1 of this blog post. A sketch of how such a label could be derived is shown below.
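
For illustration only, here is a hypothetical sketch of deriving the 20-minute-ahead label with a LEAD window function over the smoothed series (40 steps of 30 seconds = 20 minutes ahead); the table name SMOOTHED_PRESSURE and the threshold value 2.5 are assumptions, not values from the original data:

# Hypothetical sketch: label is 1 if the smoothed pressure 20 minutes ahead
# (40 steps of 30 seconds) falls below an assumed threshold of 2.5
label_sql = """SELECT TS, MOVING_AVG_5MINUTES,
                      CASE WHEN LEAD(MOVING_AVG_5MINUTES, 40) OVER (ORDER BY TS) < 2.5
                           THEN 1 ELSE 0 END AS TARGET_LEAD_20
               FROM SMOOTHED_PRESSURE"""
df = hd.DataFrame(conn, label_sql)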

Step 1: Split the dataset into training and validation datasets



Training Dataset


 

If the dataset is imbalanced, which is likely the case with anomalies, we can first upsample the minority class with SMOTE using the native HANA ML functionality:
from hana_ml.algorithms.pal.preprocessing import SMOTE

# Upsample the minority (anomaly) class
smote = SMOTE()
new_df = smote.fit_transform(data=df, label='TARGET_LEAD_20', minority_class=1)
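
As a quick sanity check, the class balance after upsampling can be verified with the same filter/count pattern used for the splits below:

# Verify the class balance after SMOTE
total = new_df.count()
pos = new_df.filter('TARGET_LEAD_20 = 1').count()
print("Share of class 1 after SMOTE:", round(pos / total * 100, 2), "%")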

 

This provides a balanced dataset which we can then split into training and hold-out sets:
import hana_ml.algorithms.pal.partition as partition

# Stratified split on the balanced dataset
train, test, valid = partition.train_test_val_split(data=new_df, training_percentage=0.7,
                                                    testing_percentage=0, validation_percentage=0.3,
                                                    partition_method='stratified',
                                                    stratified_column='TARGET_LEAD_20')

print("Training Set :", train.count())
print("Validation Set :", valid.count())
print("Fraction of Target 1 in Training Dataset:", round(train.filter('TARGET_LEAD_20 = 1').count()/train.count()*100, 2))
print("Fraction of Target 1 in Validation Dataset:", round(valid.filter('TARGET_LEAD_20 = 1').count()/valid.count()*100, 2))

We do not need to create a test set, as the APL library function will do that on its own.

Step 2: Train the model


from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier

# Feature columns: everything except the key and the target
features = [c for c in train.columns if c not in ('TIMESTAMP_STR', 'TARGET_LEAD_20')]

model = GradientBoostingBinaryClassifier()
model.fit(train, features=features, label='TARGET_LEAD_20', key='TIMESTAMP_STR')

 

Step 3: Check Model Quality


# Get model performance metrics
model.get_performance_metrics()

# Get feature importance (SHAP values)
model.get_feature_importances()['ExactSHAP']

# Check model on the hold-out set
score = model.score(valid)
print("Model score:", score)

 

Step 4: Save the ML Model in HANA


from hana_ml.model_storage import ModelStorage

# Persist the trained model in SAP HANA under a given name
model_storage = ModelStorage(connection_context=conn)
model.name = 'MY_MODEL_NAME'
model_storage.save_model(model=model, if_exists='replace')
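
The saved model can later be reloaded by name for inference; a minimal sketch, where new_data stands for a hana_ml DataFrame containing the same feature columns as the training data:

# Reload the latest saved version of the model and apply it to new data
saved_model = model_storage.load_model(name='MY_MODEL_NAME')
predictions = saved_model.predict(new_data)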

ML Model Deployment


In the current scenario we would typically want to keep updating the training data as more anomalies appear, which requires frequent training and re-training steps. The automation of the above model training can be achieved via SAP Data Intelligence.

SAP Data Intelligence provides a few different options to create an ML training model. It comes with ready-to-use templates which can help create the training pipeline in a matter of minutes.

In the ML Scenario Manager of SAP Data Intelligence, create a pipeline of type HANA ML Training:


ML Template


 


 

This creates a pipeline with configurable operators. In our case we configure it for the steps above: for example, the table which has the training data, which field to use as the target, how to split the data between training and test, and which ML algorithm to use:


 

When the pipeline is executed it will save the model as an artifact which is later used for inference on new data. It also generates the accuracy score by default.


 

The template provides ready-to-use functionality to create a pipeline which can then be set to run on a schedule.

ML Training Customizations


However, there could be cases where this template is not sufficient; for example, one may need to do some preprocessing, or in our case upsampling, before the training step. In this case you can extend the template with additional operators to handle the additional steps. SAP Data Intelligence comes with many built-in operators. In case these are not sufficient, custom operators can be built as described in this Custom Operator blog post.

Also, for some algorithms like classification it is not sufficient to monitor accuracy alone; you may additionally want to track F1 scores. This can be done by configuring the operator to track additional metrics as described in the SAP Data Intelligence Tips & Tricks blog post.

ML Model Consumption


Now that we have a pipeline which creates ML models, we can use the generated model to infer on new data, which is the whole reason we came down this path: so that we can predict in advance when the pressure will drop as new data arrives.

To enable this, SAP Data Intelligence again provides template pipelines which can take a previously generated model and apply it to new data. For this we use the HANA ML Inference on Dataset or HANA ML Inference on API Requests template, depending on how the new data is available.


 

This pipeline takes as input the model artifact id which is available from the training step above.


As with the training, you then configure the SAP HANA connection where the inference should take place. This could be different from where the training was done.

In case the inference needs to be done independently of SAP HANA, it is also possible to use the JavaScript version of the ML model, as described in the blog post by stojanm: MLOps in practice: Applying and updating Machine Learning models in real-time and at scale

As the inference runs in the background, the process can be monitored via dashboards in SAP Analytics Cloud, which provides live connectivity to SAP HANA and can display upcoming alerts in real time.


SAP Analytics Cloud Dashboard Monitoring the process


 

 
Conclusion

As I come to the end of my blog post, as always I hope it is of some help when you find yourself dealing with similar problems, which recur not only for ML under Beer Pressure but in many ML scenarios under time pressure.