Anomaly Forecast of Sensor Data in Energy Intensive Industries – Part I: The Machine Learning and Beer Production
Motivation and Background
Beer production is an important example of an energy intensive industry involving heating or cooling, where the source of energy can be either electric or thermal depending on the application. Refrigeration accounts for the largest share of electric energy consumption, whereas thermal energy from burning natural gas is used to generate hot water and steam, which is then used in brewing, packaging and possibly building heating. We focus on the analysis of the brewing room (brewhouse), where steam vapor is supplied by boilers and delivered to the mash tun, lauter tun and kettle in the process line shown in figure 1.
According to published estimates, the United States beverage industry consumed about 70 trillion BTU of energy in 2010 alone. This corresponds to an energy expenditure between three and eight percent of the production cost of beer, valued in the hundreds of millions of dollars annually. Improving energy use is an important way to reduce cost. However, there is also an associated cost in the production of greenhouse gases (GHG) that has to be considered, particularly when unnecessary burning of gas exacerbates global warming.
Heating of the mash in the mash tun and of the wort in the lauter tun, as well as boiling of the wort with hops in the kettle, have to be done in a controlled manner. It is desirable to minimize the heating time in order to increase productivity, which requires a high demand of steam flow. For this reason, breweries have specialized equipment and sensors to monitor physical variables such as temperature, volumetric flow rates, pressures, levels, etc., in the tuns, kettle and boilers used to supply the steam vapor.
The steam vapor demand is a function of the amount and the type of material being heated, the type of recipe being prepared, the speed at which each line is run and the interaction among the lines. When the steam vapor demand is high there is a pressure drop in the tuns and kettle, causing a slowdown in the heating process, i.e., loss in production capacity and unnecessary boiler ignitions aiming to recover the lost pressure in the circuit.
Early detection of anomalous behavior in sensor data allows for convenient scheduling of corrective actions aimed at preventing loss of production when the pressure drops below a certain threshold for a sustained duration. Actions originating from the early detection of such anomalies guarantee cost savings, since corrective tasks are performed only when warranted, preventing unwanted downtime and loss of production due to low product quality. Otherwise, the lost production translates into additional overtime to recover the amount of beer originally scheduled for delivery.
With the vision to reduce loss in production capacity, generation of greenhouse gases, raw material, labor cost, infrastructure, and contribution to global warming caused by excessive steam vapor demand we employ the algorithms from the SAP HANA Predictive Analytics Library (PAL) and HANA-ML with python jupyter-notebooks to identify and forecast these anomalies using sensor data.
Part I (this blog post) states the problem and describes the approach to solving it. Part II of this series, by Nidhi Sawhney, shows step by step the deployment of the solution using a spectrum of integration and deployment capabilities of the SAP Business Technology Platform, including SAP HANA and SAP Data Intelligence (DI). The detailed solution and the implementation of the algorithm are provided as well. Part III (coming soon) concludes this analysis with a detailed description of the solution using SAP Data Warehouse Cloud (DWC).
One of the most severe problems occurring in the brewhouse is a pressure drop lasting long enough to prolong the residence of the wort in the kettle for a particular recipe. If the temperature in the kettle is not maintained at a constant value, the quality of the final product is compromised, rendering it useless. To avoid this situation, a forecast that allows a backup boiler to be turned on to supply the necessary steam vapor to the kettle is desirable. This in turn would allow brewing companies to benefit from the use of Artificial Intelligence (AI) to significantly improve the business process and end-product quality. Assuming, for instance, that:
- a backup boiler is able to reach full capacity 5 minutes after it is turned on
- the optimal pressure value for normal steam vapor supply is 105 psi
- any pressure drop below 87 psi lasting more than 5 minutes is undesirable
we can train an algorithm, using the information embedded in the persisted sensor data, aimed at forecasting undesirable pressure drop events.
Sensor data is usually recorded by a programmable logic controller (PLC) and stored in historians at uneven time intervals. Sensor signals for pressure and mass flow rate associated with steam vapor supplied from the boilers to the tuns and the kettle are shown in figure 2. Supply pressure (psi) is shown on top and demand mass flow rate (lb/h) at the bottom. The red dashed line illustrates a pressure of 87 psi and the black dash-dot line represents the pressure under normal operating conditions, 105.7 psi. Notice the increase in mass flow rate associated with the decrease in pressure around 17:00h (red solid line of the figure). The region below the red dashed line is considered an abnormal condition, since the pressure remained below 87 psi for more than 5 minutes.
The objective is to train a classification model able to forecast a pressure drop lasting at least five minutes before the event occurs, using the available sensor data. To this end, we categorize the data into classes or categories. In particular, the target is to properly label the pressure value of the steam vapor supplied to the brewhouse as “normal” or “abnormal”, depending on the operating condition of the system. We refer to an ‘event’ as a situation where the pressure drops below a specified threshold for a sustained period of time, as stated above. Figure 2 above shows one ‘event’, and within this event, many different values of the target pressure. The predictive indicators of our model comprise the available flow rates, pressures, temperatures, levels, etc., of the tuns, kettles and boilers streaming from the production sensors. The target pressure within the event, represented by the entire region where it lies below the red dashed line depicted in figure 2, is categorized as abnormal, and elsewhere as normal.
Because sensor data are recorded at different time intervals, depending on the frequency of the broadcasting signal, it is necessary to create a time series with regular intervals. This is the first step in creating a predictive model.
A HANA function can then be used to harmonize the data for any arbitrary fixed time interval. The selection of the time-step should be done in a way that would minimize the errors associated with the smoothing of the signals while maintaining enough information useful for prediction.
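The harmonization step can be sketched outside HANA with pandas; the blog itself performs it with a HANA function, so this is only an illustrative stand-in, and the column names (`ts`, `pressure`) and values are assumptions:

```python
import pandas as pd

# Irregularly spaced sensor readings, as they might come from a historian
raw = pd.DataFrame(
    {"ts": pd.to_datetime(["2021-01-01 00:00:07",
                           "2021-01-01 00:00:41",
                           "2021-01-01 00:01:33"]),
     "pressure": [105.2, 104.8, 103.9]}
).set_index("ts")

# Resample to a fixed 30-second grid: average the readings that fall
# inside each interval, then forward-fill intervals with no reading.
regular = raw["pressure"].resample("30s").mean().ffill()
```

Averaging inside each interval smooths the signal, which is exactly the information/error trade-off mentioned above when choosing the time step.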
The second step in the process is to automatically detect where the abnormal events have occurred in the historically available data and hence properly develop a starting dataset to later employ for training and testing our predictive model. This can be accomplished using a HANA function in SQL for computing rolling averages, as described in detail in Part II.
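The event-detection logic can be sketched as follows. The actual implementation is a HANA SQL function with rolling averages (see Part II); this pandas version only illustrates the idea of flagging samples where the pressure stays below the threshold for the full 5-minute window, with illustrative values:

```python
import pandas as pd

THRESHOLD = 87.0   # psi, from the problem statement
WINDOW = 10        # 10 samples x 30 s = 5 minutes

# Illustrative regular 30-second pressure series with one drop
ts = pd.date_range("2021-01-01 17:00", periods=30, freq="30s")
pressure = pd.Series(105.7, index=ts)
pressure.iloc[5:20] = 80.0   # a 7.5-minute drop below the threshold

# A window qualifies when all 10 samples in it are below the threshold.
sustained = ((pressure < THRESHOLD).astype(int)
             .rolling(WINDOW).sum() == WINDOW).astype(int)

# Label every sample belonging to a qualifying window as abnormal (1):
# take the rolling max over the same window, looking forward in time.
target = sustained[::-1].rolling(WINDOW, min_periods=1).max()[::-1].astype(int)
```

Here all 15 samples of the 7.5-minute drop end up labeled 1, matching the labeling rule described above.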
The next step is to sample the data before, during and after the event, for each event that occurred. The purpose is to account for the conditions that led to the pressure decay and those that made the recovery possible.
This sampling could render a dataset that is imbalanced, i.e., the classification categories might not be equally represented. The goal is to have approximately the same number of samples where the pressure remained below the threshold as samples where the pressure value was normal. Among the techniques we used to balance the data are over/under-sampling and SMOTE. These algorithms are native to HANA Cloud and can be retrieved from the Predictive Analysis Library (PAL).
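As a simplified stand-in for the PAL balancing algorithms (SMOTE synthesizes new minority samples; the version below is plain random over-sampling, shown only to make the idea concrete, with invented data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative imbalanced dataset: 40 normal rows, 2 abnormal rows
X = np.vstack([rng.normal(105, 2, size=(40, 3)),
               rng.normal(80, 2, size=(2, 3))])
y = np.array([0] * 40 + [1] * 2)

# Random over-sampling: duplicate minority rows (with replacement)
# until both classes have the same number of samples.
minority = np.where(y == 1)[0]
extra = rng.choice(minority, size=(y == 0).sum() - len(minority))
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
```

SMOTE goes one step further and interpolates between minority neighbors instead of duplicating rows, which reduces overfitting to the few abnormal samples.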
In order to predict when the pressure drop will occur, we need to apply a lag to the target pressure, in a similar fashion to the process of credit scoring. This method has been successfully applied to other energy intensive industries by our Data Science group, and also by others to telemetric data. Details can be found in [7,8].
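The lagging step can be sketched as follows: the features observed at time t are aligned with the label observed lag minutes later, so that the trained model effectively forecasts the event in advance. Column names and the toy series are assumptions:

```python
import pandas as pd

LAG_MIN = 10     # forecast horizon in minutes
STEP_SEC = 30    # regular time step of the harmonized series
shift_n = LAG_MIN * 60 // STEP_SEC   # 20 samples

# Illustrative regular series with an abnormal stretch in the target
ts = pd.date_range("2021-01-01", periods=100, freq="30s")
df = pd.DataFrame({"pressure": 105.7, "target": 0}, index=ts)
df.loc[ts[60:80], "target"] = 1

# Align features at time t with the label at t + 10 minutes.
df["target_lagged"] = df["target"].shift(-shift_n)
df = df.dropna(subset=["target_lagged"])
```

With this alignment, a classifier trained on (features at t, lagged target) predicts at time t whether an abnormal condition will hold 10 minutes later.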
Twelve months of data were available from sensors monitoring the temperatures, levels, pressures and mass flow rates of the tuns and kettle for two different lines. After applying the HANA function for detecting abnormal events, it was found that these occurred on at least 48 different days, with at least one event per day. An example of the application of the function is shown in figure 3, where the red dashed line shows the pressure threshold of 87 psi and the solid line indicates the region where the pressure value is below the threshold for more than the set period of time. We note in passing that the function created is versatile, so it is easily modified to accommodate other conditions.
By applying the HANA functions to harmonize the data, which is necessary because the records in the historian carry a wide range of timestamps, it was found that with a time interval of 30 seconds the loss of information caused by the smoothing of the signals was insignificant, as will be evident from the results shown below. The number of timestamped records per signal per day at a 30-second interval is 2,880, and thus all the available sensor data can be pivoted and stored in a table as a regular time series. Pivoting is another function that is easy to use in HANA SQL. Sensor data for the supply pressure is shown in figure 3. A comparison of the signal with that of figure 2 corroborates visually that the smoothed signal did not lose the main features, with only the tips of the spikes (transients) absent.
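The pivoting step is done in HANA SQL in the blog's implementation; a pandas equivalent looks like this, where the sensor names and values are illustrative:

```python
import pandas as pd

# Long format: one row per (timestamp, sensor) pair, as readings
# might come out of the harmonization step.
long = pd.DataFrame({
    "ts": pd.to_datetime(["2021-01-01 00:00:00"] * 2 +
                         ["2021-01-01 00:00:30"] * 2),
    "sensor": ["supply_pressure", "kettle_level"] * 2,
    "value": [105.7, 42.0, 105.1, 43.5],
})

# Wide format: one row per timestamp, one column per sensor,
# i.e., a regular multivariate time series ready for modeling.
wide = long.pivot(index="ts", columns="sensor", values="value")
```

Each of the 2,880 daily timestamps then becomes one row, with one column per sensor signal.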
For each event, subsamples spanning from 4 hours prior to the event to 1 hour after the end of the event were collected. This is better illustrated in figure 3, where the pressure signal in blue has been scaled on the left vertical axis to allow for plotting with the target in green on the right-hand vertical axis. If the value of the target is zero, the pressure is behaving as expected; when the target value is one, the pressure has dropped below the threshold for more than five minutes, as shown in the figure. The figure also indicates that there were two events in the day where the target pressure dropped below the threshold for more than 5 minutes.
For each day of the forty-eight days where at least one event occurred, subsamples of data were extracted as described in the section above. The green line in figure 3 is an example of how the classification was done. Values of 1 for the target indicates that the pressure was below the threshold for more than the permitted time.
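The subsampling around each event can be sketched as a window slice over the regular series; the timestamps, values and column names below are illustrative, not the actual plant data:

```python
import pandas as pd

# One day of 30-second samples (2,880 records, as stated above)
ts = pd.date_range("2021-01-01", periods=24 * 120, freq="30s")
df = pd.DataFrame({"pressure": 105.7, "target": 0}, index=ts)

# One detected event: pressure below the threshold from 17:00 to 17:20
event_start = pd.Timestamp("2021-01-01 17:00:00")
event_end = pd.Timestamp("2021-01-01 17:20:00")
df.loc[event_start:event_end, "target"] = 1

# Keep a window from 4 hours before the event to 1 hour after it,
# to capture the conditions leading to the drop and the recovery.
window = df.loc[event_start - pd.Timedelta(hours=4):
                event_end + pd.Timedelta(hours=1)]
```

Repeating this slice for each of the detected events yields the starting dataset for training and testing.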
Following standard principles of model validation and testing, we split our twelve months of available data into a training dataset containing the information of the first eight months, while the remaining four months were used for testing purposes (holdout dataset). We further split the training data with a percentage split into training and validation partitions, using 66% and 34% respectively.
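The temporal split can be sketched as below. The cutoff date and daily granularity are illustrative (the real data is at a 30-second interval); the point is that the holdout months come strictly after the training months:

```python
import pandas as pd

# Illustrative twelve-month dataset (daily granularity for brevity)
ts = pd.date_range("2020-01-01", "2020-12-31", freq="D")
df = pd.DataFrame({"pressure": 105.7}, index=ts)

# First eight months for training, last four held out for testing.
cutoff = pd.Timestamp("2020-09-01")
train_full = df[df.index < cutoff]
holdout = df[df.index >= cutoff]

# 66/34 percentage split of the training period into train/validation.
n_train = int(len(train_full) * 0.66)
train = train_full.iloc[:n_train]
valid = train_full.iloc[n_train:]
```

Splitting by time rather than at random avoids leaking future sensor readings into the training set.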
The distribution of the ‘events’ did not follow a seasonal pattern over the twelve-month period. There were about 40,000 instances where the target was labeled as normal (0) and only 2,000 instances where the target was labeled as abnormal (1), and thus the dataset was balanced using SMOTE.
We tested different classification algorithms, including logistic regression, decision trees, xgboost and random forest, on target lags of 5, 10, 15 and 20 minutes. A typical set of metrics for a lagged target of 10 minutes is shown in figure 4 and table 1, using logistic regression with SMOTE. The number of explanatory variables in all cases was 47, with a training size of 52,899 and a test size of 22,671. Figure 4 shows the confusion matrix results, along with quantitative values of model performance in table 1.
Figure 4. Confusion matrix from the test data set vs predicted values using logistic regression and SMOTE.
Table 1. Metrics and scorings quantifying the quality of the prediction using logistic regression and SMOTE for lagged target of 10 minutes.
Across these algorithms and target lags, we found that the performance of logistic regression was the lowest and that of random forest the highest, but their differences were not statistically significant. Quantitative metrics of performance for the logistic regression are shown in table 1. It is important to keep in mind that these metrics are based on target values and not events. This point will be addressed in the discussion section below.
Now that we have quantified the performance of the algorithms, we test the model on a dataset that is in neither the training nor the test set, but in the holdout dataset, corresponding to figure 3. Note that there are only two events occurring in the day (figure 3), but many different target values that were normal and abnormal. The results in Movie 1 below show the true performance of the algorithm proposed here. These simulations were conducted using random forest, but similar results were obtained with the other algorithms listed above.
Movie 1. Simulation of the continuous application of the algorithm to forecast pressure drop using classification. First row from top: probability of the decision in red and target value (1/0) in black. Second row: supply steam vapor pressure in blue, with the threshold value of 87 psi as a red dashed line and the normal pressure of 105.7 psi as a black dash-dot line. The green area in the third row is the sum of the mass flow rates at the kettles of lines 1 and 2. Shown in the fourth row are the levels at the kettles of lines 1 and 2.
Sensor data for supply steam vapor pressure, sum of mass flow rates and levels in the kettles from lines 1 and 2, the predicted probability of occurrence and predicted target for the ‘event’ are shown in Movie 1 where the prediction was executed every 30 seconds. Figure 5 is a snapshot when the model predicts an event is about to happen. This example corresponds to that of figure 3, but we emphasize that the data shown in the figures and simulation was not included in the training of the model.
Figure 5. Forecasting of an event using classification and its probability. The frame is a snapshot from Movie 1 and the events correspond to those shown in figure 3. In the second row, steam vapor pressure is depicted in blue, with the threshold value of 87 psi as a red dashed line and the normal pressure of 105.7 psi as a black dash-dot line. The green area in the third row is the sum of the mass flow rates to the kettles of lines 1 and 2. Shown in the fourth row are the levels of the kettles of lines 1 and 2 during the same time intervals.
The results shown suggest that the method proposed in this blog is able to accurately predict pressure drop events with enough lead time to take corrective actions and minimize the negative effects caused by the event. However, since no algorithm is 100% accurate, it is important to evaluate the effects of an incorrect forecast of an event, i.e., to assess how often the trained model raises an alarm when it is not needed, or fails to raise an alarm when it is needed. A model that is highly accurate on the target, as shown by the values of the confusion matrix in figure 4, may perform poorly when forecasting an event and thus could be deemed untrustworthy by the maintenance engineers.
To further demonstrate this point, a false positive for an event is illustrated in figures 6 and 7. There are four distinct events shown in figure 6, illustrated by the red solid lines, with their corresponding target values in green. Figure 7 shows the forecasted values of the 10-minute lagged target along with the sensor data for the supply steam vapor pressure, the mass flow rate of the combined contribution to the kettles of lines 1 and 2, and their corresponding levels at the bottom of the figure. The algorithm predicts at least six events instead of four; it would therefore accurately raise an early warning for all four critical incidents, at the cost of two additional false positive events, yielding a business-critical accuracy of only 66.6%!
Figure 6. Caption as in figure 3.
Figure 7. Caption as in figure 5.
From the business perspective, a false positive results in turning on the boiler when it is not needed, whereas a false negative implies loss of product, since no corrective actions were taken to re-establish production stability. The financial cost of these errors has to be weighed against the gains from preventing critical production incidents when deciding whether to implement this type of solution, and this cost varies widely with the type of industry and process under consideration.
There are many topics and details of this problem that could not be included in this blog post and will be addressed elsewhere. Follow the tag Machine Learning or my profile to be notified about the upcoming part (Part III) of this series.
Any inquiries about the content of these blogs can be posted in the respective tags:
- Q&A about Machine Learning
- Q&A about Python
- Q&A about SAP Data Warehouse Cloud
- Q&A about SAP HANA
- Q&A about SAP Data Intelligence
- Q&A about SAP Business Technology Platform
- Q&A about SAP HANA Cloud
- Other question? Ask here: https://community.sap.com/
This work was possible thanks to team members Jose M. Escolar, Hernan Sendra, Naveen Nalla and Nidhi Sawhney. Special thanks to Sricharan Poundarikapuram and Dimitrios Lyras for helpful discussions.
 Chardin B., Lacombe J. M., Petit J. M. (2013). “Data Historians in the Data Management Landscape.” In: Nambiar R., Poess M. (eds) Selected Topics in Performance Evaluation and Benchmarking. TPCTC 2012. Lecture Notes in Computer Science, vol 7755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36727-4_9
Chawla, N. V., Bowyer, K. W., Hall, L. O. and Kegelmeyer, W. P. (2002). “SMOTE: Synthetic Minority Over-Sampling Technique.” Journal of Artificial Intelligence Research 16: 321–357.
Kind, J. and Lang, R. (2015). “P&I IoT Customer Innovation.” SAP Predictive Summit, St. Leon-Rot, September 22.
Kind, J. (2018). “Classification in Predictive Maintenance.” SAP Predictive Summit, St. Leon-Rot, September 23.