Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
rafael_pacheco
Participant
Summary

In this blog the Dynamic Time Warping algorithm of SAP HANA PAL is applied to align two signals that can be used to assess the quality of a long section of railroad track. The algorithm is fast and efficient and proves to be a very useful addition to the Predictive Analysis Library for the solution of problems involving the comparison and averaging of time series.

Motivation

As of 2016, there were nearly 30 million train riders each year in the USA, with nearly 40 million carloads shipped and $82 billion in freight revenue being generated annually along 140,000 rail miles operated by seven Class I railroads [1,2].

Maintenance of the constituent components of the railway is important to ensure that the train can be safely driven since rail defects are usually the cause of derailment accidents [3]. To guarantee safe traffic conditions it is important to test rails that are in constant use with minimal disruption to train times. This maintenance process is expensive due to tamping, ballast cleaning or renewal, sleeper replacement, joint repair, rail grinding or replacement and substructure treatment and other maintenance interventions.

Track quality is traditionally monitored by running a specialized-on-moving measuring car that measures track geometric basic parameters at intervals of about three months. Usually, the inspection car measures and records the variation of the track gage, vertical and lateral alignments and cross-level (angular variation on a track among many others. These measurements are recorded about every 900 mm, from lengths of up to 100 km, resulting in long sequences of data (signals).

The condition of the track is assessed by comparing the signals to a set of previously recorded reference signals. Before this can be accomplished, the two sets of signals need to be aligned so that the corresponding signals refer to the same point on the track with the caveat that the matching has to be done simultaneously. This matching is often done manually using visual inspection of the signals simultaneously

In order to automate the process and minimize the time for matching long sequences of signals we use SAP HANA Predictive Analysis Library (PAL) new Dynamic Time Warping algorithm [4]. Automating this process would diminishing the errors associated with visual estimation of optimal alignment and expedite accurate results, which are particularly difficult for long stretches of sequences.

The data set corresponds to a region near the lake of Chapala, Mexico as shown in Figure 1. This is an area with hazardous natural topography and the road traverses across the Sierra Madre Occidental, an area with shifting wind conditions that is prone to regular maintenance [5].
Figure 1. Area of interest near Chapala lake, Mexico. The insert depicts the section of railroad around the ‘Volcan de Colima’ shown in bold red.

 

Dynamic Time Warping

The Dynamic Time Warping (DTW) was developed for speech recognition [6] and is commonly used for comparing data sequences, time series or classification samples in machine learning applications for optical character recognition, robotics, medical applications, time-series averaging etc. The alignment of two sequences of feature vectors is accomplished by stretching the time axis iteratively by duplicating distinctive data points until an optimal metric between the two sequences is minimized in an optimal way.

It is assumed that we want to compare two sequences, namely a target or query x = (x1, x2, x3, …, xN) and a reference y = (y1, y2, y3, …, yM).  Both sequences start at the bottom left of the grid shown in Figure 2. In order to align the sequences for comparison, a distance matrix C of N x M dimension has to be built. The element of the matrix C is an appropriate distance between two points xi and yj and represented by cij, e.g. for the Euclidean distance, cij = (xi - yj)2. Once the distance matrix has been built, the DTW algorithm finds the alignment path, satisfying boundary conditions, monotoniciy and step size, which runs through the matrix elements that defines a mapping between x and y. Details of the algorithm and its implementation can be found in [6].

Figure 2. Optimal warping path illustrating the alignment idea of two sequences. The query (bottom horizontal axis) and a reference (left vertical axis).

 

The data set

The data along the track shown in Figure 1 is a section of about 27,000 m, from km 117 to km 144. The query set (target) has 88,749 points and the reference set has 91,261 points. Three signals from the set were aligned simultaneously, namely Curvature (Curvatura), Loaded Gauge (Medidor Cargado), Cross Level (Nivel Cruzado).

Since the accumulated distance matrix C has N x M entries, the computational complexity of the classic DTW algorithm is O(NM), i.e. for very long sequences, the computational cost of building a DTW solution could be very expensive. In our case, the size of the distance matrix C is N x M = 8,099,322,000 for the small portion of rail track that we are analyzing.

To reduce the computational time and memory requirements for this problem we partition the reference and query into a sub-series of lengths of 1.2 km and 1 km respectively, ensuring that there is always overlapping between the partitions and the location in the query, i.e. the query is a subset of the location in the reference. The final output is simply the appended values of the aligned signals at 1 km intervals that cover the entire length of the track. This is illustrated in the sequence of images shown in Figure 3.



Figure 3. Small extract from two Cross-Level signals illustrating the matching and partitioning. The black line corresponds to the query and the blue to the reference.

 

The images illustrating the Cross-Level signals of the reference and query as function of the reference location are shown in Figure 3. The blue lines correspond to the reference signal and the location interval is 1 km. The end of the signals in Figure 3(b) correspond to the beginning of the signal Figure 3(d). Similarly, the end of the signals of Figure 3(d) correspond to the beginning of the signals in Figure 3(f). By assembling the signals in this manner, we obtain the alignment of the entire data set. This is accomplished using a loop and the DTW from the Predictive Analysis Library native to HANA. We note in passing that that the alignment is just the preparation for the analysis of predictive maintenance. Once we have the reference signal and the target signal aligned, the difference in the signal along the location can be used for the analysis of the railroad conditions.

We have implemented the solution using DTW from SAP HANA PAL using SQL script [4], and for comparison of performance in R and Python. We found that DTW in HANA is tremendously faster by at least a factor o 100. The solution in R takes about 27 seconds (package “dtw” [7]), in python 20 seconds (package “dtw-python” [8]) and in HANA using SQL PAL less than a second to compute the entire data set. The SQL script and data can be obtained by contacting the author or in the github repository [9]. A sample code demonstrating the use of the DTW for two channels is provided in the Appendix.

Remarks

There are many applications of the DTW algorithm applicable to Data Science, and predictive maintenance for railroads is just one of them. The DTW algorithm is a welcomed addition to the SAP HANA Predictive Analysis Library in SAP HANA 2 SPS05 and SAP HANA Cloud. We are currently using this algorithm for the averaging of time-series [10] with applications to maritime awareness. The procedure for time-series averaging and application to constructing the path that a vessel travels from port to port using DTW will be presented in the near future.

References

 

[1] https://www.amtrak.com/content/dam/projects/dotcom/english/public/documents/corporate/nationalfactsh...

[2] https://gacc.nifc.gov/sacc/predictive/intelligence/NationalLargeIncidentYTDReport.pdf

[3] https://railroads.dot.gov/

[4] SAP HANA Predictive Analysis Library (PAL) https://help.sap.com/viewer/2cfbc5cf2bc14f028cfbe2a2bba60a50/2.0.05/en-US

[5] https://theguadalajarareporter.net/index.php/news/news/lake-chapala/53457-firefighters-battle-wildfi...

[6] Sakoe H., Chiba S. (1978). “Dynamic programming algorithm optimization for spoken word recognition”, IEEE Trans. Acoust. Speech Signal Processing, Vol. ASSP-26, 43-49. https://ieeexplore.ieee.org/document/1163055

[7] Giorgino, T. (2009). Computing and Visualizing Dynamic Time Warping Alignments in R: The ‘dtw’ Package. Journal of Statistical Software, 31(7), 1-24, doi:10.18637/jss.v031.i07. http://dtw.r-forge.r-project.org/

[8] The python package ‘dtw-python.'  https://pypi.org/project/dtw-python/

[9] SAP HANA Machine Learning Github: https://github.com/SAP-samples/hana-ml-samples/tree/master/PAL-SQL/usecase_examples

[10] Petitjean F., Ketterlin A., Gancarski P. (2011). “A global averaging method for dynamic time warping, with applications to clustering.” Pattern Recognition, 44(3), 678 – 693. ISSN 0031-3203. URL https://doi.org/10.1016/j.patcog.2010.09.013.

 

Appendix – Sample Code
set schema DM_PAL;

drop table PAL_FAST_DTW_DATA_TAB;
create column table PAL_FAST_DTW_DATA_TAB (
"ID" nvarchar(100),
"TIMESTAMP" nvarchar(100),
"ATTR1" integer, -- only support integer or double
"ATTR2" double
);

drop table PAL_PARAMETER_TAB;
create column table PAL_PARAMETER_TAB (
"PARAM_NAME" nvarchar (256),
"INT_VALUE" integer,
"DOUBLE_VALUE" double,
"STRING_VALUE" nvarchar (1000)
);

drop table PAL_FAST_DTW_RES_TAB;
create column table PAL_FAST_DTW_RES_TAB (
"LEFT_ID" nvarchar(100),
"RIGHT_ID" nvarchar(100),
"DISTANCE" double
);

drop table PAL_FAST_DTW_ALIGN_TAB;
create column table PAL_FAST_DTW_ALIGN_TAB (
"LEFT_ID" nvarchar(256),
"RIGHT_ID" nvarchar(246),
"LEFT_INDEX" integer,
"RIGHT_INDEX" integer
);

drop table PAL_FAST_DTW_STAT_TAB;
create column table PAL_FAST_DTW_STAT_TAB (
"STAT_NAME" nvarchar(256),
"STAT_VALUE" nvarchar(1000)
);

truncate table PAL_FAST_DTW_DATA_TAB;
-- time series 1
insert into PAL_FAST_DTW_DATA_TAB values (1, '2020-01-01', 1, 5.2);
insert into PAL_FAST_DTW_DATA_TAB values (1, '2020-01-02', 2, 5.1);
insert into PAL_FAST_DTW_DATA_TAB values (1, '2020-01-03', 3, 2.0);
insert into PAL_FAST_DTW_DATA_TAB values (1, '2020-01-04', 4, 0.3);
insert into PAL_FAST_DTW_DATA_TAB values (1, '2020-01-05', 5, 1.2);
insert into PAL_FAST_DTW_DATA_TAB values (1, '2020-01-06', 6, 7.7);
insert into PAL_FAST_DTW_DATA_TAB values (1, '2020-01-07', 7, 0.0);
insert into PAL_FAST_DTW_DATA_TAB values (1, '2020-01-08', 8, 1.1);
insert into PAL_FAST_DTW_DATA_TAB values (1, '2020-01-09', 9, 3.2);
insert into PAL_FAST_DTW_DATA_TAB values (1, '2020-01-10', 10, 2.3);
-- time series 2
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-01', 7, 2.0);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-02', 6, 1.4);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-03', 1, 0.9);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-04', 3, 1.2);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-05', 2, 10.2);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-06', 5, 2.3);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-07', 4, 4.5);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-08', 3, 4.6);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-09', 3, 3.5);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-10', 3, 2.2);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-11', 1, 3.2);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-12', 5, 7.8);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-13', 5, 5.1);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-14', 6, 1.3);
insert into PAL_FAST_DTW_DATA_TAB values (2, '2020-02-15', 2, 2.0);
-- print
select * from PAL_FAST_DTW_DATA_TAB order by ID, "TIMESTAMP";

truncate table PAL_PARAMETER_TAB;
insert into PAL_PARAMETER_TAB values ('RADIUS', 5, null, null);
insert into PAL_PARAMETER_TAB values ('THREAD_RATIO', null, 1, null);
insert into PAL_PARAMETER_TAB values ('DISTANCE_METHOD', 2, null, null); -- 1: MANHATTAN; 2: EUCLIDEAN (default); 3: MINKOWSKI; 4: CHEBYSHEV; 6: COSINE;
--insert into PAL_PARAMETER_TAB values ('MINKOWSKI_POWER', null, 0.7, null); -- only valid when DISTANCE_METHOD = 3, default 3.0
insert into PAL_PARAMETER_TAB values ('SAVE_ALIGNMENT', 1, null, null);

do begin
-- call DTW
lt_data = select * from PAL_FAST_DTW_DATA_TAB;
lt_param = select * from PAL_PARAMETER_TAB;
call _SYS_AFL.PAL_FAST_DTW(:lt_data, :lt_param, lt_res, lt_align, lt_stat);
insert into PAL_FAST_DTW_RES_TAB select * from :lt_res;
insert into PAL_FAST_DTW_ALIGN_TAB select * from :lt_align;

-- print output table
select * from PAL_FAST_DTW_RES_TAB;
select * from PAL_FAST_DTW_ALIGN_TAB;

-- post operation of getting warped time series, assuming the left one is the referenced one and the right one is the warped one
-- first attach timestamp index on original time series starting from 0
lt_time_series =
select *, (row_number() over (partition by id order by "TIMESTAMP") -1) as IDX from PAL_FAST_DTW_DATA_TAB;
-- print
select * from :lt_time_series;

-- may have multiple timestamp of the same value
lt_warped_without_merge =
select t0.LEFT_ID, t0.RIGHT_ID, left_t."TIMESTAMP", right_t.ATTR1, right_t.ATTR2
from PAL_FAST_DTW_ALIGN_TAB t0
join :lt_time_series left_t
on t0.LEFT_ID = left_t.ID and t0.LEFT_INDEX = left_t.IDX
join :lt_time_series right_t
on t0.RIGHT_ID = right_t.ID and t0.RIGHT_INDEX = right_t.IDX;
-- print
select * from :lt_warped_without_merge;

-- aggregate attribute value of same timestamp. User can use other aggregate functions like MIN/MAX
lt_warped =
select LEFT_ID, RIGHT_ID, "TIMESTAMP", avg(ATTR1) as ATTR1, avg(ATTR2) as ATTR2
from :lt_warped_without_merge group by LEFT_ID, RIGHT_ID, "TIMESTAMP" order by "TIMESTAMP";
-- print
select * from :lt_warped;
-- the warped time series 2 you wanted, which is aligned by timestamps of time series 1
select RIGHT_ID, "TIMESTAMP", ATTR1, ATTR2 from :lt_warped;
end;
6 Comments