FAQs for Big Data & Analytics on Timeseries within SAP IoT Application Enablement (Leonardo Foundation)
Time Series Functionality
Which different storage types exist and how do they differ?
There are 3 different storage types that differ in access speed and cost. At data ingestion, data is automatically stored in all three storages according to the properties of each data store.
Hot storage is the fastest storage alternative. It is based on SAP HANA technology. IoT Application Enablement stores aggregates of time series data in hot storage. These aggregates are used to deliver data to applications very quickly, in sub-second time. Applications do not require raw data for most scenarios. E.g. a time series chart can typically display no more than about 1,900 data points on an HD display. If a sensor collects data every second, any chart showing a time window larger than about 30 minutes must therefore use aggregated data. Aggregates are stored for 1 year and then moved to warm storage.
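The arithmetic behind the 30-minute rule of thumb can be sketched as follows; the point limit of 1,900 comes from the text above (roughly the horizontal pixel count of an HD display), and the function name is purely illustrative:

```python
# Illustrative arithmetic only (not an SAP API): why per-second raw data
# needs aggregation beyond a ~30-minute chart window on an HD display.

MAX_CHART_POINTS = 1900  # ~horizontal pixel count of an HD display

def needs_aggregates(window_minutes: int, sample_interval_s: int = 1) -> bool:
    """Return True if the raw points in the window exceed what the chart can show."""
    raw_points = window_minutes * 60 // sample_interval_s
    return raw_points > MAX_CHART_POINTS

print(needs_aggregates(30))  # 1800 raw points still fit -> False
print(needs_aggregates(45))  # 2700 raw points exceed 1900 -> True
```

At one sample per second, 1,900 points correspond to just under 32 minutes, which is why charts spanning more than about 30 minutes switch to aggregates.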
Besides providing aggregates, hot storage also supports simple analytical queries executed directly on this storage.
Hot storage is currently not separately priced.
Warm storage is used to store individual data points of time series data and deliver them quickly to applications. In the scenario above, for example, when displaying a time window of up to 30 minutes of per-second data in a time series chart, the data is read directly from warm storage and delivered to the application in sub-second time.
Time series data stored in warm storage has a standard retention period of 60 days, but a custom retention period can be set. The retention period is stored with the data; that is, it can be adapted for each newly inserted data set.
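The consequence of storing the retention period with the data can be sketched as below: each record carries its own retention, so newer inserts may use a different value than older ones. This is a minimal illustration of the semantics, not the SAP API:

```python
# Minimal sketch (not the SAP API): every inserted data set carries its own
# retention period, so expiry is evaluated per record.
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_RETENTION_DAYS = 60  # standard retention in warm storage

def is_expired(inserted_at: datetime,
               retention_days: int = DEFAULT_RETENTION_DAYS,
               now: Optional[datetime] = None) -> bool:
    """Return True if a record is past its own retention period."""
    now = now or datetime.now(timezone.utc)
    return now - inserted_at > timedelta(days=retention_days)

now = datetime(2018, 6, 1, tzinfo=timezone.utc)
old = datetime(2018, 3, 1, tzinfo=timezone.utc)  # 92 days earlier
print(is_expired(old, 60, now))   # True: older than the 60-day default
print(is_expired(old, 180, now))  # False: custom 180-day retention keeps it
```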
Warm storage uses a distributed storage technology; it is optimized for storing and reading individual data points of time series data. Analytical queries cannot be executed on this storage directly.
Warm storage is billed by amount of data stored per month.
Cold storage can hold large amounts of time series data cost-efficiently for an almost unlimited amount of time. It is optimized for cost-efficient long-term storage. Accordingly, reading from cold storage is not as performant as from warm storage; reading time series data typically takes several seconds. Reading from cold storage is used, for example, in data science scenarios where a large data set is replicated into an analytics data store such as SAP HANA, SAP Vora, or Hadoop, and data scientists then work with the data to identify patterns or apply machine learning algorithms.
Cold storage is billed by amount of data stored per month.
Which aggregates are stored in Hot Storage?
All aggregates in hot storage are automatically calculated at data ingestion time.
Aggregates are calculated for the following time windows:
- 2 min
- 1 hour
- 1 day
- 1 week
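How an ingestion timestamp maps into these fixed windows can be sketched by flooring the epoch time to the window length. The window keys and the flooring approach here are assumptions for illustration, not the service's internal representation:

```python
# Illustrative only: mapping a timestamp to the start of its aggregation
# window (2 min, 1 hour, 1 day, 1 week) by flooring the epoch time.

WINDOWS_S = {"2min": 120, "1hour": 3600, "1day": 86400, "1week": 604800}

def bucket_start(epoch_s: int, window: str) -> int:
    """Floor an epoch timestamp to the start of its aggregation window."""
    size = WINDOWS_S[window]
    return epoch_s - (epoch_s % size)

ts = 1_528_000_000  # an arbitrary epoch second
print(bucket_start(ts, "2min"))   # start of the enclosing 2-minute window
print(bucket_start(ts, "1hour"))  # start of the enclosing 1-hour window
```

All readings whose timestamps floor to the same bucket start contribute to the same pre-calculated aggregate.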
The following 13 aggregates are available by default:
- Timestamp of first
- Timestamp of last
- Timestamp of min
- Timestamp of max
- Standard deviation
- Percentage Good values (based on quality code of time series data)
If quality codes are provided for the time series data, all aggregates are calculated only on values with “good” quality.
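The quality-code rule can be illustrated as below: values not flagged “good” are excluded from aggregates such as standard deviation, while the “percentage good” aggregate compares good values against all readings. The quality-code labels and record layout are assumptions for illustration:

```python
# Illustrative sketch of the aggregation rule: aggregates are computed only
# over values with "good" quality; "percentage good" uses the full count.
from statistics import pstdev

readings = [
    (10.0, "good"),
    (12.0, "good"),
    (99.9, "bad"),   # excluded from the value aggregates
    (11.0, "good"),
    (13.0, "good"),
]

good_values = [v for v, q in readings if q == "good"]
pct_good = 100.0 * len(good_values) / len(readings)
std_dev = pstdev(good_values)  # population standard deviation of good values

print(pct_good)  # 80.0
print(std_dev)
```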
Writing and Reading Timeseries Raw Data
The API for writing and reading raw timeseries is described here: https://uacp2.hana.ondemand.com/viewer/350cb3262cb8496b9f5e9e8b039b52db/18.104.22.168/en-US/108915d10fb54e7eb905704ea67d7d30.html
For accessing data from cold storage, please see:
Which analytical queries can be executed on Hot Storage?
Today the following queries are supported:
- Reading time series data for a specified time window. A filter criterion can be given, e.g. only reading data where a threshold is exceeded.
- Reading the aggregates for a specified time window. A filter criterion can be given, e.g. only reading data where a threshold is exceeded.
- Snapshot: For a thing, return the most recent values (with timestamps) for selected time series.
- M4 Algorithm: Retrieve aggregates of time series for a defined time window and a defined granularity (group-by time). If the requested granularity is not one of the standard stored windows, the aggregates are calculated automatically for that granularity. A filter criterion can be given, e.g. only reading data where a threshold is exceeded.
The query capabilities are continuously enhanced.
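The M4 query mentioned above can be sketched as follows. The assumed semantics (keeping the first, last, minimum, and maximum point per time bucket, so that a line chart drawn from the reduced set stays visually faithful) follow the common description of M4 downsampling; this is an illustration, not the service's implementation or API:

```python
# Minimal sketch of the M4 downsampling idea: per time bucket, keep the
# first, last, min, and max data points of the series.

def m4(points, bucket_s):
    """points: list of (epoch_s, value) tuples, sorted by time."""
    buckets = {}
    for t, v in points:
        key = t - (t % bucket_s)  # floor to bucket start
        b = buckets.setdefault(key, {"first": (t, v), "last": (t, v),
                                     "min": (t, v), "max": (t, v)})
        b["last"] = (t, v)        # latest point seen so far in this bucket
        if v < b["min"][1]:
            b["min"] = (t, v)
        if v > b["max"][1]:
            b["max"] = (t, v)
    return buckets

series = [(0, 1.0), (30, 5.0), (60, -2.0), (90, 3.0), (150, 4.0)]
print(m4(series, 120))  # two 2-minute buckets: starts 0 and 120
```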