SAP HANA Cloud, Data Lake Relational Engine on Object Storage
In the 2022 QRC3 release of HANA Cloud, the data lake relational engine introduces a major new feature that will change the TCO profile of the relational engine and provide improved performance in a variety of areas.
What is this change?
As you may know, the HANA Cloud, data lake relational engine is based on the on-premise SAP IQ technology. That core underwent a transformation to make it available as a cloud service, while still providing the wide breadth of features and the maturity of the on-premise product.
Up to this point, the HANA data lake relational engine (also known as HDLRE) has relied on more traditional block-based disk storage in the cloud to provide its service. This enabled us to make the service available in the cloud in a shorter timeframe, with complete confidence in its quality and a huge amount of functionality. However, there were some drawbacks. For example, when provisioning an HDLRE instance, you had to pre-provision your storage and grow it yourself when it got full. In addition, the amount of storage you provisioned had a direct impact on the I/O performance of your instance. Finally, you could not shrink your storage once it was provisioned unless you rebuilt your instance.
Starting with the 2022 QRC3 release, HDLRE leverages HDL Files storage for its database files – in other words, HDLRE stores its database files in object storage. This resolves many of the drawbacks cited above and provides some additional benefits.
What Are The Benefits?
Moving to object storage for the HANA Cloud, data lake relational engine storage provides some significant benefits.
- No more need to pre-provision storage. Storage dynamically grows and shrinks (in 1 TB increments) based on the amount of data you have stored in HDLRE. This also makes provisioning simpler, since you no longer need to supply a size for the relational engine.
- The cost of storage goes down because storage is elastic and grows/shrinks based on your usage. This can provide significant savings for at-rest data storage.
- Performance becomes independent of provisioning size. Your storage performance is more dynamically scalable and relates more to the amount of activity occurring than to the amount of data stored.
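The elastic sizing in the first point above can be sketched as follows. This is a minimal illustration, not the actual metering logic; the function name and the idea of rounding up to the next increment are my own framing of the "1 TB increments" behavior described above.

```python
import math

def billed_storage_tb(data_stored_tb: float, increment_tb: int = 1) -> int:
    """Illustrative sketch: storage is allocated in whole increments
    (1 TB by default), so the billed amount is the smallest multiple
    of the increment that covers the data actually stored."""
    if data_stored_tb <= 0:
        return 0
    return math.ceil(data_stored_tb / increment_tb) * increment_tb

# With block storage you pre-provisioned a fixed size up front; with
# elastic object storage the allocation tracks actual usage:
print(billed_storage_tb(0.3))  # -> 1 (rounds up to a single 1 TB block)
print(billed_storage_tb(2.5))  # -> 3
print(billed_storage_tb(7.0))  # -> 7
```

The same function run in reverse (deleting data) illustrates the shrink case: dropping from 2.5 TB to 0.3 TB of stored data brings the allocation back down from 3 TB to 1 TB without rebuilding the instance.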
How Are Costs Impacted?
There are a few changes to metering that will impact costs for HANA Cloud, data lake. Overall, we expect most, if not all, customers to see a decrease in costs. In some cases this will be a small benefit, and in others it will be significant. It is also important to note that these cost improvements will be realized incrementally, since there is some cost associated with the upgrade to cloud object storage.
Your HANA Cloud, data lake billing will also look slightly different with the new Cloud DBSpace feature enabled. Below is a summary of what you can expect to see.
As mentioned above, object storage is significantly cheaper than block-based storage. We expect to see a large decrease in costs for data at rest.
The SLA for backup is not changing, but there are a couple of changes related to backup that will affect TCO. The system database, which is small compared to the amount of user data we expect to see in a data lake, will still be backed up using traditional database backup methods. The user data, which is stored in object storage, will leverage the features of object storage (e.g. snapshots and durability) to ensure recoverability should a problem arise that requires recovery. Depending on your actual usage of the relational engine, you could see a significant decrease in backup costs. For example, if you only add and rarely update data in HDLRE, your backup costs will be significantly lower. If, however, you are constantly updating data in HDLRE, your backup costs will be similar to what they are today.
There are some changes to the compute charges for HDLRE that align it more closely with a pay-per-use model. The existing compute charges (the number of vCPUs allocated to HDLRE processing) are not changing, but the HANA Cloud, data lake Files API Calls metric will now be used to track reads from and writes to object storage. These costs are directly related to your actual usage of the data lake. If you are storing data and querying it only infrequently, you will not see much of a change in your compute costs. However, if you are querying the data lake heavily, you could see an increase in your compute costs.
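The shape of this pay-per-use model can be sketched as below. The rates here are placeholders for illustration only, not SAP prices, and the function is my own simplification: the point is that the vCPU term is unchanged while the API-call term scales with how heavily you actually query the data lake.

```python
def estimate_compute_cost(vcpu_hours: float,
                          api_calls: int,
                          vcpu_rate: float = 0.10,          # placeholder rate, not an SAP price
                          per_million_calls: float = 0.40   # placeholder rate, not an SAP price
                          ) -> float:
    """Sketch of the pay-per-use cost shape: a fixed vCPU charge
    plus a usage charge tracked by the Files API Calls metric."""
    return vcpu_hours * vcpu_rate + (api_calls / 1_000_000) * per_million_calls

# Storage-heavy, query-light workload: compute cost barely changes.
idle = estimate_compute_cost(vcpu_hours=100, api_calls=50_000)

# Same vCPU allocation, but heavy querying drives many more
# object-storage reads/writes, so the usage term dominates.
busy = estimate_compute_cost(vcpu_hours=100, api_calls=500_000_000)

print(f"query-light: {idle:.2f}, query-heavy: {busy:.2f}")
```

Under these placeholder rates the query-light workload costs about 10.02 units while the query-heavy one costs about 210.00, which is the behavior described above: infrequent querying sees little change, heavy querying can see an increase.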
Network Data Transfer
You may also see some changes in the Network Data Transfer metric. Prior to this change, there were network data transfer charges on some cloud providers for both read and write activities to the data lake. After this change, this metric should reflect read operations from the data lake (i.e. query result sets and file reads). In almost all cases this should result in no change or a reduction in Network Data Transfer charges.
How Do I Upgrade My Existing HDLRE Instance to Use Object Storage?
With the release of HANA Cloud QRC3, all new instances of HANA Cloud, data lake relational engine will use object storage by default. However, existing instances will continue to use the traditional storage until upgraded. The upgrade to object storage for your instance is a two-step process. The first step is to upgrade the software to QRC3. This is a ‘regular’ upgrade, available to you from the HANA Cloud Central tool. You can do this upgrade at a time of your choosing.
After the software for your instance has been upgraded, another upgrade will be made available to you to perform the upgrade to object storage. This upgrade is executed in the exact same way as the software upgrade – from the HANA Cloud Central interface for your instance. The only difference is that this upgrade will take a little bit longer than a regular upgrade, since your data will be moved to object storage. The duration of the upgrade is dependent on how large your instance is. It could be anywhere from a few minutes for a 1TB instance, to a few hours for much larger instances.
Once the upgrade is complete, you are all set. There is nothing further that you must do to enable the use of object storage.
We recommend that you schedule and perform this upgrade based on your business's availability needs. All HANA Cloud, data lake instances will be upgraded to the 2022 QRC3 release by SAP beginning in Q1 of 2023. The pre-defined maintenance window will be used for this upgrade.
Hi Jason Hinsperger!
Thank you for this update! Looks like a simplification and move in the right direction.
If I understand right, IQ now plays more the role of a query engine, like we see with Amazon Athena or Presto/Trino? SQL on Files already did the same, from my understanding, but now we do not have to distinguish between block and object storage?
Do you see activities to expand these developments in the direction of what is currently discussed as the Data Lakehouse architecture? That would mean enabling table formats like Apache Iceberg or Apache Hudi.
There is still a difference between the HANA data lake relational engine and SQL-on-Files. The relational engine data is stored in object storage, but it is still stored/transformed into the native relational engine (IQ) format, which enables much better performance for complex analytic queries but requires the additional transformation step. SQL-on-Files, on the other hand, is able to effectively query structured data in its native format (e.g. CSV, Parquet, ORC) using a serverless architecture, but it is not yet ready to handle the depth of SQL analysis supported by the HANA data lake relational engine. This is why having them work together in HANA Cloud is so useful – you can choose the right tool for the job you need done.
The open table formats like Delta, Iceberg and Hudi are very interesting and determining how best to integrate/support them with HANA Cloud is under discussion.
thank you for the clarification. Happy to see progress here.
Hello Peter Baumann
SAP/Sybase IQ is rather an all-purpose database, and its key strength is the columnar compression of data on disk with no sizing limit; it is used as the standard solution for NLS/DTO and ILM.
Hence, its usage as a native data lake together with DI and BW/4 is in direct competition with the SAP cloud implementation using DWC, HANA Cloud, and HANA data lake, and is not mentioned/wanted to be visible at all to our (on-premise) customers.
Even if NLS/DTO and ILM may become feasible to enable at some point in the future, it is still questionable why on earth the historical data store, kept behind firewalls and in highly secured datacenters, should be deliberately moved to hyperscalers like Amazon, Google or Microsoft, where governance is mainly handled in the US.
Blog - SAP (Sybase) IQ – the hidden treasure …
Best Regards Roland
Hello Roland Kramer
RISE with SAP and the complete cloud-first approach with SAP Business Technology Platform rely on the hyperscaler datacenters of Amazon, Google or Microsoft. Therefore I don't understand the argument: why is it acceptable for the complete on-premise infrastructure to be hosted at the Microsoft Azure datacenter in the Netherlands, but not for the data to be in a Microsoft Azure Data Lake in the same datacenter?
Nevertheless, I already see the competition you mentioned. On the one side, BW/4 and IQ/NLS with all their well-known advantages, and on the other side, the highly advertised and promoted DWC and HANA Cloud incl. data lake, which follows the modern data & analytics stack. But both are preferably run in one of the hyperscaler datacenters due to RISE and the BTP.
Thanks for the update Jason Hinsperger. Just to clarify: I don't need to select a storage size anymore. Say I keep ingesting structured/unstructured data from cloud storage – the data lake storage keeps incrementing in 1 TB blocks dynamically. And in a similar fashion, I can increase/decrease the compute + # of workers based upon the data load. Is this a correct statement?
Yes, this is correct.
Thanks Jason Hinsperger. Now we can definitely say it's decoupled compute and storage.