SAP HANA and 3D XPoint
Throughout the last few years, there have been many rumors about 3D XPoint and how this new technology will influence major applications, including SAP HANA. The speculation came to an end at this year's Sapphire in Orlando, where Intel and SAP announced their joint effort to integrate this innovative memory technology into SAP HANA. This makes SAP HANA one of the first major applications, if not THE first, to integrate this new memory tier into its core.
The general characteristics of 3D XPoint have been known for quite some time, fueling speculation about how SAP HANA might take advantage of them. To put these rumors to rest, this blog provides technical insights into how SAP HANA will actually adopt this new technology.
I also highly recommend the earlier blog post by Daniel Schneiss (SVP, Global Head of SAP HANA Development), in which he shares his view on How New Non-Volatile Hardware Technology Revolutionizes In-Memory Computing.
3D XPoint was jointly developed by Intel and Micron Technology as a new class of persistent memory. Compared to industry-standard NAND flash, it is up to 1,000 times faster and offers 1,000 times greater endurance. Its density is ten times higher than that of industry-standard DRAM.
These characteristics place 3D XPoint in many ways right between DRAM and NAND storage, combining many of the benefits of both tiers. Consequently, Intel will offer two flavors of this technology: one that accelerates access to the storage layer (an SSD form factor), and a second one (a DIMM form factor) that can be used much like DRAM, including byte-addressability, participation in CPU cache line handling, and access through ordinary load/store instructions, but with the inherent advantage of non-volatility.
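To illustrate what byte-addressability and load/store access mean for software, here is a minimal sketch of the memory-mapped programming model commonly used with persistent memory. It assumes Python's standard `mmap` module over an ordinary file as a stand-in: on real hardware, the file would live on a DAX-enabled filesystem backed by the DIMM flavor of 3D XPoint, and durability would additionally require flushing the affected CPU cache lines. The helpers `open_region`, `store_byte` and `load_byte` are hypothetical and not part of any SAP or Intel API.

```python
import mmap
import os
import tempfile

REGION_SIZE = 4096  # hypothetical region size in bytes


def open_region(path: str, size: int = REGION_SIZE) -> mmap.mmap:
    """Map `size` bytes of a file into the address space, creating or growing it first."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)  # newly added bytes read back as zero
        return mmap.mmap(fd, size)  # read/write mapping of the file
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def store_byte(region: mmap.mmap, offset: int, value: int) -> None:
    region[offset] = value  # plain byte-granular store, no block I/O call


def load_byte(region: mmap.mmap, offset: int) -> int:
    return region[offset]  # plain byte-granular load


path = os.path.join(tempfile.mkdtemp(), "pmem_region.bin")
region = open_region(path)
store_byte(region, 1234, 42)
region.flush()  # msync-style flush; real pmem would also flush CPU cache lines
region.close()

region = open_region(path)  # simulated restart: the byte is still there
print(load_byte(region, 1234))  # 42
region.close()
```

The point of the sketch is the access pattern: a single byte is written and read at an arbitrary offset without any block-oriented read or write call.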
How SAP HANA uses 3D XPoint
Traditional disk-oriented database management systems (DBMSs) can benefit greatly from the SSD form factor, as they are already heavily optimized for block-based disk access. On the other hand, their in-memory data structures are highly volatile, which contradicts the paradigm behind non-volatile RAM and limits the benefits they can gain from the DIMM version of this technology.
Undoubtedly, SAP HANA would also benefit from faster access to its persistency, that is, from the storage-like variant of 3D XPoint. After a restart, large HANA databases holding several terabytes of data can take considerable time until all relevant data is loaded into memory and ready for access. Faster storage with higher bandwidth could reduce this time significantly.
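A quick back-of-the-envelope calculation shows why bandwidth matters for restart times. The sketch below computes a lower bound for the pure read time of the persistency; it ignores the CPU work needed to rebuild in-memory structures. The 6 TB data volume and the 2 GB/s and 8 GB/s read bandwidths are purely illustrative assumptions, not SAP measurements.

```python
def reload_time_seconds(data_tb: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound: time to read `data_tb` terabytes at a sustained read bandwidth."""
    data_gb = data_tb * 1000  # decimal units: 1 TB = 1000 GB
    return data_gb / bandwidth_gb_per_s


# Illustrative assumptions only: 6 TB of data, two hypothetical storage bandwidths.
for bandwidth in (2.0, 8.0):
    minutes = reload_time_seconds(6, bandwidth) / 60
    print(f"{bandwidth} GB/s -> {minutes:.1f} min")
# 2.0 GB/s -> 50.0 min
# 8.0 GB/s -> 12.5 min
```

Faster storage shrinks this time proportionally, but it never eliminates it.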
SAP follows a very different approach with SAP HANA, though.
What if reading from persistency wasn’t necessary at all?
SAP HANA is an in-memory database. As such, it is already heavily optimized for all kinds of in-memory operations. At the same time, the memory requirements of applications keep rising steeply, and equipping servers with sufficient main memory is becoming more and more of a challenge, both because of hard technical limits and because of TCO. The higher density of NVRAM compared to DRAM addresses exactly this problem.
Although many data structures can potentially be placed in NVRAM instead of DRAM, this does not apply to all of them. The more frequently and randomly a structure is accessed or changed, the more painful the higher latency and limited endurance of NVRAM become. This means that NVRAM can only be used as a complement to traditional DRAM, at least as of today.
An excellent candidate for placement in NVRAM is the main part of SAP HANA's column store. It is heavily optimized for compression, which makes it a very stable data structure that rarely changes. Furthermore, the main holds well over 90% of the data footprint in most SAP HANA databases, so there is a lot of potential. The main is only changed during the delta merge. Because this process runs asynchronously, the slower writes of NVRAM compared to DRAM have far less impact on transactions, especially since these changes must be written to the SSD-based persistency layer anyway. Fewer writes also help with endurance, since NVRAM supports only a limited number of write cycles. Read access to the main consists mostly of sequential scans, where cache line prefetching mitigates the higher latency of NVRAM.
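To make the delta/main split concrete, here is a toy sketch of a single column, assuming dictionary encoding (a sorted dictionary of distinct values plus one value ID per row), a common column-store compression scheme. It is not SAP HANA's implementation; all class and method names are hypothetical. What it shows is the access pattern described above: inserts only touch the delta, the main is rewritten only during `delta_merge`, and reads on the main are sequential scans over value IDs.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class MainPart:
    """Read-optimized, dictionary-compressed column part (hypothetical sketch)."""
    dictionary: list = field(default_factory=list)  # sorted distinct values
    value_ids: list = field(default_factory=list)   # one small int per row

    def scan_equal(self, value):
        """Return row positions equal to `value` via a sequential scan of value IDs."""
        i = bisect_left(self.dictionary, value)
        if i == len(self.dictionary) or self.dictionary[i] != value:
            return []
        return [row for row, vid in enumerate(self.value_ids) if vid == i]


@dataclass
class Column:
    main: MainPart = field(default_factory=MainPart)
    delta: list = field(default_factory=list)  # write-optimized, append-only

    def insert(self, value):
        self.delta.append(value)  # writes only ever touch the delta

    def delta_merge(self):
        """Rebuild the main from old main + delta: the only time the main is written."""
        old_values = [self.main.dictionary[vid] for vid in self.main.value_ids]
        all_values = old_values + self.delta
        dictionary = sorted(set(all_values))
        index = {v: i for i, v in enumerate(dictionary)}
        self.main = MainPart(dictionary, [index[v] for v in all_values])
        self.delta = []

    def scan_equal(self, value):
        """Combine hits from the main and the not-yet-merged delta."""
        hits = self.main.scan_equal(value)
        offset = len(self.main.value_ids)
        hits += [offset + i for i, v in enumerate(self.delta) if v == value]
        return hits


col = Column()
for city in ["Walldorf", "Orlando", "Walldorf"]:
    col.insert(city)
col.delta_merge()
col.insert("Orlando")
print(col.scan_equal("Orlando"))  # [1, 3]
print(col.main.dictionary)        # ['Orlando', 'Walldorf']
```

Between merges, the main is read-only, which is exactly the property that makes it a good fit for NVRAM.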
This design fits HANA's architecture almost perfectly. The separation into a write-optimized delta and a read-optimized main, and the characteristics of both areas, match the respective strengths and weaknesses of DRAM and NVRAM excellently: the delta stays in DRAM, while the main moves to NVRAM.
Because NVRAM is non-volatile, all data in the column store main (90%+ of all data!) is already in memory at startup. There is no more time-consuming reload from the persistency layer, and full performance is available right from the start.
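The placement criteria discussed above (how often a structure is written and how randomly it is accessed) can be expressed as a simple decision rule. The sketch below is purely illustrative: the `StructureProfile` type, the thresholds and the example profiles are hypothetical assumptions, and SAP HANA does not necessarily make this decision dynamically; by design, the main goes to NVRAM and the delta stays in DRAM.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DRAM = "DRAM"
    NVRAM = "NVRAM"


@dataclass
class StructureProfile:
    name: str
    writes_per_second: float    # how often the structure is modified
    random_access_share: float  # 0.0 = purely sequential, 1.0 = purely random


# Hypothetical thresholds, chosen for illustration only.
MAX_NVRAM_WRITES_PER_SECOND = 1.0
MAX_NVRAM_RANDOM_SHARE = 0.2


def choose_tier(profile: StructureProfile) -> Tier:
    """Place rarely written, mostly sequentially read structures in NVRAM."""
    if (profile.writes_per_second <= MAX_NVRAM_WRITES_PER_SECOND
            and profile.random_access_share <= MAX_NVRAM_RANDOM_SHARE):
        return Tier.NVRAM
    return Tier.DRAM


for p in [StructureProfile("column store main", 0.001, 0.05),
          StructureProfile("column store delta", 5000.0, 0.6)]:
    print(p.name, "->", choose_tier(p).value)
# column store main -> NVRAM
# column store delta -> DRAM
```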
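The effect on startup can be sketched with the same memory-mapping model. The example assumes that the main's value IDs are stored in a memory-mapped file standing in for an NVRAM region; after a simulated restart, the data is attached and scanned where it lies instead of being read back into newly allocated memory. With an ordinary file, the operating system still pages data in from disk behind the scenes; only with real persistent memory and DAX mappings would loads go directly to the NVRAM DIMMs. `persist_main` and `attach_main` are hypothetical names.

```python
import array
import mmap
import os
import tempfile


def persist_main(path: str, value_ids: list[int]) -> None:
    """Write value IDs as native unsigned ints (stand-in for building the main in NVRAM)."""
    with open(path, "wb") as f:
        f.write(array.array("I", value_ids).tobytes())


def attach_main(path: str) -> tuple[mmap.mmap, memoryview]:
    """Simulated restart: map the existing file and view it as unsigned ints in place.

    No data is copied into a separately allocated buffer. The file must not be empty.
    """
    with open(path, "rb") as f:
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mapping, memoryview(mapping).cast("I")


path = os.path.join(tempfile.mkdtemp(), "main_column.bin")
persist_main(path, [2, 0, 2, 1])

mapping, value_ids = attach_main(path)  # no reload step: data is used where it lies
with value_ids:  # the view is released on exit, so the mapping can be closed
    print(sum(1 for vid in value_ids if vid == 2))  # 2
mapping.close()
```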
The persistency layer is still required
Although the majority of in-memory data can now reside in NVRAM, other key features of SAP HANA still rely on the persistency layer. These include the row store and the column store delta, as well as system replication and database backups. Given HANA's shared-nothing architecture, this also affects host auto-failover, since the NVRAM of an inactive host cannot be reassigned to an active one; the host that takes over still has to load the data from the persistency. Finally, removing the column store main from the current disk-based persistency would result in two separate persistency areas, which would have to be kept consistent.
There is also the cost aspect. Although NVRAM is cheaper than DRAM, it is still more expensive than SSDs or other persistent storage technologies. SAP HANA already employs techniques to reduce the memory footprint of memory-hungry data types such as BLOBs, which are read from disk on demand rather than being kept in memory permanently. Keeping such data in memory, even in cheaper NVRAM, would increase costs unnecessarily.
SAP plans to deliver 3D XPoint support in HANA by the time our hardware partners provide the corresponding hardware. This reflects our current plans, which are subject to change without notice.
With 3D XPoint, customers can increase the maximum capacity of SAP HANA significantly. At the same time, TCO shrinks due to much lower acquisition costs and power consumption of NVRAM compared to DRAM.
Because NVRAM is non-volatile, HANA will not have to load column store main data from the persistency into memory, even after a complete shutdown. This significantly raises HANA's overall availability and comes out of the box, without the need for complex downtime optimization.