What is better, saving a file in RAM or on disk? It depends…
What is better, writing a file as-is or compressing it? It depends…
What is better, modifying the file when one line does change or keeping the history? It depends…
What is better, storing the data in format A or B? It depends…
How to deal with the ‘It depends’
The naive approach is to do both, e.g. store the data in both formats, columnar and row. This way the user can access either the one or the other, whatever makes more sense. But, of course, this option comes at a price. In this case, data needs to be stored twice – combining the advantages plus the disadvantages.
The better approach is to combine features in a way so the downside of the one is compensated for by another technology (and vice versa). A combination of the advantages and eliminating the disadvantages. And that is the secret of SAP HANA. Not just the in-memory capabilities, not columnar storage, not compression; but the intelligent combination of those. No other database vendor in the in-memory space does that to the same degree, mostly for historical reasons.
Why? Enjoy this 2 minute video. Granted, looking at the past with the knowledge we have today is unfair, but this video is still fun to watch.
The main argument is that you cannot keep everything in your head, it is simply not possible. If we take that statement for granted, compression could be a way out. Assuming a 1TB RAM server is too expensive, if the data can be compressed by a factor of 10, the RAM requirements would drop down to a cheaper 128GB RAM computer. Compression has downsides, what technology can circumvent that?
SAP HANA uses four technologies to achieve the all-advantages-no-disadvantages combination:
The pros and cons of each technology
The basic idea is that memory is much faster than disk. Actually the difference is more than people usually exepct. A modern CPU has a memory bandwidth of 20GByte/sec and higher, a single disk is around 550MByte/sec for SSDs and 180MByte/sec for hard disk drives – a difference by factor 36 and 110. If a program is using the same memory frequently, it is cached inside the CPU’s L1 or L2 cache, speeding up the memory bandwidth by another factor of 10. On the contrary, the disk speed of 180MB/sec is for sequential read access only, random access would be times worse for a disk system whereas has no negative impact on the RAM and little for SDDs.
The downside of memory is the costs of the memory chip itself (7USD/GByte for RAM compared to 0.05USD/GByte for disks as of 2017) and the hardware platform needed in order to cope with as much memory.
Looking at the absolute numbers, 1TB of RAM would be 7000USD – affordable.
So what are the pros and cons of in-memory computing?
- Updating data is fast
- Inserting data is fast
- Reading is fast
- When the power is gone, all data is gone as well
- While a 128GB server is cheap, a 1TB server is affordable, a 64TB server expensive and a 1000TB server simply does not exist (e.g. here)
The idea of compression is simple: a single CPU is much faster than the disk, hence compressing data in order to reduce the amount of data is beneficial as long the overhead of that is not too huge. Therefore every major database supports compression. It is not very popular though, as compressing a database block and decompressing it takes its toll. The most obvious cost overhead is when data is updated inside a database block. The database block has to be uncompressed, the changed data merged into it and then the block has to be compressed again.
- Reduces the required size
- Inserts require more CPU power
- Reads require more CPU power
- Updates require a lot of more CPU power
For a simple select sum(revenue) the columnar storage is just perfect. One column is read only, which is a fraction of the whole table. This will be much faster than traditional row orientated tables, where all the table data is in one file.
In case of selecting one entire row, a row orientated storage sounds like a better fit. Inserting a new row – same argument.
- All data of one column is closely together
- All data of one row is stored in different places
A real database should have read consistency in the sense of “when a select statement was triggered, all data committed at this very point in time is visible and only that data”. If another transaction does change a row which has not been read yet, it should still return the version valid at the query execution start time. So the old values have to be preserved somewhere.
The only major database I know supporting that since the beginning is Oracle (SQL Server has a option to turn that on/off), but you have to pay a price for that consistency. Whenever the Oracle database writer does overwrite a block with new data, the old version is copied into the rollback segment of the database. So a simple update or insert into an existing block requires two operations, the actual change plus preserving the old version.
With insert only, it’s quite a different story. There, in each table the old data is never overwritten but only appended. Updating an existing row means appending a new row with the timestamp as version information. A select statement picks the most recent version based on the query execution timestamp.
- Write is fast
- Reads are slower
- Requires more storage
Combining the technologies
Okay, so individually all technologies are good ideas, and other vendors have tried these as well and have built proper niche products, mostly around the analytic use cases. One database is a pure in-memory database that needs a regular database to persist the data – a caching system, others you can insert data but not update or delete. Many support compression but usually it is turned off by customers.
The claim of SAP HANA is to be a real RDBMS database that supports analytic and transactional use cases and outperforms other databases in all areas. So none of the above downsides are acceptable. This is achieved by combining the four technologies of SAP HANA…
First I want to show how the combination of two technologies have a positive impact on each other.
Compressing data can be done best whenever there are repetitive patterns. This is a real world example, the material master table (MARA in SAP ERP).
What can you compress better, the entire file, one row or a one column?
The answer is obvious, but, nevertheless, I exported these columns from the MARA table, all 20’000 rows of my system, into a CSV file (1’033KB big) and did zip the one file with all nine columns. For comparison, nine files with one column have each been created as well.
Obviously, the primary key cannot be compressed much, it is half of the overall file size. All other columns compress very well. The MANDT file is even just 303 bytes large, the ERSDA file with all the many create dates is 12’803 bytes big. The reason is obvious, the data within one column looks similar and can be compressed nicely therefore, the data within a row is different by nature and can be compressed less effective.
But this is not a fair comparison because the zip algorithm favors larger datasets as it can look for patterns easier and it is a fairly aggressive algorithm. In databases, the aim is for lower compression ratios that require less CPU cycles. But although the zip favors single files, in this example the nine individual files are 104KB big in sum, the same data in one file is 111KB. If the amount of data would be larger, the savings would grow even further.
- Compressing a columnar store is more efficient than compressing a row store
- Compression requires CPU power
- Need to find a better suited compression algorithm than zip
- Reading, inserting, updating an entire row is not solved still
Compression has one important downside, what if a row is updated or deleted? The compression spans multiple rows, so when a single value is updated, the entire unit has to be uncompressed, modified and compressed again. With traditional databases and turned on compression, this is exactly what happens, on a disk block level.
So what does SAP HANA do? It does not update and delete the existing data!
Instead, it appends the change as a new version with a timestamp as version information and when the table is queried, it returns the oldest version of each row, the oldest version that matches the query execution start time. So no recompression is needed, data is appended uncompressed to the end of the table and once the uncompressed area exceeds a limit, it gets compressed in one go.
The other advantage of that approach is, if a single row is updated multiple times, which row will that be? A booking made 10 years ago? Very unlikely. It will be a recent one, one that is still in the uncompressed area perhaps?
- Updates and deletes do not impact the compressed data
- Compression is done in batches, not on individual rows, making it more efficient
- Updates of the same row in the uncompressed area grows this space but when compressing, only the most recent version is taken
- Thus the insert-only approach does not grow the table as much
- Compression requires CPU power
- Updating rows that are in the compressed area causes the table to grow still
- Having an uncompressed and compressed area for each table has tradeoffs
As the data is inserted at the end only, each operation has an unique row number within the table. Initially the table was empty and then a first row is inserted, it is the row #1. The next row has #2. An update of the first row means appending a new row with row number #3.
The main argument against column wise storage was that reading an entire row is more expensive as multiple columns need to be read now. But is that true?
What is faster in this example?
|Column A||Column B||Column C|
Reading the cells (3,A) + (3,B) + (3,C) or reading the cells (A,3) + (B,3) + (C,3)? Putting it this way, the answer is obviously Neither. It does not matter.
So where did the assumption “row orientation is better for reading an entire row” come from? Because of how close the data is. That is a valid point if the data is stored on disk. Reading one line in a file means positioning the disk-head at that location and then the entire line is read at once. This is much more effective than reading three lines with one column each. That is true even for SSDs as data is organized in 4k pages there. On top of that, disk I/O works with 512Byte blocks internally. All favoring horizontal storage over vertical storage.
For memory access that does not apply. A microprocessor sends an address to the DRAM and gets back the data under this address. If the next read is nearby or a completely different address is (almost) irrelevant. If, and only if, the calculation of the address itself is straight forward.
Here the insert-only helps as the addresses of cells are easy to calculate – take Column A and access the third value, then Column B and value at position three and finally Column C. For memory access that is the same as accessing Column A and then the first, second and third value.
It does not make a difference. None at all. The entire argument of reading row-wise is better is based on the assumption that it is faster to read horizontally than vertically. Which is true, if the data is stored on disk in a horizontal fashion. Then the data of one row is closely together and hence read in one rush from the disk cylinder. But for reading from memory it does not matter at all.
- Reading from a few columns all rows is fast
- Reading all columns of a single row is fast as well
- In fact every operation is fast
- Insert-only causes the table to grow
- How is the memory address calculated if values are of different length?
That’s an easy one. With compression less memory is need. Therefore larger databases can fit into smaller servers.
Another aspect is the type of compression. Above, a few aspects of the compression have been shown. Compression needs to be effective but consume little additional CPU overhead. The compression has to support easy calculation of the memory address given a row number.
How about a dictionary compression as one compression type? One area stores all the unique values that actually occur in the data of a column and this list is indexed. For the MTART column of the MARA table that would be (1, ROH), (2, HALB), (3, HAWA), (4, HIBE), (5, FHMI), (6, DIEN), ….
Now the actual column has to store the index only. Instead of 20’000 rows with four characters each, the index number with one byte (hopefully) is stored. For cases where there are longer strings, an even distribution of values and few distinct values, that make total sense.
This compression can be enhanced in case the values are clustered, like MANDT being a perfect candidate, it has the value 800 everywhere. So all that needs to be stored is “First row has the value 800 and so do the next 20’000 rows”.
Another approach is to have one bitmap for each distinct value.
All of these strings can be compressed very efficiently given how many zeros they contain. In the most simple case such compression would be ROH=1*1 followed by 6*0. With this method calculating the address offset of a given row are very efficient CPU operations: AND operations on bytes. Further more, filtering records is even more efficient and do not require an explicit index! select * from MARA where MTART = ‘ROH’? Take the Bitmap of the ROH and read the rows with the number as shown by the position of a 1 in the bitmap. So row number #1 in this table is the only result.
There are various compression methods implemented for Hana to pick the best suited for each column. For primary keys, for free form string columns, for columns which have few distinct values, for columns that differ just a bit,… all have suited compression algorithms.
Yes, this can be more complex to implement, but these algorithms are tailored to what CPUs can do best.
- Less memory needed
- Fast compression and decompression
- Fast scans, so fast that no separate index is needed except for primary keys
- Compression requires some CPU power still
- Committed data needs to be persisted on disk still somehow, else all data is gone when the server is shut down
When combining all four technologies together, almost all downsides are eliminated.
- Compression: Turned from an overhead into an efficient way to read and write data
- Columnar Storage: The way it is implemented it is optimal for all combinations of queries, few columns – many rows, many columns – single row, everything
- Insert only: Straight forward approach to support read consistency in its most strict version without side-assumptions like optimistic locking
- In-Memory: Even large databases fit into memory
However three problems remain:
- Persisting data on disk is a requirement for not losing any committed data
- Merging the compressed row version with the uncompressed versions to limit the growth of tables
- Hardware costs in case the database is too large
Solving point 1) is a common technique: Every change is stored in a transaction log file on disk. All databases do that and it makes sense. If there is a power failure and the data is persisted to disk at commit time, nothing can happen. After the restart all committed transactions can be read from there and the memory structures can be rebuilt. Performance is not a problem either as the disk is written in a sequential manner where it is very fast – both hard disk and SSDs.
Granted, we don’t want to store the transaction logs forever and rebuild the database from the point in time it was installed, therefore the compressed area of the tables are stored once a while to disk as a starting point and the transaction log is needed from just that point in time onward. If the compressed area of the table does not even change – a fair assumption for many tables – it does not need to be dumped on disk again. Example would be a finance table partitioned by year. Very unlikely somebody does change rows in previous years. Or all the many small helper-tables that never change, come to mind…
The insert-only (2) is solved only partially per the information provided up to here. The uncompressed area does collapse all versions into the most recent one during the compression we said but that does not prevent having multiple versions of the same row in the compressed area. Given how the compression algorithms work, nothing prevents us from merging those rows once a while also. This is a bit more work, yes, but still better than uncompressing and re-compressing the whole data at every single insert/update/delete. And by choosing a proper partitioning method, hopefully not the entire table has to be worked on but only the most recent partition.
That this batch processing is taking its toll is obvious. This truly is unavoidable. As a result the administrator will see lots of CPUs being idle most of the time and periodically more CPU power is used – whenever the delta merge for a large table kicks in.
Leaves the question of what to do when the database is too large…
Scale-out and Dynamic Tiering
For large databases, one way out could be to ask “How big is your ERP or Data Warehouse database?” Are there really so many customers whose database, assuming the compression factor of 10 holds true and we assume 4TB memory is affordable and a 50% memory reserve makes sense, is larger than 20TB? A weak argument, I have to admit.
Then there are some ways to push the limit a bit
- Scale-out: Instead of one large server, use two smaller ones. This can bring down costs but does not scale forever. It gets us past twice the database size, three times the database size.
- Dynamic Tiering: Keep the Hot data in-memory, the warm data is put on disk. As seen from the Hana database user, all looks like a single table still. The only difference is that querying the warm data runs at disk database performance, not Hana performance. Might be another option to keep the costs down for larger databases.
- Combination of the two.
If the volumes are even larger than that, I would ask myself if that really is a database use-case still and not a Big Data scenario. Are the database guarantees like transactional guarantees, strict read consistency, concurrency,… all needed? If in doubt – read this for more information. In such scenario the Data Lake utilizing Big Data technologies is used to store the raw data with the advantages and disadvantages of cluster processing and Hana would be the database containing the interesting facts. This provides the cheap storage and processing of enormous data volumes as well as the response times and join performance of SAP HANA. Using SAP HANA options like Smart Data Access can be used to hide the differences.
Other common misperceptions
Running a classic database with all database blocks being cached in RAM is not an in-memory database. Agreed? The above explanations hopefully showed that. Yes, of course reading the data from cache is faster than from disk. But then disk access patterns (focus on sequential access, put data in one 512Byte sector) are replicated for memory access where those are irrelevant. In best case, this gives away performance optimizations for no reason, and in reality these assumptions do cost extra performance.
One of the recent press releases did spike my interest as well: “…Terracotta DB, a new generation, distributed, In-Memory Database platform…“. A database has certain qualities like aforementioned transactional guarantees, locking, read consistency. To call something a database that does not even support SQL is …. bold. Yes, SAP HANA is a RDBMS with every quality you expect from such.
I agree, however, with the statement that nothing is new in SAP HANA. Columnar storage was not invented for SAP HANA, in fact I used such tools some 20 years ago. Same with insert-only, a technique the entire Big Data world is built on – see immutable files. What makes SAP HANA unique, is the combination of these techniques to form a database. I would like to encourage everybody to check the foot notes of all other database vendors, even the most recognized ones, in terms of the way in-memory is supported. You will find statements like “Trade-offs of In-Memory features”, “Optimistic locking” etc.
The reason is simply because you need to combine the power all four technologies together, and in a clever way, to achieve all the benefits. Since these technologies impact the heart of how a database works, they cannot be added to existing databases but rather the database needs to be built from the ground up.
Learn more about all the capabilities of the SAP HANA platform.