Unleashing Lightening: Building a quarter trillion rows in SAP HANA
For the Keynote at SAP TechEd/d-code 2014, we built out a quarter trillion row model in a single scale-up HANA system. You can read the high level Unleashing Lightening with SAP HANA overview and watch the video.
I thought that people might be interested in how the demo was built, and the logistical and technical challenges of loading such a large data model.
Building the SGI UV300H
The first challenge we had was finding a system big enough for 30TB of flat files in short time. The SGI UV300H is a scale-up HANA appliance, made up from 4-socket building blocks. The SGI folks therefore had to string 8 of these together using their NUMAlink connectors and attach 6 NetApp direct attached storage arrays for a total of 30TB of storage.
Today, only 4- and 8-socket SGI systems are certified and the 32-socket system is undergoing certification. The nature of the UV300H means that there is non-uniform data access speed. On a 4-socket Intel system you have either local (100ns) or remote (300ns) – you can read Memory Latencies on Intel® Xeon® Processor E5-4600 and E7-4800 product families for more details.
With the NUMAlink system there is also a hop via NUMAlink to the remote chassis, which increases the memory latency to 500ns. Whilst that is blindingly fast by any standard, it increases the non-uniformity of RAM access on HANA. For SAP HANA SPS09, SAP optimized HANA for the UV300H by improving average memory locality.
However HANA SPS09 wasn’t available, so we ran on stock SAP HANA SPS08 Revision 83. It’s tough to say how big a penalty this cost us, but on a theoretical 7.5bn aggregations/sec, we got closer to 5bn, so I’m guessing SPS09 would provide a 25-50% hike in performance under certain circumstances.
But to be clear, this is the same HANA software that you run on any HANA server, like AWS Developer Edition. There was no customization involved.
Downloading 30TB of flat files
Here was our next challenge. I did the math, and realized this was going to take longer than the time available, so I put a call into Verizon FIOS to see if they could help. They came out the next day and installed a new fiberoptic endpoint which could service up 300/300Mbit internet. With my laptop hard-wired into the router, we could get a constant 30MByte/sec download from the Your.Org Wikimedia Mirror. Thanks guys!
Once these were on USB hard disks, we shipped them to SGI Labs, which cost another 4 days, due to the Columbus Day holiday.
From there, we found we could load into HANA faster than we could copy the files onto the server (USB 2.0).
Building the HANA Model
Thankfully, I have a few smaller HANA systems in my labs, so I tested the configuration on a 4S/512GB system with 22bn rows, and on a scale-out 4x4S/512GB system with 100bn rows. There were a few things that we found that would later be of value.
First, partitioning by time (month) is useful, because you can load a month at a time, and let HANA merge and compress the last month whilst you load the next month. This saves the constant re-merging that happens if you don’t partition by time. A secondary partition by title is useful, because it ensures partition pruning during data access, which means that much less RAM is scanned for a specific query. This led to a RANGE(MONTH), HASH(TITLE) two-level partition strategy, which is very typical of data of this type.
Second, the amount of data we had meant that it was going to be most practical to logically partition the data into tables by year. This wasn’t strictly necessary, but it meant that if something went wrong with one table, it wouldn’t require a full load. This decision was vindicated because user error meant I wiped out one table the night before the Keynote, and it was easily possible to reload that year.
Third, a secondary index on TITLE was used. This was based on research by Lars Breddemann and Werner Steyn Further Playing with SAP HANA which led us to understand that when a small amount of data is selected from a large table, an index on the filter predicate column is beneficial. Therefore if the SQL query is SELECT DATE, SUM(PAGEVIEWS) FROM PAGECOUNTS WHERE TITLE = ‘Ebola’ GROUP BY DATE, then a secondary index on TITLE will increase performance.
Fourth, we built a simple HANA model to UNION back in all the tables in a Calculation View, and join it to the M_TIME_DIMENSION system table so we could get efficient time aggregation in OData and ensure query pruning.
Optimizing SAP HANA for the UV300H
By this time, we had 30TB of flat files on the /hana/shared folder of the UV300H and had to get them loaded. We realized there was a challenge, which is the Wikipedia files come in space delimited, with no quotes around text, and the date is in the filename, not a column. We didn’t have Data Services or another ETL product, and the fastest way to get data into HANA is using the bulk loader.
So, I wrote a script which uncompressed the file into a memory pipe, reformatted it in awk to contain the timestamp and convert it to CSV with quotes, write it out to a RAMdisk, run it into the bulk loader and delete the RAMdisk file. Each hour takes around 20 seconds to process, and I ran 12 threads, in parallel, plus an additional 40 threads for the bulk loader process.
What we realized at this point was that SAP HANA SPS08 wasn’t optimized for the amount of power that the UV300H had, so I tweaked the settings to be more aggressive, particularly with mergedog, which only uses 2 CPU cores by default. We enabled the integrated statistics server, installed PAL and the script server.
In addition, I found that it was necessary not to be too aggressive, because the log volume is only 500GB, and you can easily fill this up between 5 minute savepoints if you get too aggressive with loading (remember you have to buffer enough logs until the savepoint is complete). I suspect the certified 32-socket system will have a 1 or 2TB log volume for this reason.
Other than that, we pretty much found that it just worked. Here’s a screenshot of using all the 960 vCores in the UV300H during some early query testing. I’m sure glad I didn’t pay for the power bill!
Building the Web App in HANA XS
We put together the Web App in HANA XS using SAP UI5 controls and OData services to access the underlying data model. More on this in a later blog when Brenton O’Callaghan is going to describe how it was built.
What’s critical about this is that the OData services which is accessed directly by the browser runs in-memory, and has access directly to the Calculation Scenario which is generated by the SAP HANA Calculation View. This means that the response time in the browser is very little more than a SQL query ran in a console on the server itself.
There were really no special considerations required to use HANA XS with a model of this size – it worked exactly the same as for any other HANA model. One thing we did to ensure we didn’t cause problems was to restrict the HANA models so you couldn’t return very large data volumes by using Input Parameters. This means you can’t return 250bn rows in a browser!
I’ve said this in the other blog, but whilst there were huge logistical challenges in building a model like this in 10 days, HANA made it possible. The fact that HANA self-optimizes the whole model for compression and query performance and requires no tuning is a huge benefit. Once we had built a simple data model, we were able to directly load all the data overnight.
One thing that was worth noting is because of the direct attached storage model in the UV300H, we found we can load around 200GB/minute into RAM (once it has been loaded and merged once). That means we can load the entire 6TB model on this system in around 30 minutes, which is the fastest load speed I’ve ever seen on a HANA system.
Anyhow, the purpose of this blog was to open the kimono on specifically how we built this demo, and to show that there was no special optimization to do so. This, despite the fact that the UV300H 32-socket edition certification is still in progress and the HANA optimizations for it weren’t available to us. If you have any questions on it then please go ahead and ask them, I’d be very happy to answer.
And remember if you’d like to Build your own Wikipedia Keynote Part 1 – Build and load data then please follow that series – we’ll be building out the whole demo in 4 parts.