Working with SAP HANA One on AWS can be quite exhilarating. With the power of 32 virtual processors and 64 GB of RAM for running HANA, it’s hard to walk away from this speed demon environment. The problem that I now have is I’ve got 1 TB of log files that I need to process on occasion. With only 200 GB of disk space and 64 GB of RAM on the HANA One instance, I have more data than SAP HANA One can handle as “hot data”.
Hence – I’m having a Katy Perry moment.
I learned a lesson a few months back – you don’t want to exceed half the system RAM when loading data into ROW-based tables. If you recall, ROW tables must fit entirely into memory when the system boots, and back in the .38 release it was possible to import just a little too much data and crash the server. Note to self and team – pay attention to the HANA Studio dashboard and its critical warnings on memory. 🙂 What happens in this scenario is that HANA tries to load the row data from disk and rebuild the indexes on the fly when the server boots – well – with too much data, it simply won’t come back up. 😯
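A quick way to keep an eye on this before a big load is to query HANA’s monitoring views. This is just a minimal sketch – it assumes the standard system views M_RS_TABLES and M_HOST_RESOURCE_UTILIZATION are available on your revision:

```sql
-- Approximate row-store footprint per table (bytes).
-- Remember: row tables must fit entirely in RAM at boot.
SELECT schema_name,
       table_name,
       allocated_fixed_part_size + allocated_variable_part_size AS allocated_bytes
  FROM m_rs_tables
 ORDER BY allocated_bytes DESC;

-- Physical memory in use vs. free on the host.
-- Rule of thumb from above: keep row-table data well under half of RAM.
SELECT used_physical_memory, free_physical_memory
  FROM m_host_resource_utilization;
```

Running these before and after an import gives you a rough sense of how close you are to the danger zone, instead of finding out the hard way at the next reboot.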
Given that it’s not possible to set up a cluster with SAP HANA One on AWS to scale out, you need to think about what data needs to be hot and what data should stay cold. Enter Hadoop. With Hadoop, I can set up a Hadoop cluster on AWS for my CSV files. I then have the option of running a map-reduce job, Pig program, or Hive query to grovel over the files and return the data. Another option is to temporarily import subsets into SAP HANA for pure speed. When I’m done, I simply DELETE the data, TRUNCATE the table, or DROP the table altogether. With COLUMN tables, I need to remind myself that records aren’t actually deleted, but merely marked for deletion. In this case, DROP table works best.
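To make that “temporary hot data” round trip concrete, here’s a rough sketch in HANA SQL. The table and file names are made up for illustration, and it assumes HANA’s IMPORT FROM CSV FILE syntax with a file already staged on the HANA instance:

```sql
-- Hypothetical staging table for one subset of the log data
CREATE COLUMN TABLE log_staging (
    ts      TIMESTAMP,
    host    NVARCHAR(64),
    message NVARCHAR(512)
);

-- Pull one CSV subset in for fast, in-memory analysis
IMPORT FROM CSV FILE '/hana/staging/logs_subset.csv'
  INTO log_staging
  WITH RECORD DELIMITED BY '\n'
       FIELD DELIMITED BY ',';

-- ... run the fast queries while the data is hot ...

-- For a COLUMN table, DELETE only marks rows for deletion,
-- so dropping the table is the cleanest way to reclaim space
DROP TABLE log_staging;
```

The key design point is the last statement: since deleted rows in a column table linger until they’re cleaned up, dropping the whole staging table is the simplest way to guarantee the space comes back before the next subset is loaded.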
I now have the flexibility to run with “Hot data” inside the HANA database, or process the “Cold data” with Hadoop. In Part 2, I’ll walk through examples and considerations for importing data directly into HANA from the CSV files and then flushing the data from the system. In Part 3, I’ll show how to grovel over the data with map-reduce and load the results into HANA for processing.