
Working with SAP HANA One on AWS can be quite exhilarating. With the power of 32 virtual processors and 64 GB of RAM for running HANA, it’s hard to walk away from this speed demon of an environment. The problem I now have is that I’ve got 1 TB of log files that I need to process on occasion. With only 200 GB of disk space and 64 GB of RAM on the HANA One instance, I have more data than SAP HANA One can handle as “hot data”.

Hence – I’m having a Katy Perry moment.

I learned a lesson a few months back – you don’t want to exceed half the system RAM when uploading data to ROW based tables. If you recall, ROW tables need to fit entirely into memory when the system boots, and back in the .38 release it was possible to import a little too much data and crash the server. Note to self and team – pay attention to the HANA Studio dashboard and its critical warnings on memory. 🙂 What’s supposed to happen in this scenario is that HANA loads the data from disk and rebuilds the indexes on the fly when the server boots – but if the data doesn’t fit in memory, it won’t. 😯
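If you want a quick sanity check before kicking off a big load, something along these lines against the monitoring views will show how close you are to the wall (a rough sketch – the exact views and columns can vary by revision):

    -- How much physical memory is the host actually using?
    SELECT HOST, USED_PHYSICAL_MEMORY, FREE_PHYSICAL_MEMORY
      FROM M_HOST_RESOURCE_UTILIZATION;

    -- Which row-store tables are holding on to the most memory?
    SELECT SCHEMA_NAME, TABLE_NAME, ALLOCATED_FIXED_PART_SIZE
      FROM M_RS_TABLES
     ORDER BY ALLOCATED_FIXED_PART_SIZE DESC;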

Given that it’s not possible to set up a cluster with SAP HANA One on AWS to scale out, you need to think about what data needs to be hot and what data should stay cold. Enter Hadoop. With Hadoop, I can set up a Hadoop cluster on AWS for my CSV files. I then have the option of running a map-reduce job, a Pig program, or a Hive query to grovel over the files and return the data. Another option is to temporarily import subsets into SAP HANA for pure speed. When I’m done, I simply DELETE the data, TRUNCATE the table, or DROP the table altogether. With COLUMN tables, I need to remind myself that deleted records aren’t actually removed, but only marked for deletion. In this case, DROP table works best.
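To make that concrete, here’s a minimal sketch of the import-and-flush cycle I have in mind (the schema, table, and file names are just placeholders, and the CSV layout is assumed to match the table definition):

    -- Stage a subset of the cold data in a COLUMN table for fast queries
    CREATE COLUMN TABLE "LOGS"."WEB_LOG_STAGE" (
       LOG_TIME  TIMESTAMP,
       STATUS    INTEGER,
       URL       VARCHAR(2000)
    );

    -- Load one CSV extract pulled down from the Hadoop cluster
    IMPORT FROM CSV FILE '/tmp/web_log_subset.csv'
      INTO "LOGS"."WEB_LOG_STAGE"
      WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY ',';

    -- When the analysis is done, drop the table outright so the space
    -- comes back right away instead of waiting on deletes to merge out
    DROP TABLE "LOGS"."WEB_LOG_STAGE";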

I now have the flexibility to run with “hot data” inside of the HANA database, or process the “cold data” with Hadoop. In Part 2, I’ll walk through the examples and considerations for importing data directly into HANA from the CSV files and then flushing the data from the system. In Part 3, I’ll show how to grovel over the data with map-reduce and load the results into HANA for processing.

Regards,

Bill


1 Comment


  1. Lars Breddemann

    Hi Bill,

    It’s correct that DELETE does not immediately remove the deleted data, but marks it as deleted (much like Oracle does as well).

    However, this is only true until the next delta merge operation. If at that point no open transaction references the deleted rows any more, they won’t be part of the merged table structures.

    So, after a bulk delete you may just want to trigger a MERGE command against the table to reclaim the space (or you just wait until AUTOMERGE does its job 🙂 ).
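
    For instance, something along these lines (the schema and table names are just placeholders):

        MERGE DELTA OF "MY_SCHEMA"."MY_BIG_TABLE";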

    – Lars

