Back in October, I was attempting to load a series of files used for testing HANA on AWS after reading Mural’s post on using HANA Studio to import files – see http://scn.sap.com/docs/DOC-26381. The first problem I encountered that the pipe “|” symbol was not available in the list of options for the v38 of HANA Studio listed at http://scn.sap.com/community/developer-center/hana. No problem, I’ll just load the file into Excel and then save the result out as a CSV file using the semi colon delimiter on one of my smaller files (25 records). HANA Studio worked like a champ, so I proceeded to do the same with the rest of my pipe delimited files – totaling 1 GB of space. All looked fine until I tried importing in the first of the larger tables with 131,000 records. HANA Studio threw an error indicating that Java was out of heap space. Given that I’m not the only one hitting this problem, I thought I would share how our team got around the problem after reading the thread from Adebiyi Adelufosi today .
First, you need to learn a little about the IMPORT FROM command at http://help.sap.com/hana/html/sql_import_from.html. You’ll see that you can import flat files with the aid of a control file (.ctl) that defines the parameters for the import. Now, I just noticed that the command has a CSV FILE clause that I somehow missed that would have worked 😯 . The syntax looks like this:
IMPORT FROM CSV FILE ‘/data/data.csv’ INTO “MYTABLE” WITH RECORD DELIMITED BY ‘\n’ FIELD DELIMITED BY ‘,’;
So, now the question is – how do you get the CSV file onto your AWS HANA database instance. The wiki post at http://wiki.sdn.sap.com/wiki/display/inmemory/Importing+CSV+files+into+SAP+HANA eludes to using SFTP. Well, there is an SFTP plug-in when you launch the console from AWS as shown below.
After Launching the SSH client with your .pem file, you’ll want to navigate to the Plugin’s menu for the client as show below.
This brings up a dialog that allows you to copy files from your desktop server instance to your HANA filesystem on Linux. Now, in the image that I’ll show below, we used the HANA backup directory as the destination – this is before we learned the directory structure for a HANA instance. Since then, we’ve created a separate disk as a landing zone for our flat files, but that’s a topic for a different post. As you can see below, some of the files are fairly large.
If you use the IMPORT FROM CSV FILE, you don’t need the .ctl file.
Once you get your CSV files into your “landing zone”, you are now ready in HANA studio to perform the import command. For example, to load the customer.tbl file, For example:
IMPORT FROM CSV FILE ‘sap/usr/HDB/HDB00/backup/data/customer.ctl’ INTO “customer”
WITH RECORD DELIMITED BY ‘\n’ FIELD DELIMITED BY ‘|’;
Keep in mind, we created the tables before loading the data. Another point is – delete your flat files from your HANA instance after the import – otherwise – you could run out of disk space. The developer HANA instance on AWS only has about 120GB of disk space allocated. Otherwise, you might run into the out of disk error and your instance could fail to start! See my post http://scn.sap.com/community/developer-center/hana/blog/2012/12/03/handling-the-disk-full-error–or-why-column-tables-can-save-serious-disk-space on how to recover. Finally – path names are case sensitive – this is foreign to us Windows folks. 😉
I’d like to acknowledge my co-worker Ajay Mahala for helping me put together this post.
If you found this post useful, please rate it and like it. You can also follow me in twitter at http://twitter.com/billramo.