
Loading large CSV files to SAP HANA

SAP HANA makes it very easy to import a CSV file into a database. Multiple tutorials on the HANA Academy detail the process of importing a CSV file into a table in a HANA database.

Though importing CSV data into HANA is simple, importing a large CSV file can be a challenge. A large CSV file can't be imported using HANA Studio because the import process is very slow. Loading large CSV files into HANA is normally accomplished either through a control file or by importing directly from the CSV file:

IMPORT FROM CONTROL FILE '/data/data.ctl';

where the contents of the /data/data.ctl file are:

IMPORT DATA INTO TABLE "MYTABLE" FROM '/data/data.csv'
RECORD DELIMITED BY '\n'
FIELD DELIMITED BY ',';

or

IMPORT FROM CSV FILE '/data/data.csv' INTO "MYTABLE"
WITH
RECORD DELIMITED BY '\n'
FIELD DELIMITED BY ',';

Both methods are similar, and all techniques described in this article apply to both.

1. Download or copy the CSV file to an appropriate directory on the HANA appliance. This lets the powerful HANA appliance do the work of processing the CSV file. If you get a "Cannot open CSV file" error during the CSV import and your path is correct, ensure that the UNIX user hdbadm has read permission on the directory. If it doesn't, change the permissions on the directory where the CSV and CTL files will reside.
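For example, a quick check and fix from the appliance's shell (a sketch; the /data path and the sapsys group are illustrative):

# Verify that hdbadm can read the directory
ls -ld /data

# If not, hand the directory to hdbadm, or open up read/execute permissions
chown -R hdbadm:sapsys /data
chmod -R 755 /data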

2. Create a sample data set from the large data file and create the target table based on the sample. To create a sample data set of 1,000 records, use:

head -n 1000 data.csv > sampledatafile.csv

Copy sampledatafile.csv to the HANA Studio machine and import it using HANA Studio. This makes it easy to create the target table and adjust the column definitions. If definitions for the columns are available, edit the table definition in HANA Studio. Alternatively, use ALTER TABLE commands to adjust the table definition once the initial column definitions have been created, as sketched below.
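For example, if the sample import created a column that is too narrow for the full data set, it can be widened afterwards (table and column names are illustrative):

ALTER TABLE "MYTABLE" ALTER ("DESCRIPTION" NVARCHAR(500));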

3. When importing a large number of rows, use the BATCH clause in the IMPORT command for faster performance.

IMPORT FROM CSV FILE '/data/data.csv' INTO "MYTABLE"
WITH
RECORD DELIMITED BY '\n'
FIELD DELIMITED BY ','
BATCH 1000;

4. Always include an error file in the import command and check the error file after the import to ensure that no errors were generated. Ensure that the UNIX user hdbadm has write permission on the directory where the error file is written; otherwise, no error file will be generated.

IMPORT FROM CSV FILE '/data/data.csv' INTO "MYTABLE"
WITH
RECORD DELIMITED BY '\n'
FIELD DELIMITED BY ','
ERROR LOG '/log/error.log';


By default, the generated error file is read-only. If the script is executed a second time with the same error file name, the file will not be refreshed. Delete the log file after each execution.
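For example, before rerunning the import (path as used above):

rm /log/error.log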

5. If the CSV load should fail when it encounters any errors, use "FAIL ON INVALID DATA". The load will then fail as soon as it encounters the first error.

IMPORT FROM CSV FILE '/data/data.csv' INTO "MYTABLE"
WITH
RECORD DELIMITED BY '\n'
FIELD DELIMITED BY ','
FAIL ON INVALID DATA;

6. Before starting the import, ensure that enough memory is available on the HANA appliance and set the AUTOCOMMIT property to false. In HANA Studio, right-click in the SQL editor, choose "Show in Properties", and set AUTOCOMMIT to false.

7. While importing data, ensure that the column widths are set correctly in the target table. The CSV import will fail with the error "inserted value is too large for column" if any column is not wide enough, and rows containing an oversized value will be rejected during the import. If definitions for the input data are not available, first import the data into a staging table with generously sized columns.
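A minimal sketch of the staging approach (table names, column names, and widths are illustrative):

-- Staging table with generous column widths
CREATE COLUMN TABLE "MYTABLE_STAGE" (
"ID" NVARCHAR(100),
"DESCRIPTION" NVARCHAR(5000)
);

-- After loading the CSV into the staging table, move the rows into the
-- properly typed target table with explicit casts
INSERT INTO "MYTABLE"
SELECT CAST("ID" AS INTEGER), "DESCRIPTION" FROM "MYTABLE_STAGE";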

8. Manually inspect a sample of the CSV file and check the data format. If there are any dates in the CSV file, verify the date column format. By default, HANA expects dates to be in YYYY-MM-DD format. If the dates are in any other format, use DATE FORMAT in the import clause to specify it.

IMPORT FROM CSV FILE '/data/data.csv' INTO "MYTABLE"
WITH
RECORD DELIMITED BY '\n'
FIELD DELIMITED BY ','
DATE FORMAT 'MM/DD/YYYY';

See the SAP HANA SQL reference for the IMPORT statement for more details on DATE FORMAT.

9. Ensure that the data doesn't contain stray special characters such as unescaped double quotes ("). If special characters are part of the data set, clean the input data using sed scripts.

e.g., a description field may contain the value “Length of rod: 56″ and further description”. In this case, the ″ after 56 needs to be cleaned before the import will work.
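For example, a sed one-liner along these lines replaces an inch mark that follows a digit (the pattern is illustrative and must be adapted to the actual data; -i edits the file in place with GNU sed):

sed -i 's/\([0-9]\)"/\1 in./g' data.csv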

10. After the CSV import, check the table's runtime properties in HANA Studio and ensure that "Number of entries" looks correct. Verify the sizes of the main and delta storage. Perform a delta merge to move the data from delta to main storage and reduce memory usage.
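Both checks can also be done in SQL; a sketch, assuming the table name from the examples above:

-- Trigger a delta merge on the freshly loaded table
MERGE DELTA OF "MYTABLE";

-- Verify row count and memory usage of main and delta storage
SELECT TABLE_NAME, RECORD_COUNT, MEMORY_SIZE_IN_MAIN, MEMORY_SIZE_IN_DELTA
FROM M_CS_TABLES WHERE TABLE_NAME = 'MYTABLE';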

Any other suggestions are welcome.


      18 Comments
      Former Member

      If you can write XS server code, there's another option.

       

      Like you, I had large (1GB) CSV files to load but did not have direct access to the HANA machine (no FTP or shell access). Processing files from my workstation using ODBC or HANA Studio was much too slow.

       

      I ended up writing an XS service to process CSV files uploaded via HTTP. I created a drag-and-drop HTML page to initiate the upload. The combination is fast and gives complete control over the back-end processing.

       

      The key to making this work is to use PreparedStatement.executeBatch(). This little gem can commit a million rows of prepared insert statements in about a second.
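      A minimal XSJS sketch of the batching idea (the table, columns, and insertRows helper are illustrative assumptions, not the commenter's actual code):

      // Classic XS ($.db) batch insert into a two-column target table
      function insertRows(rows) {
          var conn = $.db.getConnection();
          var pstmt = conn.prepareStatement(
              'INSERT INTO "MYSCHEMA"."MYTABLE" VALUES (?, ?)');
          for (var i = 0; i < rows.length; i++) {
              pstmt.setString(1, rows[i][0]);
              pstmt.setString(2, rows[i][1]);
              pstmt.addBatch(); // queue the row instead of executing it now
          }
          pstmt.executeBatch(); // send all queued inserts in one call
          conn.commit();
          pstmt.close();
          conn.close();
      }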

       

      The main limitation is the size of an HTML upload.  For me, this occurred at around 30MB. So, I split my data into approximately 30 chunks of 1 million rows each.  The upload page can queue lots of uploads so I was able to drag all the files onto the page in one step.

       

      You can read more about it here: Mikey's Blog: Bulk loading HANA

      Former Member (Blog Post Author)

      Thanks Mikey for your comment. I hadn't considered this option. I will try this out. If I can upload a million rows in a second, that will be great.

       

      I am also getting concerned about the stability of the HANA server if you don't have enough space to load your data. I tried loading a 5GB data file on a 20GB Cloudshare HANA instance and it hung. I had to shut down and restart the HANA database.

       

      I was assuming twice the file size as my rule of thumb (for a 5GB file, 10GB of HANA storage), but apparently I need more space.

      John Appleby

      Covered this some while back in Best Practices for SAP HANA Data Loads and Inside SAP HANA - optimising data load performance and tuning

       

      If you want big data loads to fly, then make sure you increase the number of threads to the number of cores in your system, and set the batch to a good number. Usually a batch of 100000 is a good number, but you can tweak this to suit your data.

       

      IMPORT FROM '/hana/FACT_02_2010.ctl' WITH THREADS 40 BATCH 100000;

       

      Hope this helps.

      Former Member (Blog Post Author)

      Thanks John for your comments. Your original article was the inspiration for my article.

       

      I ran some sample tests based on your input.

       

      I had a table with 6 columns and a CSV file with 50 million rows. As I increased my batch size, my load performance improved significantly.

       

      No batch size - 1318 sec

      100,000 - 364 sec

      1,000,000 - 29 sec

      5,000,000 - 1.3 sec

       

      But when I increased the number of threads, my performance actually degraded. With the thread count increased to 4, my load performance with the same batch sizes was 3 to 4 times worse than without any threads. The import reference says that the number of threads specifies how many objects are loaded concurrently, so threads help when there are multiple objects to load. For a single table load, increasing threads brought no improvement.

       

      Keep in mind that I performed my tests on a 20GB, 4-core Cloudshare HANA appliance, so my results may not match an actual production HANA appliance.

       

      Maybe you will have more insight into these numbers.

       

      Thanks.

      John Appleby

      Try partitioning your table before the load. But a 4-core appliance may be part of the problem.

       

      John

      Former Member

      How does this work with a HANA Cloud Platform (HCP) instance? Is there a way to upload a large .csv file to the HCP instance?

      Former Member

      I figured out that there is no way to do that yet - the only way to import data into a cloud instance is to use HANA Studio (which is not yet enterprise-ready because of file size limitations [most likely due to memory limits] and, e.g., a buggy implementation within the "com.sap.ndb.studio.bi.filedataupload.ui.wizards" component).

      John Appleby

      I'm thinking that Matthias Steiner and friends may have some input here.

       

      What I have done in the past is use the SAP HANA XS Table Import feature to load data; I've tested that with CSV files of 1GB+ and it is quite effective.

       

      Data Provisioning Using Table Import - SAP HANA Developer Guide for SAP HANA Studio - SAP Library

      Former Member

      Could you provide a document with an example of how to do this?

      Lars Breddemann

      Former Member

      Hi John,

      When using the hdbsql client locally with the control file approach to load huge CSV data into the HANA database, invoking the batch file containing IMPORT FROM CONTROL FILE 'xxxxx.ctl' gives the error "Cannot open control file xxxxx.ctl".

      My control file and CSV are on my local computer. Do these files need to be on the HANA server for this to work, or is there another approach? I can upload through HANA Studio, but it fails when I upload a file larger than 4 MB.

      Please help.

      John Appleby

      Yes, they must be on the server.

      Former Member

      Hi John,

       

      I have 3 CSV files of approximately 30GB each. I can't upload them to the HANA server. The files are on an application server; at most, I can download them to my local machine. In this case, can I still use the control file approach? If not, what is recommended?

       

      thanks and best regards.

       

      Fahad

      John Appleby

      You can import from a local CSV, yes. It may be slow.

       

      SAP HANA Academy - Importing Data from CSV file - YouTube

      Former Member

      Hi John,

       

      Thanks for your quick reply. What if the CSV file is neither on the local machine nor on the HANA server? In that case, how can I load data into HANA from the remote CSV file? Can I put the path of the remote CSV file into the IMPORT FROM CSV FILE command?

       

      Best regards

       

      Fahad

      John Appleby

      There are four choices:

       

      1) Move the file to the HANA system

      2) Move the file to the system with HANA Studio

      3) Install HANA Studio on the 3rd system

      4) Mount the filesystem remotely to HANA Studio

      Hakeem Tolulope OKUSAGA

      Is there any limit to the size of the data that can be imported from a flat file?

      I get an internal error when I try to upload a large file.

      Daniel Klein

      The first two links in the blog post are not valid anymore, but the videos can still be found on YouTube: