Skip to Content
Technical Articles

Import data from files to SAP HANA Cloud

In this blog I will introduce how to import data from flat files to SAP HANA Cloud.
There are several methods you can use, depending on your requirements and the data size.

Overall, I would recommend to use DBeaver to manually import CSV files up to ~1GB.
Use the “IMPORT FROM” SQL statement to import larger files from AWS or Azure file storage.
If you need to import files other than CSV or to pre-process your data, Python is the most versatile tool.
If you need to import many files based on a pattern and/or at repeated intervals, use the SAP HANA Smart Data Integration File adapter.

1. hdbtabledata

As said in this blog, when the import of the data is connected to modelling tasks in SAP Web IDE Full-Stack or SAP Business Application Studio, you can insert test small files up to 10MB directly in your database projects. Place files you want to import directly into the design-time definition of the HDI container and define the import via an .hdbtabledata HDI artefact.

This approach only supports .csv files.

Keep attention when working with Git. Most Git repositories have size limits. It is not a good practice to check in large files in a Git repository, nor to put it into a project in SAP Web IDE.

Find .hdbtabledata loading samples on Github.

2. SAP HANA Database Explorer

You can use the data import function of the SAP HANA Database Explorer to import CSV files up to 200MB from your local PC. From this UI, you can also import CSV / PARQUET files from AWS S3 and Azure.

Access the database explorer from your SAP HANA Cloud instance on the SAP Cloud Platform Cockpit.

Within the Database explorer, you can access all your HDI containers. Right click on the database/HDI container in which you want to import data, and select “Import Data”.

The “Import Data” wizard will open up. You can select the CSV file, then your import target table.

If you want to import files directly from Amazon S3 or Azure, you need to register the IaaS certificate into SAP HANA Cloud first. For details, refer to section ‘5. “IMPORT FROM” SQL statement’ of this blog.

If you choose Amazon S3 as the import source in the UI, you need to set the S3 region and the S3 path, made of your IAM user access key and secret key, as well as your bucket name and object ID. Just enter them as requested by the UI.

Select the table where you want to import data.

In the import options, you can choose the file format (CSV, PARQUET), you can choose whether to use the first row as column names, delimiters and date formats.

Set the error handling, and you can start importing your file !

3. DBeaver

If you use SAP HANA Cloud as a datawarehouse, I definitiely recommed DBeaver to manage your data. DBeaver has a very intuitive data import function which lets users import data from CSV files.

First, establish a secure connection from DBeaver to SAP HANA Cloud. Then, right-click on the schema where you want to import data and select “Import data”.

Follow the wizard to choose the data to import.

In this example I use a 100MB sample dataset from Kaggle with 1019925 records.

Set the importer settings.

You can manually set the table mapping, or leave it as default. The importer is able to create automatically a new table in SAP HANA Cloud for you.

These are the default columns created for my CSV file.

In the data transfer option, you can decide how often the imported data should be committed. This has an impact on the import time for large files.

Confirm that everything is correct, then start the import.

I was able to import a 100MB CSV file from my client to SAP HANA Cloud in 4 minutes with DBeaver.
As a reference, importing that same file to a SAP HANA 2.0 SPS05 instance located on my local network takes 12 seconds with DBeaver.

4. Python

If you need to pre-process data before inserting it into SAP HANA Cloud, Python offers a powerful and flexible environment.

I used the same 100MB CSV file as in the DBeaver example to compare import performance.
I started by creating a new Jupyter notebook in the directory where my CSV file is located.

In the notebook, start by importing the hana_ml library,
establish the connection to SAP HANA Cloud,
read your CSV file with pandas,
create a dataframe in the SAP HANA Cloud database containing your data,
and finally close the connection.

Here is the python code I used. The HANA ML dataframe allows developers to use data in HANA as a pandas dataframe. This means you can prepare the data to fit your needs. Learn more about an actual use case in this blog by Andreas.

!pip install hana_ml

import hana_ml
print(hana_ml.__version__)

import hana_ml.dataframe as dataframe

# Instantiate connection object
conn = dataframe.ConnectionContext(address = 'hostname.hanacloud.ondemand.com',
                                   port = 443, 
                                   user = 'user', 
                                   password = 'password',
                                   encrypt = 'true'
                                  )

# Send basic SELECT statement and display the result
sql = 'SELECT 12345 FROM DUMMY'
df_remote = conn.sql(sql)
print(df_remote.collect())

import pandas as pd
df_data = pd.read_csv('NYCFT.csv', sep = ',')
df_data.head(5)

df_remote = dataframe.create_dataframe_from_pandas(connection_context = conn, 
                                                   pandas_df = df_data, 
                                                   table_name = 'NYCFT_PYTHON',
                                                   force = True,
                                                   replace = False)

conn.close()

The data import finished in 9 minutes. This confirms the results of Lars Breddermann that DBeaver is around twice as fast as Python regarding data import to SAP HANA Cloud.

I used the hana_ml library to import data the simplest way I could, but you can also use other libraries such as SQL Alchemy.

5. “IMPORT FROM” SQL statement

Finally, if you have a very large dataset, the fastest way to import data to HANA Cloud is to use the built-in “IMPORT FROM” SQL statement.

Follow the documentation to enable data import directly from your IaaS platform. For this guide, I used AWS.

First, you will need an AWS account, with S3 enabled and your files need to be uploaded to S3. I used the same file as in the DBeaver and Python examples to compare performance.

In the IAM section, create a user with API access. You will receive an Access key ID and Secret access key.

Grant the necessary access permissions to your IAM user. I granted the AmazonS3ReadOnlyAccess policy to my user, which gives access to all files on S3 for my account. You can also give a more granular access if necessary.

Now you must register the certificates necessary for the SSL connection to S3.

REST API-based adapters communicate with the endpoint through an HTTP client. To ensure secure communication, the adapters require client certificates from the endpoint services. Connections to an SAP HANA Cloud instance, which are based on the hanaodbc adapter, also require an SSL certificate.

Here are the two necessary certificates to connect to AWS S3 :
Amazon Root CA
Digicert Baltimore CyberTrust Root CA (download PEM):

Copy each root certificate text and create a certificate from this text which can be saved in a personal security environment (PSE).
Execute these SQL statements towards your SAP HANA Cloud database :

  • Create a PSE.
create pse HTTPS;
  • Create a certificate.
CREATE CERTIFICATE FROM '
-----BEGIN CERTIFICATE-----
... 
-----END CERTIFICATE-----
' COMMENT <comment>; 

The created certificate is now contained in the CERTIFICATES system view and can be added to a PSE store.

  • Add the certificate created above to the PSE:
SELECT CERTIFICATE_ID FROM CERTIFICATES WHERE COMMENT = '<comment>'; 
ALTER PSE <pse_name> ADD CERTIFICATE <certificate_id>;
  • Set the purpose of the newly created PSE to REMOTE SOURCE:
SET PSE <pse_name> PURPOSE REMOTE SOURCE;

This makes the usage of the certificates in the PSE store available to all remote sources.

Now that the Personal Security Environments are set up, you can import data from AWS S3.
IMPORT FROM CSV FILE 's3-<region>://<access_key>:<secret_key>@<bucket_name>/<object_id>.csv' INTO <TARGET_TABLE_NAME> [with options];

For my 100MB CSV file, the command took 34 seconds to execute.

 

There are other methods which I did not explore to import data to SAP HANA Cloud, such as SAP HANA Smart Data Integration File adapter, hdbsql, or the JDBC/ODBC drivers.

The SAP HANA SDI File adapter is particularly useful for these use cases :

  • SharePoint access
  • SharePoint on Office365
  • Pattern-based reading; reading multiple flies in a directory that match a user-defined partition
  • Real-time file replication (only APPEND)

In conclusion, each file import has its merits depending on your use case. Find the method which fits your needs.

Thank you for reading,

Maxime SIMON

Special thanks to Daisuke Ikeguchi for his support in writing this blog.

5 Comments
You must be Logged on to comment or reply to a post.
  • Hi Maxime,

    Do you know if there’s a way to use the IMPORT/EXPORT SQL statements with HDI containers? Every user I tried with (DBADMIN, HDI container admin, HDI container group admin etc.) don’t have the EXPORT privilege and I can’t find a way to grant them this privilege.

    Thanks,

    Pierre

    • You can find the necessary permissions on the SAP HANA Cloud SQL Reference :

      Permissions

      You must have the IMPORT system privilege, and the CREATE ANY privilege on the schema that is being imported in.

      So you must find the schema where information is stored within your HDI container, and grant the CREATE ANY privilege to the user executing IMPORT. This user must also have the IMPORT system privilege.

      On the example below, I created an HDI container called hanacloud. All the schemas below get created with that HDI Container. HANACLOUD_1 is the main schema so I think you need the CREATE ANY privilege on that one, however maybe the privilege is also required for the other technical schemas. Could you test and let us know ?

      Assign privileges to your user in HANA Cockpit

      /