Using SAP HANA Cloud, data lake files with SAP HANA Cloud, SAP HANA database
With the 2022 Q1 release of SAP HANA Cloud, the SAP HANA database (HANA DB) gained the ability to read from and write to SAP HANA Cloud, data lake files (HDLFS).
The manuals contain a lot of valuable information, but it is spread across the two product sets. Use this manual as a primer: https://help.sap.com/viewer/f9c5015e72e04fffa14d7d4f7267d897/2022_1_QRC/en-US/462c861413b043bd93b9e8e838249b6e.html
The purpose of this blog is to bring that information together in a single location, with a step-by-step process for working with HANA DB, HDLFS, and HDLRE.
Just to level set on terminology:
- HANA DB — SAP HANA Cloud, SAP HANA database
- HDLRE — SAP HANA Cloud, data lake relational engine
- HDLFS — SAP HANA Cloud, data lake files
Several related topics are documented elsewhere. To consolidate that list:
- HDLFS Command Line Interface
- Setting Up Initial Access to HDLFS — This must be done so that external access to the HDLFS container can be granted.
- SAP HANA Cloud, data lake relational engine loading data
Before beginning the steps below, make sure to Set Up Initial Access to HDLFS. To set up HANA security for HDLFS, the HDLFS container must be configured for trusted, external access. To complete some of the later sections that use the HDLFS command line interface, you must first install the SAP HANA Cloud, data lake client on your platform.
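Once the client is installed, a quick way to confirm connectivity is to list the root of your container. This is a sketch only: it reuses the certificate files and flags that appear later in this blog, and `<HDLFS endpoint>` is a placeholder for your container's endpoint.

```shell
# Hypothetical connectivity check: list the HDLFS container root using the
# client certificate, key, and CA certificate created for the container.
hdlfscli -cert client.crt -key client.key -cacert ca.crt -s hdlfs://<HDLFS endpoint> ls /
```

If this command returns without an error, the client and certificates are working and you can proceed with the HANA DB setup.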
Once that is done, the HANA DB process will include:
- Create PSE (SAP HANA Personal Security Environment)
- Add the Digicert certificate to PSE
- Add the local credentials to the PSE
- Add the HDLFS certificate to the PSE
- Create a HANA Import/Export Credential that maps to the PSE (and by extension the certificates)
- Test things out!
Create PSE (SAP HANA Personal Security Environment)
drop PSE MYPSE cascade;
create PSE MYPSE;
Add Root DigiCert Certificate to PSE
Once you have created a HANA DB PSE, you must then upload the root DigiCert certificate.
- You must use the DigiCert certificate
- Download the PEM version of the DigiCert Global Root CA from this site: https://www.digicert.com/kb/digicert-root-certificates.htm
- The file: https://cacerts.digicert.com/DigiCertGlobalRootCA.crt.pem
- The CREATE CERTIFICATE statement below should work for all SAP HANA Cloud instances, as it uses the PEM file above.
CREATE CERTIFICATE FROM '<Digicert PEM Certificate>' COMMENT 'HDLFS';
Once created, you need to get the certificate ID as that’s used to reference it:
SELECT CERTIFICATE_ID FROM CERTIFICATES WHERE COMMENT = 'HDLFS';
Add The Root Certificate to the PSE
ALTER PSE MYPSE ADD CERTIFICATE <certID_from_above>;
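To verify that the certificate was added to the PSE, you can query the PSE_CERTIFICATES system view. This is a hedged sketch: the view and column names are taken from my reading of the SAP HANA SQL reference, so check the documentation for your release.

```sql
-- Confirm the DigiCert root certificate is now attached to MYPSE
SELECT PSE_NAME, CERTIFICATE_ID
  FROM SYS.PSE_CERTIFICATES
 WHERE PSE_NAME = 'MYPSE';
```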
Add the HDLFS Certificates to the PSE
For the purposes of this blog, we will use a self-signed set of certificates generated using the OpenSSL utilities available on most platforms. Alternatively, these files can be given to you by a security administrator.
Follow this process to generate the files using OpenSSL on any host with OpenSSL installed. One thing of note in these commands is the "-days" parameter. This sets the lifetime of the certificates. If you set it too low, you will constantly have to regenerate this data. If you set it too high, it may violate your corporate standards. For the purposes of illustration and use in demo systems, I have chosen 9999 days as the life of these keys.
openssl genrsa -out ca.key 2048
openssl req -x509 -new -key ca.key -days 9999 -out ca.crt
openssl req -new -nodes -newkey rsa:2048 -out client.csr -keyout client.key
openssl x509 -days 9999 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out client.crt
openssl verify -CAfile ca.crt client.crt
openssl x509 -in client.crt -nameopt RFC2253 -subject -noout
Once generated, you need to add these certificates to your HANA PSE. There are three files, and thus three sections that must be included between the single quotes below:
ALTER PSE MYPSE SET OWN CERTIFICATE '<files here>';
For <files here> use the full contents of these files (including the BEGIN/END comments) that were generated with OpenSSL:
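As a sketch of what the resulting statement looks like, the three PEM blocks below stand in for the full contents of the files generated above. Note that the ordering shown here (client certificate, then its private key, then the CA certificate) is my assumption; verify the expected order against the SAP documentation for your release.

```sql
-- Placeholder PEM bodies; paste the full contents of each generated file.
ALTER PSE MYPSE SET OWN CERTIFICATE
'-----BEGIN CERTIFICATE-----
<contents of client.crt>
-----END CERTIFICATE-----
-----BEGIN PRIVATE KEY-----
<contents of client.key>
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
<contents of ca.crt>
-----END CERTIFICATE-----';
```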
Create a HANA Import/Export Credential that maps to the PSE (and by extension the certificates)
CREATE CREDENTIAL FOR USER DBADMIN COMPONENT 'SAPHANAIMPORTEXPORT' PURPOSE 'myCredential' TYPE 'X509' PSE 'MYPSE';
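To confirm the credential was created, you can query the CREDENTIALS system view. This is a hedged sketch, assuming your user has the privilege to read the view; check the column list against the SAP HANA SQL reference for your release.

```sql
-- Confirm the import/export credential exists for the DBADMIN user
SELECT USER_NAME, COMPONENT, PURPOSE, TYPE
  FROM SYS.CREDENTIALS
 WHERE PURPOSE = 'myCredential';
```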
create table EXPORT_TEST as ( select * from dummy );
-- add more data, as little or as much as you wish
insert into EXPORT_TEST select * from EXPORT_TEST;
insert into EXPORT_TEST select * from EXPORT_TEST;
export EXPORT_TEST as parquet into 'hdlfs://<HDLFS endpoint>/hdb_export_test' WITH CREDENTIAL 'myCredential';
See Exported Files in HDLFS Using hdlfscli
Using the SAP HANA Cloud, data lake files command line interface (hdlfscli) as part of the SAP HANA Cloud, data lake client kit, you can see the files exported:
hdlfscli -k -cert client.crt -key client.key -cacert ca.crt -s hdlfs://<HDLFS endpoint> lsr /hdb_export_test
FILE mark mark 93 666 index/DBADMIN/EX/EXPORT_TEST/create.sql
FILE mark mark 200 666 index/DBADMIN/EX/EXPORT_TEST/data.parquet
FILE mark mark 605 666 index/DBADMIN/EX/EXPORT_TEST/export_system.info
FILE mark mark 2176 666 index/DBADMIN/EX/EXPORT_TEST/table.xml
HANA DB Import Test
You can either import the data into an existing table (tested here), or have the IMPORT statement create the object and load the data (not shown here). For the data import, simply point the IMPORT statement at the parquet file generated during the export.
truncate table EXPORT_TEST;
import from parquet file 'hdlfs://<HDLFS endpoint>/hdb_export_test/index/DBADMIN/EX/EXPORT_TEST/data.parquet' into EXPORT_TEST WITH CREDENTIAL 'myCredential';
drop table EXPORT_TEST;
SAP HANA Cloud, data lake relational engine (HDLRE) Import Test
Alternatively, we can also load this same parquet data into the HANA Cloud data lake relational engine. With HDLRE, you can only load files from the directly attached HDLFS container. The key to the above HANA DB statements is that the HDLFS endpoint used to export from HANA must be the same one created when the HDLRE engine and instance were created.
create table EXPORT_TEST ( COL1 varchar(10) );
load table EXPORT_TEST ( COL1 )
  using file 'hdlfs:///hdb_export_test/index/DBADMIN/EX/EXPORT_TEST/data.parquet'
  format parquet
  quotes off escapes off;
select * from EXPORT_TEST;
drop table if exists EXPORT_TEST;
This blog does not cover the full use of HDLFS with HANA DB. In fact, a few areas have not been covered here, but are worth mentioning.
First, SAP Data Warehouse Cloud (DWC) can leverage the SAP HANA Cloud, data lake files repository as a feed to integrate and load data into DWC. We often don’t think of data coming from files or from an object store, but this is certainly a pattern that can be explored.
Second, SAP Data Intelligence has adapters to both consume (read) and produce (write) files in HDLFS. SAP Data Intelligence requires certificates and keys for access to HDLFS. Fortunately, most of that has been completed in this blog. SAP Data Intelligence requires a PFX/P12 version of the keys generated with OpenSSL. To generate the necessary file, you can run this command:
openssl pkcs12 -export -inkey client.key -in client.crt -out keystore.p12
The resulting file, keystore.p12, is used within SAP Data Intelligence as the trusted certificate for access into HDLFS. One other important note: when running the OpenSSL command, it will ask whether you want to password protect the keystore.p12 file. SAP Data Intelligence requires a password on this file, so please make sure to password protect it.
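Before uploading the keystore, it is worth confirming it was written correctly. The following is a self-contained sketch of the export-plus-verify round trip using throwaway file names (t.key, t.crt, t.p12) and a sample password (changeit), which are my placeholders, not names from the steps above:

```shell
# Generate a throwaway key and self-signed certificate, export them to a
# password-protected PKCS#12 keystore, then list the keystore contents
# to confirm it is readable with the password that was set.
openssl genrsa -out t.key 2048
openssl req -x509 -new -key t.key -days 30 -subj "/CN=hdlfs-demo" -out t.crt
openssl pkcs12 -export -inkey t.key -in t.crt -passout pass:changeit -out t.p12
openssl pkcs12 -info -in t.p12 -nokeys -passin pass:changeit
```

Against your real keystore.p12, the same `openssl pkcs12 -info -nokeys` inspection applies; it prompts for the password you chose during export.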