Jason Hinsperger

Setting Up Initial Access to HANA Cloud data lake Files

The HANA Cloud, data lake supports storage of any type of data in its native format. Managed file storage provides a place to securely store any type of file without requiring you to set up and configure storage on an external hyperscaler account. This is very useful if you need a place to put files for ingestion into HANA Cloud, data lake IQ for high speed SQL analysis, or if you need to extract data for any reason. Hopefully, in the near future, HANA data lake Files will also be easily accessible from HANA Cloud, HANA databases.
Setting up and configuring access to SAP HANA Cloud, data lake Files for the first time can be a difficult process, especially if you are coming from a database background and are not familiar with object storage or REST APIs.

Here is a process I have used to test HANA Data Lake Files. Because HANA Data Lake Files manages user security and access via certificates, you need to generate signed certificates to set up user access. If you don't have access to a signing authority, you can create your own CA and a signed client certificate, then update the HDL Files configuration, using the process below, which leverages OpenSSL. I have used it many times, so it should work for you.

First you need to create and upload a CA bundle. Generate the CA's private key using the OpenSSL command:
openssl genrsa -out ca.key 2048

Next, you create the CA’s public certificate (valid for 200 days in this case). Provide at least a common name and fill other fields as desired.
openssl req -x509 -new -key ca.key -days 200 -out ca.crt

Now you need to create a signing request for the client certificate. Provide at least a common name and fill other fields as desired.
openssl req -new -nodes -newkey rsa:2048 -out client.csr -keyout client.key

Finally, create the client certificate (valid for 100 days in this case):
openssl x509 -days 100 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out client.crt

*Note* – Make sure the subject fields are not all exactly the same between the CA and client certificates; otherwise the client certificate is assumed to be self-signed and the verification below will fail.

To verify the certificate was signed by a given CA (so that when you upload the CA certificate to HANA data lake you know it can be used to validate your client certificate):
openssl verify -CAfile ca.crt client.crt
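If you prefer to script these steps, a minimal Python wrapper over the commands above might look like the following. This is a sketch: it assumes the openssl binary is on your PATH, and it adds `-subj` values (placeholders here) so the two `req` commands run non-interactively; adjust the subjects, keeping the CA and client subjects different.

```python
import subprocess

# The OpenSSL commands from the steps above, made non-interactive with
# -subj. The subject values are placeholders -- change them as needed,
# but keep the CA and client subjects different (see the note above).
COMMANDS = [
    ["openssl", "genrsa", "-out", "ca.key", "2048"],
    ["openssl", "req", "-x509", "-new", "-key", "ca.key", "-days", "200",
     "-out", "ca.crt", "-subj", "/CN=my-test-ca"],
    ["openssl", "req", "-new", "-nodes", "-newkey", "rsa:2048",
     "-out", "client.csr", "-keyout", "client.key",
     "-subj", "/CN=my-test-client"],
    ["openssl", "x509", "-days", "100", "-req", "-in", "client.csr",
     "-CA", "ca.crt", "-CAkey", "ca.key", "-CAcreateserial",
     "-out", "client.crt"],
    ["openssl", "verify", "-CAfile", "ca.crt", "client.crt"],
]

def run_all(commands=COMMANDS):
    # Run each command, stopping on the first failure.
    for cmd in commands:
        subprocess.run(cmd, check=True)

# Call run_all() to generate ca.key, ca.crt, client.key, and client.crt
# in the current directory.
```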

 

Then open your instance in HANA Cloud Central and choose “Manage File Container” to set up your HANA Data Lake Files user.

 

Edit the configuration and choose “Add” in the “Trusts” section.  Copy or upload the ca.crt you generated earlier and click “Apply”.  However, don’t close the “Manage File Container” screen just yet.

Now we can configure our user to enable them to access the managed file storage.

Scroll down to the “Authorizations” section and choose “Add”.  A new entry will appear.

Choose a role for your user from the drop-down (by default there are admin and user roles).

Here is where things get a little tricky.  You need to add the pattern string from your client certificate so that when you make a request, the storage gateway (the entry point into HANA data lake files) can determine which user to validate you against.

You have two options for generating the pattern string.  You can use the following OpenSSL command (omit the “subject= ” prefix that will appear in the output):

openssl x509 -in client.crt -nameopt RFC2253 -subject -noout

Alternatively, you can use the “generate pattern” option on the screen, which will open a dialog box that allows you to upload/paste your client certificate and will automatically generate the pattern for you.  Note that we do not save the certificate, only the pattern string:

Click “Apply” to add the pattern string to your authorizations entry.

Note that the pattern string also allows wildcards, so you can authorize a class of certificates with a certain role.  If the certificate pattern matches multiple authorizations, the one that is used is governed by the “Rank” value set for the specific authorization entry.
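As a rough illustration of the matching behavior described above, here is a conceptual sketch, not the actual storage gateway implementation: the authorization entries, pattern syntax (shell-style wildcards via fnmatch), and rank semantics (lowest rank wins) are all assumptions for the example.

```python
from fnmatch import fnmatch

# Hypothetical authorization entries: (pattern, role, rank).
# The real gateway's pattern syntax and rank semantics may differ.
authorizations = [
    ("CN=admin-client,O=Example", "admin", 1),
    ("CN=*,O=Example", "user", 2),
]

def role_for(subject):
    """Return the role of the best-ranked authorization matching the
    certificate subject, or None if nothing matches."""
    matches = [(rank, role) for pattern, role, rank in authorizations
               if fnmatch(subject, pattern)]
    return min(matches)[1] if matches else None

print(role_for("CN=admin-client,O=Example"))  # matches both entries; rank 1 wins -> admin
print(role_for("CN=alice,O=Example"))         # matches only the wildcard entry -> user
```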

 

You should now be able to access and use HANA Data Lake Files via the REST API.

Here is a sample curl command which works for me and should validate that you have a successful connection (the instance id and Files REST API endpoint can be copied from the instance details in HANA Cloud Central).  Use the client certificate and key that you generated above and used to create your authorization.

Note that curl can be a little tricky – I was on Windows and could not get the Windows 10 version of curl to work for me. I ended up downloading a newer curl version (7.75.0), which did work; however, I had to use the ‘--insecure’ option to skip validation of the HANA Cloud server certificate because I wasn’t sure how to access the certificate store on Windows from curl.

curl --insecure -H "x-sap-filecontainer: <instance_id>" --cert ./client.crt --key ./client.key "https://<Files REST API endpoint>/webhdfs/v1/?op=LISTSTATUS" -X GET

The above command should return (for an empty HANA Data Lake):

{"FileStatuses":{"FileStatus":[]}}
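The same LISTSTATUS check can be made from Python's standard library instead of curl. This is a sketch: replace the placeholder instance id and endpoint host with the values from your instance details, and point it at the client certificate and key generated above.

```python
import http.client

# Placeholders -- copy the real values from HANA Cloud Central.
INSTANCE_ID = "<instance_id>"
ENDPOINT = "<Files REST API endpoint>"

def liststatus_request(path="/"):
    """Build the WebHDFS URL and headers for a LISTSTATUS call."""
    url = "/webhdfs/v1" + path + "?op=LISTSTATUS"
    headers = {"x-sap-filecontainer": INSTANCE_ID}
    return url, headers

def run():
    # Authenticate with the client certificate and key generated above.
    url, headers = liststatus_request()
    conn = http.client.HTTPSConnection(ENDPOINT, port=443,
                                       key_file="client.key",
                                       cert_file="client.crt")
    conn.request("GET", url, headers=headers)
    resp = conn.getresponse()
    print(resp.status, resp.read())

# Call run() once the placeholders are filled in; an empty data lake
# should return {"FileStatuses":{"FileStatus":[]}} as shown above.
```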

 

Now you should be all set to use HANA Data Lake Files to store any type of file in HANA Cloud.  For the full set of supported REST APIs and arguments for managing files, see the documentation.

Thanks for reading!

      4 Comments
      Nishant Gopinath

      Hi Jason,

      Would you have an example demonstrating how I can PUT local files (e.g., compressed zip files) or files from external storage (Azure Data Lake or S3) to HDL (uncompress and load, or retain as a compressed zip file)? Also, how do I use Postman or Swagger instead of the curl commands with the certificates?

      This would be really helpful in trying out some scenarios in which I can move image or video or blob files from local/ external storage to HDL and back.

      Regards,

      Nishant

      Jason Hinsperger
      Blog Post Author

      Hi Nishant,

      The REST API is documented here. There is a tool shipped with the HANA data lake client called hdlfscli, which is a wrapper for the REST API and allows you to connect to and interact with your instance.

      To use Postman to access HANA data lake Files, you need to add the client certificates for the HANA data lake Files endpoint to your list of certificates in your settings.

      You can also use standard HTTP libraries from Java, Python, etc. to access HANA data lake Files.  An example of an upload (PUT) request for a file using a Python script would look something like this:

      import http.client

      # Client certificate/key generated above, and the Files REST API endpoint host
      certificate_file = '.\\client.crt'
      certificate_key = '.\\client.key'
      host = 'c7f1dcc2-fade-444d-9287-79ca5dbf4fb6.files.hdl.canary-eu10.hanacloud.ondemand.com'

      putdir = ''      # optional subdirectory under /home/in/ (end with '/' if set)
      putfile = 'foo.parquet'
      request_url = '/webhdfs/v1/home/in/' + putdir + putfile + '?op=CREATE&data=true'

      # Defining parts of the HTTP request
      request_headers = {
          'x-sap-filecontainer': 'c7f1dcc2-fade-444d-9287-79ca5dbf4fb6',
          'Content-Type': 'application/octet-stream'
      }

      # Create a connection to submit HTTP requests
      connection = http.client.HTTPSConnection(host, port=443, key_file=certificate_key, cert_file=certificate_file)

      # Use the connection to submit an HTTP PUT request
      with open(putfile, 'rb') as putdata:
          connection.request(method="PUT", url=request_url, body=putdata, headers=request_headers)
      response = connection.getresponse()
      print(response.status, response.reason)
      print(response.read())
      

      Regards,

      --Jason

       

      Nishant Gopinath

      Thanks Jason. I am able to run with the swagger.json on postman and also use the sample python script to transfer the local files to HDL.

      Regards,

      Nishant

      Jason Hinsperger
      Blog Post Author

      That's great to hear, Nishant.

      Regards,

      --Jason