Frank Schuler

Access SAP Data Intelligence Machine Learning with the Python SDK

When I access a file from the SAP Data Intelligence Jupyter Lab, it generates code that uses the SAP Data Intelligence Machine Learning Python SDK:

To use this SDK in a custom SAP Data Intelligence Operator, I add the sapdi Tag:

This tag pulls in the com.sap.dsp.linuxx86_64/dsp-core-operators-docker Dockerfile:

This allows me to use the same code as in the Jupyter Lab to access my file, and I send its head to the operator output:
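A minimal sketch of what such operator logic could look like. The SDK calls in `read_head_from_data_lake` (`sapdi.get_workspace`, `get_datacollection`, `open(...).get_reader()`) are assumptions modeled on the kind of snippet Jupyter Lab generates, so compare them with your own generated code. The helper names, the workspace, collection and path arguments, and the port name `output` are hypothetical. Instead of pandas' `head()`, the sketch takes the first rows with the standard `csv` module so that it runs without extra packages.

```python
import csv
from itertools import islice


def head_as_text(reader, n=5):
    """Return the header plus the first n data rows of a CSV stream as text."""
    # Accept text or binary line streams; the stream type returned by the SDK
    # is not assumed here.
    lines = (ln.decode("utf-8") if isinstance(ln, bytes) else ln for ln in reader)
    rows = list(islice(csv.reader(lines), n + 1))  # +1 keeps the header row
    return "\n".join(",".join(row) for row in rows)


def read_head_from_data_lake(workspace, collection, path, n=5):
    """Hypothetical wrapper around the SDK calls; only works inside SAP Data Intelligence."""
    import sapdi  # assumption: provided by the sapdi tag, not installed elsewhere

    # Assumed SDK calls, to be checked against the Jupyter-generated snippet.
    ws = sapdi.get_workspace(name=workspace)
    dc = ws.get_datacollection(name=collection)
    with dc.open(path).get_reader() as reader:
        return head_as_text(reader, n)


# Inside the custom operator, the result would be sent to an output port
# (the port name "output" is hypothetical):
# api.send("output", read_head_from_data_lake("<workspace>", "<collection>", "<path>"))
```

Keeping the formatting in `head_as_text` separate from the SDK access means the formatting can be tried locally with any file-like object, while the data lake access is only exercised inside the pipeline.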

Putting my Custom Operator in a simple Data Pipeline:

I get the expected result in my Wiretap:

Of course, this is only a simple example. The Python SDK provides much more functionality, for example to:

  • Create and access ML scenarios and their versions, and retrieve and update their metadata
  • Create configurations, and start executions and deployments
  • Create and update pipelines and bind them to ML scenarios
  • Access artifacts from training containers
  • Report metrics through the ML Tracking SDK


      6 Comments
      Gianluca De Lorenzo

      Hello Frank,

      very good job, together with all your contributions to the SAP Data Intelligence community, thank you for enriching it!

      Quick question: I am trying to use the sapdi package from Jupyter in DI Cloud, but I get an SSL error for a certificate that is not trusted (full error at the bottom).

      I uploaded the file from my laptop, therefore I assume I have no issues in reaching the data lake.

      Any suggestion?

      OSError: Failed to create handler for /shared/ml/data/MyDecodedData/E401/test.csv in data lake. Failed to check the status of /shared/ml/data/MyDecodedData/E401/test.csv in data lake. Reason: HTTPSConnectionPool(host='storagegateway', port=14000): Max retries exceeded with url: /webhdfs/v1/shared/ml/data/MyDecodedData/E401/test.csv?user.name=sdlclient&op=GETFILESTATUS (Caused by SSLError(SSLError("bad handshake: Error([('rsa routines', 'RSA_padding_check_PKCS1_type_1', 'invalid padding'), ('rsa routines', 'rsa_ossl_public_decrypt', 'padding check failed'), ('asn1 encoding routines', 'ASN1_item_verify', 'EVP lib'), ('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
      Frank Schuler
      Blog Post Author

      Thank you, Gianluca.

      Can you see your file in the ML Data Manager and Metadata Explorer?

      Best regards

      Gianluca De Lorenzo

      Yes, I can.  I uploaded the file using the Metadata Explorer and registered it with the ML Data Manager.

      I can also use the Metadata Explorer to preview and profile the file, for instance.

      No problem from the modeler either, e.g. using the com.sap.file.read operator.

      BTW: I am using DI - Cloud

      Data Intelligence Version:
      2010.29.22
      Frank Schuler
      Blog Post Author

      Okay. That is a good start. What Object Storage Type do you use for your Semantic Data Lake (Connection DI_DATA_LAKE)?

      Gianluca De Lorenzo

      S3

      Thomas Wecker

      Hello Frank,

      thank you for the overview.

      Where can I find the detailed documentation of the Python SDK for SAP Data Intelligence (not the 48-page high-level overview)?

       

      Kind regards, Thomas