Access SAP Data Intelligence Machine Learning with the Python SDK
When you access a file from the SAP Data Intelligence Jupyter Lab, code that leverages the SAP Data Intelligence Machine Learning Python SDK is generated for you:
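The exact generated snippet is environment-specific and is not reproduced here, but the pattern is roughly the following (a minimal sketch: the in-memory buffer below only stands in for the data-lake reader that the generated code opens against the DI_DATA_LAKE connection, so the pattern can be run anywhere):

```python
import csv
import io

def csv_head(reader, n=5):
    """Return the header plus the first n data rows of a CSV file-like object."""
    rows = csv.reader(reader)
    header = next(rows)
    return [header] + [row for _, row in zip(range(n), rows)]

# Inside SAP Data Intelligence, `reader` would be the handle the generated
# code obtains for a file in the data lake; here a buffer stands in:
buffer = io.StringIO("id,value\n1,a\n2,b\n3,c\n")
print(csv_head(buffer, n=2))
```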
To use this SDK in a custom SAP Data Intelligence operator, I add the sapdi tag:
This tag pulls in the com.sap.dsp.linuxx86_64/dsp-core-operators-docker Dockerfile:
That lets me use the same code as in the Jupyter Lab to access my file, and I output its head:
Putting my Custom Operator in a simple Data Pipeline:
I get the expected result in my Wiretap:
Of course, this is only a simple example, and the Python SDK provides much more functionality:
- Create and access ML scenarios and their versions, as well as retrieve and update their metadata
- Create configurations, and start executions and deployments
- Create and update pipelines and bind them to ML scenarios
- Access artifacts from training containers
- Report metrics through the ML Tracking SDK
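As an illustration of the last point, metric reporting follows roughly this shape. This is a hedged sketch: the sapdi package only exists inside SAP Data Intelligence, and the tracking method name used here is an assumption that may differ by release, so check the SDK reference for the exact signature:

```python
try:
    from sapdi import tracking  # available only inside SAP Data Intelligence
except ImportError:
    tracking = None

def report_metrics(metrics):
    """Report training metrics through the ML Tracking SDK when inside DI."""
    if tracking is None:
        return f"skipped (outside DI): {metrics}"
    # Assumed API shape; consult the SDK documentation for the exact call.
    tracking.log_metrics(metrics)
    return "logged"

print(report_metrics({"accuracy": 0.93}))
```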
Hello Frank,
very good job! Together with all your other contributions, thank you for enriching the SAP Data Intelligence community!
Quick question: I am trying to use the sapdi package from Jupyter in DI Cloud, but I get an SSL error for a certificate that is not trusted (full error at the bottom).
I uploaded the file from my laptop, therefore I assume I have no issues in reaching the data lake.
Any suggestion?
OSError: Failed to create handler for /shared/ml/data/MyDecodedData/E401/test.csv in data lake. Failed to check the status of /shared/ml/data/MyDecodedData/E401/test.csv in data lake. Reason: HTTPSConnectionPool(host='storagegateway', port=14000): Max retries exceeded with url: /webhdfs/v1/shared/ml/data/MyDecodedData/E401/test.csv?user.name=sdlclient&op=GETFILESTATUS (Caused by SSLError(SSLError("bad handshake: Error([('rsa routines', 'RSA_padding_check_PKCS1_type_1', 'invalid padding'), ('rsa routines', 'rsa_ossl_public_decrypt', 'padding check failed'), ('asn1 encoding routines', 'ASN1_item_verify', 'EVP lib'), ('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
Thank you, Gianluca.
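For errors like the one above, a generic stdlib sketch can help inspect what certificate the gateway actually presents; the host and port are taken from the error message, and the snippet is for diagnosis only, since it deliberately skips verification:

```python
import socket
import ssl

def fetch_server_cert(host, port, timeout=5):
    """Fetch the peer certificate (DER bytes) without verifying it."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # diagnosis only: do not verify here
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)

# From inside the cluster, e.g.:
#   der = fetch_server_cert("storagegateway", 14000)
#   print(ssl.DER_cert_to_PEM_cert(der))
```

Comparing the returned PEM with the CA bundle that the notebook's Python trusts can show whether the gateway certificate or the local trust store is at fault.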
Can you see your file in the ML Data Manager and Metadata Explorer?
Best regards
Yes, I can. I uploaded the file using the Metadata Explorer and registered it with the ML Data Manager.
I can also use the Metadata Explorer to preview and profile it, for instance.
No problem from the modeler either, e.g. using the com.sap.file.read operator.
BTW: I am using DI - Cloud
Okay. That is a good start. What Object Storage Type do you use for your Semantic Data Lake (Connection DI_DATA_LAKE)?
S3
Hello Frank,
thank you for the overview.
Where can I find the detailed documentation of the Python SDK for SAP Data Intelligence (not the 48-page high-level overview)?
Kind regards, Thomas