Connecting Google Cloud Storage with SAP DataIntelligence & Python
This blog assumes that you have access to a Google Cloud account, have created a project, and have a DataIntelligence instance of version 3.1 or higher.
1. Log in to your Google Cloud Storage account.
2. Navigate to IAM – Service Accounts.
3. This shows the existing service accounts for your project. If you do not have a service account, create one.
4. Choose a service account with existing keys, or create a key as described at
https://cloud.google.com/iam/docs/creating-managing-service-account-keys
5. Download the key in JSON format to your local machine. This can be done by clicking 'Add key'. Make a note of the project name.
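Before uploading the key anywhere, it can help to confirm that the downloaded file really is a service account key and to read the project ID straight from it. The following is a minimal sketch using only the Python standard library. The helper names `project_id_from_key` and `read_project_id` are hypothetical, and the field list covers only the fields a service account key file normally contains.

```python
import json

# Fields a Google service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def project_id_from_key(key):
    """Validate a parsed key (a dict) and return its project_id (hypothetical helper)."""
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"key is missing fields: {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key["project_id"]


def read_project_id(key_path):
    """Load a key file from disk and return its project_id (hypothetical helper)."""
    with open(key_path) as key_file:
        return project_id_from_key(json.load(key_file))
```

Calling `read_project_id` with the path of the downloaded file returns the project ID stored in the key, which can then be compared with the value shown in the Google Cloud console.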
Now open Connection Management in SAP DI.
Create a new connection of type Google Cloud Storage.
The project ID must exactly match the project ID from GCP.
Provide the key file that you downloaded in JSON format.
For more details, refer to the help.
Test the connection!
You should also be able to browse this connection using the DataIntelligence Metadata Explorer and view the bucket contents.
Connecting GCS using Python
Using ML Scenario Manager and notebooks, you can connect to GCS from a Python notebook.
Upload the JSON key file to DataIntelligence. Here we uploaded it to /vrep/mlup.
Type in the following code:
import json

from gcloud import storage
from oauth2client.service_account import ServiceAccountCredentials

project_id = 'sap-digitalman*****'
bucket_name = 'ifn***'

# Load the service account key that was uploaded to /vrep/mlup
with open('/vrep/mlup/sap-digitalmanu*****.json') as json_file:
    credentials_dict = json.load(json_file)

credentials = ServiceAccountCredentials.from_json_keyfile_dict(credentials_dict)
# Pass the variable project_id, not the literal string 'project_id'
client = storage.Client(credentials=credentials, project=project_id)

# Print the name of every object in the bucket
bucket = client.get_bucket(bucket_name)
blobs = bucket.list_blobs()
for blob in blobs:
    print(blob.name)
If modules are missing, you may have to install them:
pip install --upgrade google-cloud-storage
pip install --upgrade gcloud
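To see which modules are actually missing before running pip, a small check like the following can be run in the notebook. It is a sketch that uses only the standard library, and the helper name `missing_modules` is hypothetical. The module names passed in at the bottom are the import names used in this blog plus `google.cloud.storage`, the import name of the google-cloud-storage package.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be found (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


print(missing_modules(["gcloud", "oauth2client", "google.cloud.storage"]))
```

An empty list means all three modules can be imported. Any name that is printed needs to be installed.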
The output should list the bucket contents.
Great blog Asim!
Tested it and it works.
Thank you very much Asim for the great job
Babacar
One update from my side:
Yesterday I tried again to connect to GCS using the above Python code in a Jupyter Notebook, but it failed with the following error message:
[Screenshot of the error message: Deprecated GCS module]
As you can see in the error message, the oauth2client module has been deprecated and is therefore no longer recognized.
Instead, I imported the google.oauth2 library, used its service_account module, and slightly adapted the code as below:
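The adapted code itself is not reproduced in this text. As a rough sketch of that kind of change, and not necessarily the exact code used here, the version below builds the credentials with `google.oauth2.service_account` and lists the bucket contents with the google-cloud-storage client. It assumes the google-cloud-storage package is installed. The helper names `make_client` and `list_blob_names` are hypothetical, and the path and bucket name in the usage comment are placeholders.

```python
def make_client(key_path):
    """Build a storage client from a service account key file (hypothetical helper)."""
    # Imported here so list_blob_names can be used without the Google libraries.
    from google.cloud import storage
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(key_path)
    # The key file carries its own project_id, so it is reused here.
    return storage.Client(project=credentials.project_id, credentials=credentials)


def list_blob_names(client, bucket_name):
    """Return the names of all objects in the bucket (hypothetical helper)."""
    return [blob.name for blob in client.list_blobs(bucket_name)]


# Usage, with placeholder values:
# client = make_client('/vrep/mlup/<your-key-file>.json')
# for name in list_blob_names(client, '<your-bucket>'):
#     print(name)
```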
Hope this helps in case you run into a similar issue.
regards,
Babacar
Thank you Asim for authoring this blog. The details are clear & precise ... very helpful. Look forward to more blogs & your thoughts on this topic.
Homiar