
Objective:


This blog aims to illustrate how to dynamically read blobs (e.g., files) from Azure Blob Storage into SAP CPI.

Introduction:


At present, the AzureStorage adapter in SAP CPI doesn’t support the dynamic reading of blobs. While you can read blobs using the sender adapter, it requires specifying an absolute path, which is not dynamic. Although it’s possible to address individual blobs in the sender, specifying entire directories is not viable, and the “/*” wildcard is not supported by this adapter.


The receiver adapter’s capabilities are also limited: it lacks a “Get Blob” operation, which would be needed to fetch a blob into CPI on demand. As a result, retrieving blobs via the receiver adapter is not possible either.






Preparation:


To dynamically access blobs, you’ll need to utilize a REST API provided by Microsoft (https://learn.microsoft.com/en-us/rest/api/storageservices/get-blob?tabs=azure-ad). Since you’re bypassing the Azure adapter and using an HTTP adapter, OAuth 2.0 authentication becomes essential, requiring configuration within Azure. Here’s how you can set up OAuth 2.0 authentication in Azure:

  1. Register an App:
    Navigate to the Azure portal > “Azure Active Directory” > “App registrations” > “New application registration.” Only the application name is mandatory; other fields can remain empty.

  2. Create client_id and client_secret:
    Post registration, locate the “Application (client) ID” in the overview as your client_id. To generate a client_secret, go to “Certificates & secrets” > “New client Secret.” Provide a name and select “Create.” It’s crucial to copy and safely store the client secret promptly post-generation, as it’s not visible for long.

  3. Set permissions in your storage:
    Navigate to storage account > containers > your selected container > “Access Control (IAM).” Use “add role assignment” to integrate your new app with the “Contributor” role.


Step 1: Register an App


Log in to the Azure portal, and then select “Azure Active Directory” > “App registrations” > “New application registration.”


You only need to enter the application name; all other fields can remain empty.

Step 2: Create client_id and client_secret


Now you need to create the security material. You can find the client_id directly in the overview of the newly registered app as “Application (client) ID.”


You need to create the client_secret yourself. To do this, go to “Certificates & secrets” > “New client Secret.” Enter any name and click “Create.”



The client secret can be found under “Value”. Important: the client secret is only visible for a short time, so it’s best to copy it immediately after creation and save it for later.

Step 3: Set permissions in your storage


Go to the storage account > containers > your container > “Access Control (IAM).” Click on “add role assignment” and add your newly created app to the “Contributor” role:


Since we still use the Azure adapter in the iFlow to get a list of the blobs, we also need a SAS token for authentication. To generate one, right-click on the relevant container and choose “Generate SAS”.







Implementation in CPI:


To achieve our goal, we need to follow 2 steps:

  1. Utilize the created security materials (OAuth 2.0 and SAS token) from Azure

  2. Build an iFlow to achieve our goal


Step 1: Implement the created security materials


Create the Client Credentials:
Create the Client Credentials:
Go to Monitoring > Security Material and create a new “Client Credentials” entry. It includes 10 fields (a sketch of the resulting token request follows the list):

  1. Name: You can enter a name of your choice here.

  2. Description: You can add a description.

  3. Token Service URL: Enter the token endpoint generated by Azure (typically of the form https://login.microsoftonline.com/{tenant-id}/oauth2/v2.0/token).

  4. Client ID: Enter the Client ID.

  5. Client Secret: Enter the created Client Secret.

  6. Client Authentication: Select “Send as Body Parameter.”

  7. Scope: In our case, the scope is always “https://storage.azure.com/.default”.

  8. Content Type: Choose “application/x-www-form-urlencoded.”

  9. Resource: Leave it empty.

  10. Audience: Leave it empty.
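
With these settings, CPI effectively performs a standard OAuth 2.0 client-credentials request against the Azure AD token endpoint. As a rough illustration (not part of the iFlow), here is a minimal Groovy sketch of that request; the tenant ID, client ID, and client secret are placeholders you would have to replace:

// Placeholders - replace with your own tenant ID, client ID and client secret
def tokenServiceUrl = "https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token"   // Token Service URL
def clientId        = "<application-client-id>"                                           // Client ID
def clientSecret    = "<client-secret>"                                                   // Client Secret
def scope           = "https://storage.azure.com/.default"                                // Scope

// "Send as Body Parameter": the credentials go into the form-encoded request body
def form = [grant_type: "client_credentials", client_id: clientId,
            client_secret: clientSecret, scope: scope]
def body = form.collect { k, v -> k + "=" + URLEncoder.encode(v, "UTF-8") }.join("&")

def connection = new URL(tokenServiceUrl).openConnection()
connection.requestMethod = "POST"
connection.doOutput = true
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded")        // Content Type
connection.outputStream.withWriter("UTF-8") { it << body }

// The JSON response contains the bearer token in the "access_token" field
println connection.inputStream.text

CPI performs this token exchange itself once the credential is referenced by the HTTP adapter later on; the sketch is only meant to show what the individual fields map to.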


Create the SAS Token:
Go to Monitoring > Security Material and create a Secure Parameter. It includes 4 fields:

  1. Name: You can enter a name of your choice here.

  2. Description: You can add a description.

  3. Secure Parameter: Paste in the SAS token you generated earlier.

  4. Repeat Secure Parameter: Repeat the SAS token.


Step 2: Build the iFlow


I’ve devised an iFlow to facilitate dynamic access:


(1): Commence with a daily timer.

(2): Retrieve a list of all storage blobs using a request-reply via the Azure adapter. Authenticate with the SAS token and provide a specific name.


We receive an XML document in return in which each blob is listed together with its full path; the names are under the “Name” tag.
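
The exact payload depends on the adapter, but it follows Azure’s List Blobs response format. Here is a small Groovy sketch with an assumed, simplified sample response (the blob names are made up); it also shows where the “//Blob” elements and “Name” values used in the next two steps come from:

// Assumed, simplified shape of the blob list (names are illustrative only)
def response = '''
<EnumerationResults ContainerName="mycontainer">
  <Blobs>
    <Blob>
      <Name>toBeProcessed/test.xml</Name>
    </Blob>
    <Blob>
      <Name>other/archive.xml</Name>
    </Blob>
  </Blobs>
</EnumerationResults>'''

// One entry per <Blob>; each blob name (including its path) sits in the <Name> tag
def names = new XmlSlurper().parseText(response.trim()).Blobs.Blob.collect { it.Name.text() }
assert names == ['toBeProcessed/test.xml', 'other/archive.xml']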


(3): Now we have the names of all blobs in the container. However, we want to process them individually, so we use a splitter to split the message into one part per blob, using the XPath “//Blob”.

(4): We store the name of the blob in a property called “filename” using a content modifier. We again use an XPath, this time on the “Name” tag.


(5): Now we have the names of each blob in our storage. Since we only want blobs from a specific directory, we need a script to check whether the blob is in the correct directory:
 1 import com.sap.gateway.ip.core.customdev.util.Message;
 2 import java.util.HashMap;
 3
 4 def Message processData(Message message) {
 5
 6     def headers = message.getHeaders()
 7     def properties = message.getProperties()
 8     def filename = properties.get("filename")
 9     def notProcessed = 'false'
10
11     def pattern = ~'toBeProcessed.*'
12
13     if (filename ==~ pattern) {
14         notProcessed = 'true'
15     } else {
16         notProcessed = 'false'
17     }
18
19     message.setHeader("notProcessed", notProcessed);
20     return message;
21 }

In line 11, we define a regex pattern that the blob name must match. In our case, we define the directory as “toBeProcessed”, so the blob name must start with that, followed by any characters. In line 13, we check whether the blob name matches the pattern; if it does, we set the “notProcessed” variable to true. This is important for later. In line 19, we set “notProcessed” as a header.
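
As a quick worked example of the pattern check (the blob names below are assumed for illustration):

def pattern = ~'toBeProcessed.*'

assert 'toBeProcessed/test.xml' ==~ pattern     // lies in the target directory -> notProcessed = 'true'
assert !('processed/test.xml' ==~ pattern)      // anywhere else -> discarded by the router later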

(6): Now we route the blobs using a router. The router checks whether the header “notProcessed” contains true; if so, the message takes Route 2.


If the header “notProcessed” is false, the message is discarded.

(7): After the router, we create a content modifier to set the following headers:

x-ms-version: 2019-12-12
resource: https://storage.azure.com
Host: {blob storage Account name}.blob.core.windows.net


These headers are essential for correctly generating the bearer token in the next step and accessing Blob Storage.

(8): Now we’ll use a request-reply to fetch the blob via HTTP. In the Address field, enter the URL of Microsoft’s Get Blob REST API, which has the form https://myaccount.blob.core.windows.net/mycontainer/myblob:




  1. In place of myaccount, enter your Storage Account Name.

  2. In place of mycontainer, enter the name of the container you’re interested in.

  3. In place of myblob, enter the expression ${property.filename} to dynamically address all the blobs located in the desired directory.


The remaining fields are filled as follows:



For the Credential Name, enter the name you provided when creating the security material.
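
For reference, the call this request-reply step performs corresponds to Azure’s Get Blob operation. Here is a minimal standalone Groovy sketch of the same request; the account name, container name, and bearer token are placeholders, and the blob name is an assumed example:

def account   = "<storage-account-name>"
def container = "<container-name>"
def blobName  = "toBeProcessed/test.xml"   // in the iFlow this value comes from ${property.filename}
def token     = "<bearer-token>"           // obtained via the client-credentials flow shown earlier

def url = "https://" + account + ".blob.core.windows.net/" + container + "/" + blobName
def connection = new URL(url).openConnection()                  // GET is the default method
connection.setRequestProperty("Authorization", "Bearer " + token)
connection.setRequestProperty("x-ms-version", "2019-12-12")

// The response body is the blob content itself
def blobContent = connection.inputStream.text

In the iFlow, the HTTP adapter together with the Client Credentials security material takes care of obtaining and attaching the bearer token, so none of this needs to be scripted.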

(9): Since we want to write the blobs to a different directory, I’ve written a script to specify the target path for the blob:
 1 import com.sap.gateway.ip.core.customdev.util.Message;
 2 import java.util.HashMap;
 3
 4 def Message processData(Message message) {
 5
 6
 7     def headers = message.getHeaders()
 8     def properties = message.getProperties()
 9
10     def path = "/processed/"
11     def fullEntityName = properties.get("filename")
12
13     def arrayEntityName = fullEntityName.split('/')
14     def entityName = arrayEntityName[1]
15
16     def blobPath = path + entityName
17
18     message.setHeader("blobPath", blobPath);
19     message.setHeader("Content-Type", "application/xml");
20
21     return message;
22 }

In our example, the blobs should be written to the directory /processed/.

Since we have stored the entire path of the blob (including the name) in the property filename, we need to extract the name and remove the path. In line 13, I’ve split the path at “/”, which separates filename into individual parts stored in an array. Since the blobs are all in a subdirectory (e.g., toBeProcessed/test.xml), we take entry 1 from the array, because this is the file’s name (line 14). We build the target path including the name in “blobPath” and write it to a header (lines 16 and 18). Additionally, we specify the Content-Type of the file in a header (line 19).
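
A quick worked example with the assumed blob name from above:

def fullEntityName = "toBeProcessed/test.xml"       // value of the property "filename"
def arrayEntityName = fullEntityName.split('/')     // ["toBeProcessed", "test.xml"]
def blobPath = "/processed/" + arrayEntityName[1]

assert blobPath == "/processed/test.xml"

Note that this assumes the blob name contains exactly one directory level; deeper paths would need a different index or, for example, taking the last array element.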

(10): Now we just need to send the blob to our target directory in the storage. For this, we again use the Azure adapter. Under Connection, provide your SAS token. The Processing tab is filled as follows:



Since we’ve already defined the target directory in the script, we can use the Camel expression ${header.blobPath} here to provide the blob path.




Conclusion


The illustrated iFlow facilitates dynamic reading, modification, and redirection of all blobs from a Blob Storage. After the second request-reply, various operations like conversions or edits can be performed on the blob content.




Thank you for reading this blog post. I welcome your feedback, thoughts, or questions.

Best Regards,
Nic Frigius
