Frank-Martin, Product and Topic Expert

In this blog post, we show how you can build an Azure Data Factory (ADF) pipeline that loads data from an SAP system using the ABAP SQL service and the ODBC driver for ABAP. We describe the simplified ADF pipeline setup that we used in our tests.

You could take this blog post as a blueprint for your own full-blown Azure Data Factory data pipeline setup.

By professional background, I am an expert in relational databases, SQL, and ODBC. Many thanks go to my Microsoft colleague Marius Cirstian, who helped me with the Azure cloud setup and testing, as well as with this blog post.

Background: The ODBC Driver for ABAP and the ABAP SQL Service

As a developer in the ABAP environment, you can access CDS view entities in an ABAP system using SQL and Open Database Connectivity (ODBC), a standard API for accessing databases. As a result, you can use SQL statements in external analytical tools or in self-written programs to access data in database tables that reside in the ABAP environment. Data from different entities can be joined in an ad hoc fashion, and data can be aggregated for analytical queries.

If you need a more detailed introduction to the ABAP SQL service and the ODBC driver for ABAP, see my blog post SQL Queries on CDS objects Exposed as SQL Service | SAP Blogs. At the end of this blog post, you will also find more related blog posts and links to the official SAP documentation.

Now let’s turn to the Azure Data Factory pipeline and how you can use the ODBC driver for ABAP to load data from an SAP system. We will describe what we did in our test scenario and guide you through it step by step.

Used Components

To build our ADF pipeline we used the following components:

  1. An SAP S/4HANA system installed on a virtual machine in the Azure cloud.
  2. A second virtual machine serving as the host of the Azure Data Factory self-hosted integration runtime. For the sake of simplicity, we created this second virtual machine without a public IP address, only with a private IP address (10.7.0.6) in the same virtual network as the SAP S/4HANA system. From this VM, we could reach the VM with the SAP system under its private IP 10.7.0.4 (a minimal connectivity check is sketched after this list). This second VM also hosted our ODBC driver for ABAP and the corresponding ODBC DSN pointing to the ABAP system on the first VM.
  3. As a source data store, we used an ODBC connection to our SAP S/4HANA system.
  4. As a target data store, we created an Azure storage account with a container and exported the data from the SAP system in .csv file format.
  5. Finally, the ADF scenario consisted of an Azure Data Factory resource, an ADF self-hosted integration runtime, two linked services for source and target, and the respective datasets for each linked service.
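Before setting anything up in Azure Data Factory, you may want to verify that virtual machine 2 can actually reach the SAP system over the private network. Here is a minimal sketch in PowerShell; the port is an assumption and must be replaced by the port on which your ABAP SQL service endpoint is exposed.

# Minimal reachability check from VM 2 (the future integration runtime host)
# to VM 1 (the SAP S/4HANA system).
$sapHost = "10.7.0.4"   # private IP of the SAP system in our setup
$sapPort = 50000        # assumption: replace with the port of your ABAP SQL service endpoint

Test-NetConnection -ComputerName $sapHost -Port $sapPort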

Defining an Azure Data Factory

As a first step, go to the Azure Portal, search for “Data Factory” and create a new Data Factory:

FrankMartin_0-1708440028393.png

FrankMartin_1-1708440028396.png

After entering your instance name and region, choose “Public endpoint” on the “Networking” tab. Then click on “Review and Create” and create the data factory with the default settings.

FrankMartin_2-1708440028401.png

Once the factory has been created, click on “Open Resource” to open it.

FrankMartin_3-1708440028405.png
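As an alternative to clicking through the portal, the same data factory can be created with the Az PowerShell module. This is only a sketch; the resource group name, factory name, and region are placeholders for our demo setup.

# Sketch: create the data factory with the Az.DataFactory module instead of the portal.
Connect-AzAccount

New-AzResourceGroup -Name "rg-abap-adf-demo" -Location "West Europe" -Force

New-AzDataFactoryV2 -ResourceGroupName "rg-abap-adf-demo" `
                    -Name "adf-abap-demo" `
                    -Location "West Europe"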

Defining the Self-Hosted Integration Runtime

Launch the Azure Data Factory Studio and click on “Manage” on the left side. Here you can now create a new integration runtime for your Azure Data Factory.

FrankMartin_4-1708440028407.png

After the integration runtime has been defined, you will see the following screen. If you set up the integration runtime software manually, as we did on our second VM, make a note of the authentication keys because at least one of them will be needed later.

FrankMartin_5-1708440028408.png
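The same integration runtime definition can also be scripted. The following sketch (with our placeholder resource names) creates a self-hosted integration runtime and reads its authentication keys, which correspond to the keys shown on the screen above.

# Sketch: create the self-hosted integration runtime and fetch its authentication keys.
$rg  = "rg-abap-adf-demo"   # placeholder resource group
$adf = "adf-abap-demo"      # placeholder data factory name

New-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $rg `
                                      -DataFactoryName $adf `
                                      -Name "SelfHostedIR" `
                                      -Type SelfHosted `
                                      -Description "Runs on VM 2 next to the SAP system"

# One of these keys is needed to register the runtime software on VM 2.
Get-AzDataFactoryV2IntegrationRuntimeKey -ResourceGroupName $rg `
                                         -DataFactoryName $adf `
                                         -Name "SelfHostedIR"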

Installing the Self-Hosted Integration Runtime

For our scenario, choose option 2 here and download the integration runtime software. The software needs to be installed on virtual machine 2. After the standard installation, the following window opens, and you need to configure the integration runtime:

FrankMartin_6-1708440028415.png

Now you must copy one of the authentication keys that you received above into the text box and click on “Register”:

FrankMartin_7-1708440028418.png
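If you prefer the command line over the configuration manager UI, the integration runtime installation also ships with the dmgcmd.exe utility, which can register the node with one of the authentication keys. The installation path (in particular the version folder) is an assumption for a default installation; adjust it to your machine.

# Sketch: register the self-hosted integration runtime node from the command line.
$irKey  = "<one of the authentication keys from the Azure Data Factory Studio>"
$dmgcmd = "C:\Program Files\Microsoft Integration Runtime\5.0\Shared\dmgcmd.exe"  # adjust the version folder

& $dmgcmd -RegisterNewNode $irKey $env:COMPUTERNAME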

After the registration has finished successfully, the new self-hosted integration runtime will also show up in the Azure Data Factory Studio:

FrankMartin_8-1708440028420.png

Installing the ODBC Driver and Defining an ODBC DSN for the Source Data Store


To enable our ADF data pipeline to retrieve data via ODBC from the ABAP system and to write it to a .csv file, you now need to install and configure the ODBC driver for ABAP on virtual machine 2 as described in the blog post Consuming CDS View Entities Using ODBC-Based Client Tools. In our test scenario, this was a fresh, empty virtual machine, which most likely means that no C runtime libraries were installed on it yet.

Make sure to install these C runtime libraries. Without them, you won’t be able to define an ODBC DSN after installing the ODBC driver (see C-runtimes needed to run SAP executables | SAP Blogs).

After these preparation steps, the DSN for the ABAP system on virtual machine 1 looked like this in our case:

FrankMartin_9-1708440028426.png
Please note that the second virtual machine connects to the ABAP system via the private IP 10.7.0.4.
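Before testing the connection itself, you can cross-check from PowerShell that the driver and the DSN are registered on virtual machine 2. The Add-OdbcDsn line is only a commented sketch: the driver name and the property keywords are assumptions, so take the real ones from your driver’s DSN dialog or the driver documentation.

# Check that the ODBC driver for ABAP is installed and that the system DSN exists.
Get-OdbcDriver | Where-Object Name -like "*ABAP*"
Get-OdbcDsn -Name "ABAP" -DsnType "System"

# Sketch only: a system DSN could also be created from PowerShell. The driver name and
# property keywords below are assumptions; use the ones from your DSN dialog.
# Add-OdbcDsn -Name "ABAP" -DriverName "<ODBC driver for ABAP>" -DsnType "System" `
#             -SetPropertyValue @("Host=10.7.0.4", "Port=<service port>", "Client=<client>")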
I used the following PowerShell script to verify that the DSN definition was correct and that the connection to the ABAP system worked with our ABAP test user. The script also lists the exposed view entities visible to this user. One of these entities was our test view ZFlight.ZSFLIGHT.

# run in cmd admin window as:
# "powershell.exe -File odbc_show_views.ps1"

# input connection parameters
Write-Output "";
$dsn = Read-Host "ODBC DSN?";
$uid = Read-Host "ABAP User?";
$pwdsec = Read-Host -AsSecureString "ABAP Password";
$pwdbstr = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($pwdsec)
$pwdvalue = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($pwdbstr)
Write-Output "";

# Connect to the ABAP system via the ODBC DSN
$SqlConnection = New-Object System.Data.Odbc.OdbcConnection
$SqlConnection.ConnectionString =
   "DSN=" + $dsn +
   ";UID=" + $uid +
   ";PWD=" + $pwdvalue + ";" # +
#  "Trace=debug;TracePath=C:\temp;" ;

# Uncomment the next line only for debugging; it prints the password in clear text.
# Write-Output $SqlConnection.ConnectionString

Write-Output "Opening Connection to:"
$constr_without_pwd = "DSN=" + $dsn + ";UID=" + $uid
Write-Output $constr_without_pwd
$SqlConnection.Open()
Write-Output "Connection State:"
$SqlConnection.State

# Count the exposed entities with a scalar query
$ScalarSqlQuery = "select count(*) from SYS.VIEWS"
$Command = New-Object `
    System.Data.Odbc.OdbcCommand($ScalarSqlQuery, $SqlConnection)
$Reply = $Command.ExecuteScalar()
Write-Output "";
Write-Output "Number of exposed entities:";
$Reply
Write-Output "";

$Sqlcmd = $SqlConnection.CreateCommand()
$Sqlcmd.CommandText = "SELECT SCHEMA_NAME, VIEW_NAME FROM SYS.VIEWS WHERE SCHEMA_NAME = 'SYS' "
$SysTabResult = $Sqlcmd.ExecuteReader()
Write-Output "Exposed System Views:";
while ($SysTabResult.Read())
{
  $SysTabResult.GetValue(0) + "." + $SysTabResult.GetValue(1)
}
$SysTabResult.Close() 
Write-Output "";

$SqlConnection.Close()

 

Creating an Azure Storage Account for the Target Data Store

You need to create an Azure storage account to define a data factory pipeline with a simple target data store that writes to a .csv file. A storage account can be created in the Azure Portal. The following settings worked fine for our test scenario:

FrankMartin_10-1708440028434.png

FrankMartin_11-1708440028441.png

 

FrankMartin_12-1708440028451.png

After creating the storage account resource, open “Access Keys” in the “Security + networking” section in the left panel. Copy the first key to the clipboard. This key will be needed later to create the target data store of our ADF data pipeline.

Now click on “Containers” on the left side and add a container to your storage account. In our tests, we called this container “testdata”.
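As with the data factory itself, the storage account, an access key, and the container can also be created with a few Az cmdlets. The names and the SKU below are placeholders matching our simple test setup; storage account names must be globally unique.

# Sketch: create the storage account, read an access key, and create the "testdata" container.
$rg      = "rg-abap-adf-demo"     # placeholder resource group
$account = "abapadfdemostorage"   # placeholder storage account name

New-AzStorageAccount -ResourceGroupName $rg -Name $account `
                     -Location "West Europe" -SkuName Standard_LRS -Kind StorageV2

# This key is needed later for the target linked service.
$key = (Get-AzStorageAccountKey -ResourceGroupName $rg -Name $account)[0].Value

$ctx = New-AzStorageContext -StorageAccountName $account -StorageAccountKey $key
New-AzStorageContainer -Name "testdata" -Context $ctx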

Configuring the Azure Data Factory ETL Pipeline (Source Linked Service)

Now we have all the pieces needed to define the ADF data pipeline. Go to the Azure Data Factory Studio and click on “Linked Services” and “New”. For more background information see Introduction to Azure Data Factory - Azure Data Factory | Microsoft Learn.

You can find some SAP standard data store types under “Data Store”, but there is no dedicated tile for the ABAP SQL service and the ODBC driver for ABAP yet. Therefore, use the generic “ODBC” tile: under “Data Store”, search for “ODBC” and continue.

FrankMartin_13-1708440028456.png

We assigned the name ABAP to our ODBC linked service. Since we had already defined an ODBC DSN for our ABAP system, we could use the simple ODBC connection string “DSN=ABAP”. We also entered the name of our self-hosted integration runtime. After providing the name and password of our ABAP test user, we tested the connection and created the linked service.
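For reference, the linked service created in the UI corresponds to a small JSON definition. The sketch below shows roughly what our ODBC linked service would look like if deployed with the Az.DataFactory module; the resource names are placeholders, the credentials are only indicated schematically, and in a real setup you would keep the secret in Azure Key Vault rather than in a file.

# Sketch: the ODBC source linked service as a JSON definition, deployed from PowerShell.
$sourceLsJson = @'
{
    "name": "ABAP",
    "properties": {
        "type": "Odbc",
        "connectVia": { "referenceName": "SelfHostedIR", "type": "IntegrationRuntimeReference" },
        "typeProperties": {
            "connectionString": "DSN=ABAP",
            "authenticationType": "Basic",
            "userName": "<ABAP test user>",
            "password": { "type": "SecureString", "value": "<password>" }
        }
    }
}
'@
Set-Content -Path ".\ABAP_LinkedService.json" -Value $sourceLsJson

Set-AzDataFactoryV2LinkedService -ResourceGroupName "rg-abap-adf-demo" `
                                 -DataFactoryName "adf-abap-demo" `
                                 -Name "ABAP" `
                                 -DefinitionFile ".\ABAP_LinkedService.json"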

Configuring the Azure Data Factory ETL Pipeline (Target Linked Service)

Go back to the Azure Data Factory and create the target “Linked Service (Data Store)” for the storage account. Under “Linked Services”, click on “New” and search for “Azure Blob Storage”.

FrankMartin_14-1708440028463.png

In “Storage account name”, enter the name of the storage account that you created before. The account key must be one of the keys you obtained for this storage account. You can now test the connection and, if the test succeeds, create the target linked service. You should now see two linked services in the list:

FrankMartin_15-1708440028465.png
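The corresponding JSON definition for the Azure Blob Storage linked service is just as small. Again, this is only a sketch with placeholder names; the account key is the one copied from “Access Keys” earlier and would normally be kept in Azure Key Vault.

# Sketch: the Azure Blob Storage target linked service as a JSON definition.
$targetLsJson = @'
{
    "name": "targetLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage account>;AccountKey=<access key>;EndpointSuffix=core.windows.net"
        }
    }
}
'@
Set-Content -Path ".\Target_LinkedService.json" -Value $targetLsJson

Set-AzDataFactoryV2LinkedService -ResourceGroupName "rg-abap-adf-demo" `
                                 -DataFactoryName "adf-abap-demo" `
                                 -Name "targetLinkedService" `
                                 -DefinitionFile ".\Target_LinkedService.json"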

Configuring the Azure Data Factory ETL Pipeline (Creating Data Sets)

Let’s now create the datasets for each of the created linked services. The datasets allow you to define which data is retrieved or written in the respective linked services.

In the Azure Data Factory Studio, click on “Author” on the left side and then choose “Datasets”:

FrankMartin_16-1708440028469.png

Under “New Dataset”, again search for “ODBC”, enter a name for your dataset, and choose the ODBC linked service that you created earlier. The integration runtime used by that linked service is added automatically. As the table name in our test, we entered the name of our exposed CDS test view entity ZFlight.ZSFLIGHT.

 FrankMartin_17-1708440028471.png
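As with the linked services, the dataset corresponds to a JSON definition. The following sketch shows how such an ODBC source dataset could be deployed from PowerShell; the dataset name is a placeholder chosen for illustration.

# Sketch: ODBC source dataset for the exposed view ZFlight.ZSFLIGHT.
$sourceDsJson = @'
{
    "name": "ABAPFlightData",
    "properties": {
        "type": "OdbcTable",
        "linkedServiceName": { "referenceName": "ABAP", "type": "LinkedServiceReference" },
        "typeProperties": { "tableName": "ZFlight.ZSFLIGHT" }
    }
}
'@
Set-Content -Path ".\ABAP_Dataset.json" -Value $sourceDsJson

Set-AzDataFactoryV2Dataset -ResourceGroupName "rg-abap-adf-demo" `
                           -DataFactoryName "adf-abap-demo" `
                           -Name "ABAPFlightData" `
                           -DefinitionFile ".\ABAP_Dataset.json"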

To create a target dataset, click again on “New Dataset”, search for “Azure Blob Storage”, and choose “Delimited Text”. In “Linked service”, choose the target linked service created before (in our case “targetLinkedService”). As the first component of the file path, choose the container that you created in the blob storage account.

 FrankMartin_18-1708440028472.png
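The matching target dataset is a delimited-text dataset pointing at the “testdata” container. The sketch below mirrors the UI settings; the dataset name and the delimiter settings are placeholders for our test.

# Sketch: delimited-text target dataset in the "testdata" container.
$targetDsJson = @'
{
    "name": "FlightDataCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": { "referenceName": "targetLinkedService", "type": "LinkedServiceReference" },
        "typeProperties": {
            "location": { "type": "AzureBlobStorageLocation", "container": "testdata" },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}
'@
Set-Content -Path ".\Target_Dataset.json" -Value $targetDsJson

Set-AzDataFactoryV2Dataset -ResourceGroupName "rg-abap-adf-demo" `
                           -DataFactoryName "adf-abap-demo" `
                           -Name "FlightDataCsv" `
                           -DefinitionFile ".\Target_Dataset.json"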



Creating the Data Pipeline

Go to “Author” -> “Pipelines” -> “New Pipeline”. We chose “ABAPtoBlobStorage” as our pipeline name. Drag and drop the “Copy Data” activity from the “Move and transform” section to the canvas on the right:


FrankMartin_19-1708440028475.png


On the “Source” tab of our “Copy Data” activity, we used our source dataset. We used a query here to extract data from our test table ZFlight.ZSFLIGHT because the column PRICE has the ODBC type SQL_DECFLOAT, and the generic ODBC linked service can’t handle this driver-specific decimal floating-point data type. Instead, we explicitly cast the PRICE column in the query to a fixed-point decimal type. You can use the data preview to check whether the data extraction works properly for this query.

FrankMartin_20-1708440028480.png
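The query we used on the “Source” tab follows the pattern sketched below. The column list is an assumption based on the classic SFLIGHT data model, and the exact target type of the CAST may differ in your system, so adjust both to the fields actually exposed by your view. You can also run the same statement through the PowerShell ODBC pattern shown earlier to preview the result outside of ADF.

# Sketch: source query that avoids the driver-specific SQL_DECFLOAT type by casting PRICE.
# The column list is an assumption (classic SFLIGHT fields); adjust it to your exposed view.
$sourceQuery = @'
SELECT CARRID,
       CONNID,
       FLDATE,
       CAST( PRICE AS DECIMAL( 15, 2 ) ) AS PRICE,
       CURRENCY
  FROM ZFlight.ZSFLIGHT
'@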

On the “Sink” (target) tab of our “Copy Data” activity, use the target data set:

FrankMartin_21-1708440028481.png

You can now click on “Debug” to perform a first test of the data pipeline:

FrankMartin_22-1708440028487.png

Deploying the Created Objects

In the top bar of the window, there is a button named “Publish All”. Clicking it deploys all artifacts that were created for the Data Factory pipeline (linked services, datasets, pipelines).

If publishing was successful, check the container of the storage account: you should find the .txt file created and stored by the pipeline, containing the data read from the SAP ABAP system. If you do, you can consider the test completed.

In our case, the file stored in the blob storage container should look like this:

FrankMartin_23-1708440028487.png
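After publishing, the pipeline can also be triggered and the result inspected from PowerShell. The following sketch reuses the placeholder names from the previous steps; it starts a pipeline run, checks its status, and downloads the produced file(s) from the “testdata” container.

# Sketch: trigger the published pipeline and download the produced file from blob storage.
$rg  = "rg-abap-adf-demo"   # placeholder resource group
$adf = "adf-abap-demo"      # placeholder data factory name

$runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $rg `
                                        -DataFactoryName $adf `
                                        -PipelineName "ABAPtoBlobStorage"

Get-AzDataFactoryV2PipelineRun -ResourceGroupName $rg `
                               -DataFactoryName $adf `
                               -PipelineRunId $runId

# List and download the output file(s) from the "testdata" container.
$ctx = New-AzStorageContext -StorageAccountName "abapadfdemostorage" -StorageAccountKey "<access key>"
Get-AzStorageBlob -Container "testdata" -Context $ctx |
    ForEach-Object { Get-AzStorageBlobContent -Blob $_.Name -Container "testdata" `
                                              -Context $ctx -Destination "." -Force }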

Conclusion

We used a very simple scenario to demonstrate that the ODBC driver for ABAP can be used in an Azure Data Factory pipeline to extract data exposed by the ABAP SQL service and use it elsewhere. In the Azure Data Factory pipeline, we used the standard ODBC connector to define a linked service, since there is currently no specialized SAP connector for the ODBC driver for ABAP. To simplify the ODBC connection string, we predefined an ODBC DSN on the machine hosting the self-hosted integration runtime. ODBC types specific to the ODBC driver for ABAP, such as SQL_DECFLOAT, must be avoided because the standard ODBC connector can’t handle them; instead, we used CASTs in the SQL statement.

We hope that our description of this simplified scenario can serve as a blueprint for more complex use cases in Azure Data Factories.

As always, we are very interested in your feedback.

Related links


Other blogs related to ABAP SQL services or the “ODBC driver for ABAP”:

Using the ODBC driver for ABAP on Linux | SAP Blogs
Consuming CDS View Entities Using ODBC-Based Client Tools | SAP Blogs
Access Control for ABAP’s SQL Service (1/3) – Set the Scene | SAP Blogs
SQL Queries on CDS objects Exposed as SQL Service | SAP Blogs
Access CDS Objects Exposed as SQL Services Using Microsoft SQL Server | SAP Blogs
Data Science with SAP S/4HANA – connect with Jupiter notebooks | SAP Blogs


Links to SAP documentation:

ODBC - ABAP Platform, On-Premise
ODBC - SAP BTP, ABAP Platform (aka Steampunk)
ODBC - S/4HANA Cloud