Feng Liang

Data Federation between SAP HANA Cloud and Amazon S3 to Blend Business Data with External Data

In this article, we will look at how to set up data federation between SAP HANA Cloud and an Amazon S3 bucket, so that we can blend external data with our business data while each data set stays in its own place. We will focus on the integration part, where we install and run the SAP Data Provisioning Agent in a Linux environment. The resulting architecture looks like the following:

Please note that our demo uses an EC2 instance as the host for the DPAgent, and the Storage Gateway service as virtual storage mounted to that EC2 instance. Your own setup might differ; for instance, you may host the DPAgent on your own on-premises system. (The DPAgent, or Data Provisioning Agent, hosts all SAP HANA smart data integration adapters and acts as the communication interface between the adapters and SAP HANA, on which SAP Data Warehouse Cloud is built.)

Part 1: Creating S3 bucket and Storage Gateway 

Since there are already many documents covering this part, we will only mention what we need here. First, we need a private S3 bucket with our CSV data uploaded. Second, we create a Storage Gateway in NFS mode and then a file share connected to our S3 bucket.
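If you prefer the AWS CLI over the console for the bucket part, the setup can be sketched roughly as follows (the bucket name and CSV file name here are placeholders for illustration):

```shell
# create an S3 bucket (names must be globally unique; this one is a placeholder)
aws s3 mb s3://my-hana-federation-demo

# block all public access so the bucket stays private
aws s3api put-public-access-block \
  --bucket my-hana-federation-demo \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# upload the CSV data we want to federate
aws s3 cp sales_data.csv s3://my-hana-federation-demo/
```

The Storage Gateway itself (NFS mode) and its file share are still created in the console as described above.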

Please note the commands showing how to connect to the file share, as we will use them in a later section.

Part 2: Installation and Configuration of SAP Data Provisioning Agent  

Next, we create an Ubuntu EC2 instance with inbound rules open to all traffic (this is for testing purposes only and must not be used in production; apply more restrictive settings according to your needs).

After creating the instance, go to the SSH client tab and follow the instructions to log into the Ubuntu instance.

On your local machine (macOS in our case), go to the SAP Software Download Center and download the latest patch of the DPAgent for Linux.

Now, we copy the installation file to our EC2 instance using the following command (all subsequent local commands are run in the macOS Terminal):

scp -i s3_linux_dpa_key_pair.pem <path/downloaded DPAgent>  ec2-user@<your ec2 host>.compute-1.amazonaws.com:/home/ec2-user 

Log into the Ubuntu EC2 instance, unzip the file we transferred, and change into the folder containing the hdbinst.sh file. After creating three folders (/usr/sap/dataprovagent, /usr/sap/dataprovagent/s3, and /usr/sap/dataprovagent/agentconfi), we use the following command to install the DPAgent:

./hdbinst --silent --batch  --path="/usr/sap/dataprovagent" --agent_listener_port=5050 --agent_admin_port=5051 
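Put together, the unpacking, folder preparation, and installation steps above might look like this on the instance (the archive name is a placeholder for whichever patch you downloaded):

```shell
# unpack the DPAgent archive we copied over (archive name is a placeholder)
unzip IMDB_DPAGENT20_LIN_X86_64.ZIP -d dpagent_install
cd dpagent_install

# create the folders used by the installation, the S3 mount, and the file formats
sudo mkdir -p /usr/sap/dataprovagent \
              /usr/sap/dataprovagent/s3 \
              /usr/sap/dataprovagent/agentconfi

# run the silent installation with the listener and admin ports described above
sudo ./hdbinst --silent --batch --path="/usr/sap/dataprovagent" \
     --agent_listener_port=5050 --agent_admin_port=5051
```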

Don't forget to mount the S3 file share to /usr/sap/dataprovagent/s3.
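The mount uses the standard NFS mount command that the Storage Gateway console shows on the file share's details page; a sketch, where the gateway IP and file share name are placeholders to be replaced with your own values:

```shell
# mount the Storage Gateway NFS file share onto the DPAgent's s3 folder
# (replace the gateway IP and the bucket/file-share name with the values
#  shown on your file share's details page)
sudo mount -t nfs -o nolock,hard 10.0.0.5:/my-hana-federation-demo /usr/sap/dataprovagent/s3

# verify the CSV data is visible through the mount
ls /usr/sap/dataprovagent/s3
```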

Now at the command line, navigate to <DPAgent_root>/bin, which in our case is /usr/sap/dataprovagent/bin. Then we run

./agentcli.sh --configAgent 

to open the DPAgent Configuration Tool:

We need to choose option 2 to start the agent first, then option 6 to connect to our SAP HANA Cloud instance.

Choose option 1, follow the instructions, and enter the information to connect to our HANA Cloud instance. If you do not have a HANA user for agent messaging, that is fine; one will be created for you.

Next, we register the agent with the HANA instance. The agent host name is the public IP of your EC2 instance.

Before we register our FileAdapter, we need to configure it. We use

./agentcli.sh --configAdapters

to do so, but if you run into problems you can edit the file /usr/sap/dataprovagent/configuration/com.sap.hana.dp.adapterframework/FileAdapter/FileAdapter.ini directly:

Next, we set up an access token for our FileAdapter using

./agentcli.sh --setSecureProperty

Choose option 11 and enter a token, for example 'Accesstokentest12345'.

Now use the configuration tool to register our FileAdapter. 

 

Don't forget to use the following command to generate a CSV config file and move it to /usr/sap/dataprovagent/agentconfi.

./createfileformat.sh -file <path of folder containing your csv data file> -cfgdir <path of folder to write the generated config file> -format "CSV" -colDelimiter ,
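For reference, the generated .cfg file is a plain-text file format definition that the FileAdapter reads from the agentconfi folder. A minimal sketch of what it might contain for a simple CSV file (the column names and types here are made up for illustration; the keys come from the file format definitions used by the SDI FileAdapter):

```
FORMAT=CSV
FORCE_FILENAME_PATTERN=%.csv
COLUMN_DELIMITER=,
SKIP_HEADER_LINES=1
COLUMN=ID;INTEGER
COLUMN=PRODUCT;VARCHAR(100)
COLUMN=REVENUE;DECIMAL(15,2)
```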

 

If everything goes well, now you are able to see your adapter and agent under your SAP HANA Cloud instance. 

Part 3: Creating Virtual Table in SAP HANA Cloud

In the HANA Database Explorer, under Catalog -> Remote Sources, choose Add Remote Source.

Set the token using the one we created earlier. 

Choose the CSV file name and click Create Virtual Objects.

If it is created successfully, you will find the virtual table under the table catalog and be able to run SELECT queries against it.
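As a sketch of the SQL alternative to the UI steps above, assuming the remote source was named S3_SOURCE, the schema is DBADMIN, and the file format is called sales_data (all placeholder names for illustration), the statements could look like:

```sql
-- create the virtual table from the remote source
-- (the FileAdapter's remote object path uses <NULL> placeholders
--  for the database and owner parts)
CREATE VIRTUAL TABLE "DBADMIN"."V_SALES"
  AT "S3_SOURCE"."<NULL>"."<NULL>"."sales_data";

-- query the federated CSV data as if it were a local table
SELECT * FROM "DBADMIN"."V_SALES";
```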

In summary, we have shown how to federate data between SAP HANA Cloud and CSV data sitting in an Amazon S3 bucket without replicating the data.

Thanks for reading. If you have any questions, please reach out to ci_sce@sap.com. 
