Skip to Content
Technical Articles
Author's profile photo Sven Huberti

SAP Integration Suite – Scrape and visualize data as you please

This pandemic is getting annoying: no more social contacts, no more shops, no more restaurants, no more sport. Wait: actually our local government (Hessen) has allowed Fitness club to stay open under strict rules, one of them being that only one person per 40m2 is allowed in their premises. That actually means, that the fitness club is regularly “full”. Because I don’t like to wait, I wanted to see when exactly the most “free slots” are available. The information about free slots is available online, over a standard HTML page.

But after all, this is just another REST API, just in HTML format, so I decided to scrape that information in order to persist and analyse it. I could have chosen a Hana Database, Grafana, Azure PostgreSQL, etc., but I wanted to build something “quick&dirty” since the pandemic should be over soon (or we will soon be forbidden entrance to fitness clubs).

So I decided to use the Cloud Integration service to regularly scrape the number of free spaces from the HTML page and push it to a Google sheet using Open Connectors.

In this blog, I will describe this process in detail, assuming that you know the SAP Integration Suite a little bit already. Also, you should be able to implement this use case using the Integration Suite Trial.

Note that this is just a “quick and dirty” demo, but shows off how easy it is to use the Integration Suite to also support the needs of building new apps in an agile and almost “no-code” way.

Create a Google sheet

First of all, please create a spreadsheet using the account of your choice in Google sheets.

Two things need to be written down after you have created the Google spreadsheet: it’s ID and (optionally) the name of the worksheet (if you decide to change it). You can get this over the URL and through the Google UI.

Your Google spreadsheet gets saved automatically, so nothing more needs to be done here.

Create the Google Open Connector instance

In regards to the SAP BTP Integration Suite, let’s start with the easiest part: creating an Open Connector instance for Google Sheets.

To do so, open up your Integration Suite splash-screen and click on “Open Connectors” (if it’s not available yet, activate it in the “capabilities”).

You can obviously also use the standalone Open Connectors service if you licensed it before we released the integration suite.

In your Open Connectors, create a new authenticated instance of Google Sheet v4:

  1. Go to Connectors
  2. Search for “Google” Open Connectors
  3. Hover with your mouse over the Google Sheets V4 Open Connector and click on the “Authenticate” link

Name your Open Connector as you want and start the creation of the instance by logging in to Google with your preferred user.

At the end of the instance creation, navigate to the API docs of the Open Connector instance.

Here you can discover and test all the resources of the Open Connector instance.

I personally like to use that central documentation because you can test und understand the API calls before you implement them in Cloud Integration or any other service or IDE.

What we actually want to do here, is to insert rows into a spreadsheet, containing the current date and time, as well as the amount of “free spaces” (grabbed from the HTML page). So the API Call we will do, is a POST of a new row.

Locate the “/spreadsheets/{id}/worksheets/{worksheetName}/rows” resource.

  1. Open it by clicking on the “POST” method and click on the “try out” button
  2. Insert the ID of your spreadsheet in “ID” parameter.
  3. Insert the name of your worksheet (if you have not renamed it, it should be “Sheet1”) in the “worksheetName” parameter.
  4. Insert the following text in the “RowBody” parameter: this is expected by Google Sheets to insert a new row over the API.
{"values": ["22.04-12:50", 0]}

All this should like like that on your screen:

Now scroll down and click the “Execute” button.

As you can see, we have a “200” response and if you check in your spreadsheet, a row was added.

We will now move to the Cloud Integration part.

Before we do so, either copy the CURL request from the Open Connectors documentation, or leave the tab open for the coming steps.

Create the Google Sheet Open Connector credentials

As you may know, you must centrally define credentials in the Cloud Integration service. The same applies to Open Connector credentials.

To do so, navigate to the Monitor and Security Material section of your Cloud Integration tenant.

Now click on Create and User Credentials.

This is where you need to be a little careful.

  1. In the credentials creation window, select OpenConnectors as Type.
  2. Enter the name and the description as you want. In my case I have used “GoogleSheet_CREDS and “Credentials for Google sheet open connector.
  3. Now get ready to copy&paste: use the CURL command text from before to copy and paste the three values in the fields User, Organisation and Element.
  4. Deploy the credentials.

It may be a good idea to remember the credential name for later.

Create and configure the integration flow

Now comes the fun part: dragging and dropping predefined flow steps 🙂

Navigate to your Cloud Integration service and create a new integration flow in a package of your choice.

Open the integration flow and go into the Edit mode.

Now, delete the Sender object as well as the Start element. Indeed, we will insert a start timer to start the integration flow on a regular basis.

  1. Click on the Event icon in the menu.
  2. Select the Timer start event and drop it in the flow.

Now configure the timer to start the integration flow on a regular basis, eg. every 10 minutes from 8:00 until 22:00.

Once this is done, add a “Request-Reply” flow step between the start and end of your integration flow.

  1. Click on the Call icon in the menu and select External Call.
  2. Click on Request-Reply and drop it on the flow.

With the connector tool, connect the start with the request-reply step, and the request-reply step to the end.

Add a Receiver to your integration flow, under the request-reply step.

Again, using the connector tool, connect the request-reply to the receiver. Chose the HTTP adapter when doing so since this is the call to the HTML page.

In the properties of the adapter, define the HTTP call as follows:

  1. Address: https://www.venicebeach-fitness.de/clubs/lifestyle-fitness-plus/heppenheim.html
  2. Method: GET
  3. Authentication: None

Note that this is where you enter the URL of any website you want to get information from – in this case it’s the URL of a local fitness club.

Also: you may need to import the SSL certificate from the website into your Cloud Integration tenant, which is done by downloading it from your browser and importing it in the security artefacts of Cloud Integration.

Let’s now get the information we are interested in from the HTML page. To do so, I’ll simply use standard string functionalities of Groovy. No doubt that you can do better than that: my purpose is just to illustrate a simple use case of the integration suite, not to explain Groovy best-practices in detail 😉

Add a Groovy Script step into your integration flow from the Transformation / Script menu.

Position it after the request-reply step and beautify the flow.

Now add a new script resource using the contextual menu of your script flow step.

 

Paste the code below in the script you just generated.

In this script, we are looking a specific string (“sind noch <strong>”) from the HTML. Once we found it, we know that the information “amount of free spaces” is located just after it so we store the location (“index”) of the character.

Then, we create the JSON payload (we could have used JsonBuilder but creating a string was quicker) that the Google Sheet Open Connector is expecting, using the current day and time, as well as the substring located between the indexes calculated upfront.

I also logged the information just to make sure everything runs fine – but this is optional.

At the end of the script, we write the JSON payload to the “message” of the integration flow, which will then be sent to the Receiver (Google Sheet).

import com.sap.gateway.ip.core.customdev.util.Message;
import java.util.HashMap;

def Message processData(Message message) {
    // String to find: "In diesem Studio sind noch <strong>27 Plätze</strong> frei."
    def String payload = message.getBody(java.lang.String);
    def indexOfPlaetze = payload.indexOf('sind noch <strong>');
    indexOfPlaetze = indexOfPlaetze + 18;
    def indexOfPlaetzeEnd = indexOfPlaetze + 1;

    def String currentDateTime = new Date().format( 'dd.MM-HH:mm',TimeZone.getTimeZone('CET'));
    def String plaetze = payload[indexOfPlaetze..indexOfPlaetzeEnd];
    
    // Build payload for Google Sheets: {"values": ["22042021-0954","22"]}
    def OCNBody = "{\"values\": [\"" + currentDateTime + "\", " + plaetze + "]}";
    message.setBody(OCNBody);
    
    //Logging
    def messageLog = messageLogFactory.getMessageLog(message);
    messageLog.addAttachmentAsString("OCN body", OCNBody, "text/html");   
    
	return message;
}

Note that we may have used other libraries to parse the HTML file based on the DOM: but again, we are doing it quick and dirty.

Also: if you want to be a bit more proficient with writing your code, you can use the GroovyIDE tool. It lets you write and test your groovy code online, hence you don’t need to deploy your integration flow to test and debug. Very useful.

 

Last step: send the payload to Google Sheet. To do so, create a connection between the end event flow step and the Receiver system.

When prompted, select “OpenConnector” as adapter type.

In the properties of the adapter, enter the following details:

  1. The API URL from your CURL command until “api-v2”, eg.: https://api.openconnectors.trial.eu10.ext.hana.ondemand.com/elements/api-v2
  2. Credential Name: is the name of the credentials you have created at the very beginning, eg.: GoogleSheet_CREDS
  3. Resource: from the API URL from your CURL command, copy the part of the path starting with “/spreadsheets”, eg.: /spreadsheets/1yG7sh_P886BpPlCH8T3cJwwgGnHTcLV27s_HapDfh3o/worksheets/Data/rows
  4. Method: POST

One last technical thing: we are sending 2 HTTP headers twice which is confusing the Open Connector Adapter. Hence we need to remove these before hitting the OpenConnectot instance.

This is easily done through a Content Modifier. In the integration flow menu, click on the Transform and Content Modifier icon.

Position the content modifier just after the script step and beautify your flow.

In the content modifier properties, click on the Message Header tab.

Now add two headers to be deleted: Content-length and Transfer-encoding.

Your integration flow should look something like this:

That’s it! Now Deploy your integration flow. As soon as it is deployed, it will magically fill in your sheet!

 

Now it is up to you to build your graphical visualisation and to take care of emptying/extending the Google Spreadsheet, but I hope you got the main idea!

By the way, this is my little dashboard:

For more advanced techniques using the Cloud Integration service and the Google Sheet Open Connector, read the excellent blog of Saraj.

Stay safe!

 

Assigned Tags

      6 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Sunny Kapoor
      Sunny Kapoor

      Excellent blog Sven Huberti

      Author's profile photo Manuel Namyslo
      Manuel Namyslo

      no waiting queues anymore - thanks Sven!

      Author's profile photo Muniyappan Marasamy
      Muniyappan Marasamy

      Thanks for the blog. Simple and elegantly using all integration suite capabilities. Easy way would be write a google script and schedule it.

      Author's profile photo Sven Huberti
      Sven Huberti
      Blog Post Author

      Thanks for the comment!

      The most important thing here is to show a low/no-code integrative approach, orchestration and connectivity capabilities of the Integration Suite. Whilst a Google Script could have done the job too, you are much more flexible with a middleware in which you can add additional steps, additional receiver systems, substitute systems, etc..

      BR!

      Author's profile photo Saurabh Kabra
      Saurabh Kabra

      Hi Sven,

      First of all many thanks for such a nice use case which was easy to follow & implement.

      I have a question... I exactly hit the error while sending the HTTP request to open connector due to 2 headers being sent twice as mentioned in your blog. So my question is, How you came to know about these 2 exact headers because if I check the trace then I see exactly 1 header for each of them. Further, the returned error message from CPI does not talk about the duplicate headers or so. Hence curious to know how did you troubleshoot this?

       

      Thanks Again!

      Saurabh

      Author's profile photo Sven Huberti
      Sven Huberti
      Blog Post Author

      Hello Saurabh,

      thanks for your kind feedback!

      I actually see the in the monitoring the error message(s) very clearly:

      Not sure what your monitoring is telling you but it was pretty obvious in my environment.

      Thanks and best regards!

      Sven