Skip to Content
Author's profile photo Tom Fallwell

How to use a Base_UserDefined Transform in Data Services 4.x to return JSON Web Service data

Overview

The proliferation of JSON data available from web service based sources is enormous these days. So I wanted to capture some JSON web service data for a proof of concept via SAP Data Services, but was discouraged to find there was no simple way to do this.

 

Searching for a solution, there were a couple of posts that vaguely refer to using the Data Service Adapter SDK to build a special adapter.  But this seemed like a complicated route for something so simple as JSON.   However then I found out about Base_UserDefined transform.

 

In this article, I will use Base_UserDefined transform to return web service data using the “Per Collection” option.  I initially learned about Base_UserDefined from Kamal’s excellent article, http://scn.sap.com/community/data-services/blog/2013/01/31/how-to-use-userdefined-transform-in-sap-bodsbusiness-object-data-services.  But in that article it uses Base_UserDefined in a “Per record” mode.  My json request will return many records as a result so I need to use the “Per Collection” mode and it requires a different way to work with the Collection object of Base_UserDefined.  I could not find any articles or documentation on the “Per Collection” mode so I will help fill that gap with this article.

 

The Base_UserDefined transform simply provides an interface to the Python engine and language.   There is also a strong community of Python code one can get assistance with.

 

By the way, Python is an object-oriented programing language that is interpreted making it a very quick development tool.  But what is most helpful are the module and packages that are supported by an open source community.  It is rare that I do not find a module already out there already is written for the job I need to do.

 

How it’s done

In this example, I want to return data from a json data source maintained by the Consumer Financial Protection Bureau (my daughter is just starting college and the number of credit card offers she is already receiving has caused me to think “financial protection”).  Hense the choice of web service.

 

The url I will use is this: http://data.consumerfinance.gov/api/views.json

  1. Prerequisite – First you will need to download and copy the urllib2.py module to your \Program Files (x86)\SAP BusinessObjects\Data Services\DataQuality\python\lib\ folder. This should be on the server the job server is running on.

    urllib2.py is a module that handles web service requests.  You can find it here and then save it as a file called urllib2.py.
    http://svn.python.org/projects/python/branches/release22-maint/Lib/urllib2.py
    user defined02.png

  2. Now to the design work.  We need to create a source that provides the URL for the web service.  In my example I simply added a SQL Transform and wrote a SQL select statement with hardcoded url.
    Select statement I used is
    select ‘http://data.consumerfinance.gov/api/views.json‘ as JSONuri
  3. Next drag in to the Data Flow a Base_UserDefined transform from the Data Quality\User Defined folder in Transforms, add a Temp Table, and connect all three.
    user defined03.png
  4. Open the Base_UserDefined transform and in the Input tab of the transform, drag and drop Input column of the input schema to the Input Schema Column Name so as to map with the Transform Input Field Name.
    user defined04.png
  5. Now click on the Options tab and then click on Edit Options button.
    user defined05.png
  6. Select the Mode as Per Collection (this is key, one input record will create multiple output records) and then click on Python Expression Editor to display editor on the right.  You can paste the code directly in this window for now but we need to setup output fields via the “Launch Python Editor” button on the right.
    user defined06.png
    See get_json.py.zip  attachment for the python code.
  7. Notice there are thee tabs on left panel.  Click on the I/O Fields and then right click on Output Fields to select Insert.  Do this to create two output fields called complaint_id and complaint_name both as varchar 50.
    user defined07.png
  8. Click Ok twice to close the editor and then click on the Output tab.  From here you simply click on the check box next to each output field which will result in their mapping on the right panel. (I set complaint_id as a primary field but that is not required.  Just me being picky.)
    user defined08.png
  9. Now launch the job and you should see a result in template table like this:
    user defined09.png

 

Conclusion

Using Base_UserDefined transform opens up a whole new world of flexibility and productivity but leveraging Python.  In this case I did not need to write any code to handle the web service request or to parse json and now I can load a database of all US Consumer Complaint departments.

 

Tip:  It is tricky to debug a Python script within a Base_UserDefined transform, so I will often create a plain version of the script and run it from my own Python engine and then convert it to one that will work in Data Services.   I will often use a Python Print statement to display variables values which you will be able to see in the DS job log.

Assigned Tags

      14 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Arun Sasi
      Arun Sasi

      Hi Tom,

      This is simply brilliant. Appreciate your efforts to discover such a useful feature of base_userdefined transform.

      I read in one of the SCN posts about the usage of JSON web service to read Twitter tweet data. I am working on a POC where I need to access user's tweet data in to staging tables through JSON. For this I have created an application in Twitter and got the security tokens for securely accessing twitter data but I dont know how to use these tokens in the python code such that the JSON URL reads the data in BODS.

      Can you help me

      Regards

      Arun Sasi

      Author's profile photo Tom Fallwell
      Tom Fallwell
      Blog Post Author

      Thanks for the comment Arun.

      That is what I like about Python.  It islikely someone has already written a module to access Twitter securely.  Googling the words twitter and python returns a perfect hit for this.  This module appears to do exactly what you want..

      bear/python-twitter · GitHub

      Author's profile photo Arun Sasi
      Arun Sasi

      Thanks Tom!!

      It gets little bit tricky when you write the code inside the Base_UserDefined transform to access Twitter data.

      Regards
      Arun Sasi

      Author's profile photo Former Member
      Former Member

      Hi,

      I am working on building a user defined transform that lits files of a certain type in a directory specified by the user and the results to be put in a temporary table. I am using the per collection option in the python editor, however I am having issues getting the input values. Have done this before when I was doing per record by using record.GetField, however this does not seem to work when using per collection. Would you be able to help me with this?

      Author's profile photo Former Member
      Former Member

      Hello Tom-

      Must thank you for a very elegant solution you presented. I've been able to tweak it somehow to consume a GET service which returns a JSON file as response. However, my main issue is to get POST service working. Can you guide me somehow to extend the solution you presented to work for a POST operation ( to send relational data as a JSON input to a API)?

      Thanks in advance,

      Abhishek

      Author's profile photo Joshua Blythe
      Joshua Blythe

      The Python requests module makes short work of making rest api calls.

          import requests,json

          payload = {'logonkey'   : auth.logonkey,

                          'tokenkey'   : auth.tokenkey,

                          'fileid'     : fileId,

                          'newstatus'  : newStatus}

          response = requests.post('https://epfws.usps.gov/ws/resources/download/status',

                                   data={'obj' : json.dumps(payload)},

                                   verify=False)

      This endpoint updates the usps file server after downloading postal directories used by the address cleanse.  (I wish SAP would provide a rest api to download their directories)  🙁

      Author's profile photo Former Member
      Former Member

      Thanks Joshua. I tried using the requests module but it seems the DS instance of Python(which is pre-installed) doesn't support requests module. Do you know how we can install this module in DS to make it importable in python scripts(within BODS)?

      Author's profile photo Joshua Blythe
      Joshua Blythe

      Do a google search and follow the installation instructions. 

      Also the latest DS patch we installed looks to have finally updated the Python version to 2.7, so you may need to take that into consideration.

      Author's profile photo Former Member
      Former Member

      Outstanding work! Significant Contribution...

      Author's profile photo Former Member
      Former Member

      Could you please share  get_json.py.zip file again? It is not available at the link. Thank you.

      Author's profile photo Steve Yochum
      Steve Yochum

      Following these instructions I'm capturing only the last record in the URL. Is there something missing to capture all of the records in the URL?

      Author's profile photo Pedro Soares
      Pedro Soares

      How can I add basic authentication (user and password) to the request?

      Author's profile photo Dhanraj Andhale
      Dhanraj Andhale

      Ouststanding..!!  Really the above blog is very helpful to consuming JSON API but for large API structure it will taking time to develop the same.

      Thank you very much Tom..!!

      Author's profile photo Karel Turek
      Karel Turek

      I had no success with using the OData adapter. This article helped me to solve the problem in real-time. Thanks a lot for sharing, Tom!