The proliferation of JSON data available from web service based sources is enormous these days. So I wanted to capture some JSON web service data for a proof of concept via SAP Data Services, but was discouraged to find there was no simple way to do this.
Searching for a solution, there were a couple of posts that vaguely refer to using the Data Service Adapter SDK to build a special adapter. But this seemed like a complicated route for something so simple as JSON. However then I found out about Base_UserDefined transform.
In this article, I will use Base_UserDefined transform to return web service data using the “Per Collection” option. I initially learned about Base_UserDefined from Kamal’s excellent article, http://scn.sap.com/community/data-services/blog/2013/01/31/how-to-use-userdefined-transform-in-sap-bodsbusiness-object-data-services. But in that article it uses Base_UserDefined in a “Per record” mode. My json request will return many records as a result so I need to use the “Per Collection” mode and it requires a different way to work with the Collection object of Base_UserDefined. I could not find any articles or documentation on the “Per Collection” mode so I will help fill that gap with this article.
The Base_UserDefined transform simply provides an interface to the Python engine and language. There is also a strong community of Python code one can get assistance with.
By the way, Python is an object-oriented programing language that is interpreted making it a very quick development tool. But what is most helpful are the module and packages that are supported by an open source community. It is rare that I do not find a module already out there already is written for the job I need to do.
How it’s done
In this example, I want to return data from a json data source maintained by the Consumer Financial Protection Bureau (my daughter is just starting college and the number of credit card offers she is already receiving has caused me to think “financial protection”). Hense the choice of web service.
The url I will use is this: http://data.consumerfinance.gov/api/views.json
- Prerequisite – First you will need to download and copy the urllib2.py module to your \Program Files (x86)\SAP BusinessObjects\Data Services\DataQuality\python\lib\ folder. This should be on the server the job server is running on.
urllib2.py is a module that handles web service requests. You can find it here and then save it as a file called urllib2.py.
- Now to the design work. We need to create a source that provides the URL for the web service. In my example I simply added a SQL Transform and wrote a SQL select statement with hardcoded url.
Select statement I used is
select ‘http://data.consumerfinance.gov/api/views.json‘ as JSONuri
- Next drag in to the Data Flow a Base_UserDefined transform from the Data Quality\User Defined folder in Transforms, add a Temp Table, and connect all three.
- Open the Base_UserDefined transform and in the Input tab of the transform, drag and drop Input column of the input schema to the Input Schema Column Name so as to map with the Transform Input Field Name.
- Now click on the Options tab and then click on Edit Options button.
- Select the Mode as Per Collection (this is key, one input record will create multiple output records) and then click on Python Expression Editor to display editor on the right. You can paste the code directly in this window for now but we need to setup output fields via the “Launch Python Editor” button on the right.
See get_json.py.zip attachment for the python code.
- Notice there are thee tabs on left panel. Click on the I/O Fields and then right click on Output Fields to select Insert. Do this to create two output fields called complaint_id and complaint_name both as varchar 50.
- Click Ok twice to close the editor and then click on the Output tab. From here you simply click on the check box next to each output field which will result in their mapping on the right panel. (I set complaint_id as a primary field but that is not required. Just me being picky.)
- Now launch the job and you should see a result in template table like this:
Using Base_UserDefined transform opens up a whole new world of flexibility and productivity but leveraging Python. In this case I did not need to write any code to handle the web service request or to parse json and now I can load a database of all US Consumer Complaint departments.
Tip: It is tricky to debug a Python script within a Base_UserDefined transform, so I will often create a plain version of the script and run it from my own Python engine and then convert it to one that will work in Data Services. I will often use a Python Print statement to display variables values which you will be able to see in the DS job log.