Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
remi_astier
Advisor
Advisor
Hello,

We've encountered several situations recently where it was required to acquire information from external systems that already had an API to expose data with web services.

The usual way would be writting a script or using an ETL, but those options had some disadvantages:

  • Provide obsolete data, unless scheduled frequently, but then it creates a workload to refresh a dataset that might not be used

  • Difficult to pass parameters and control from SAP HANA


With data federation in SAP HANA SDI, we can make virtual tables, meaning that they can be manipulated or used like regular tables, but no data is actually stored. When a query references a virtual table, the data is sent by an agent connected to the source system.

In the example below, we'll make a virtual table of 5 columns that is populated at every query by the content coming from a web service.

SAP HANA SDI SDK provides the ability implement any processing and formatting logic with an interface at the database level, making it seamless to embed in views, odata service or any HANA database artifact.

As usual, the documentation is always a good starting place, as well as some previous blog posts.

Factoring code to minimize effort of adding new web services


After browsing through the code of the adapters published on github, it was a bit surprised by the amount of functions exposed, allowing to control most aspects of the behavior of the adapter. My situation required to collect data from multiple web services, and I didn't want to write each time all mandatory adapter functions so I wrote a few abstract classes to factor in some tasks:

  • expose data source parameters to the data provisioning UI in SAP HANA

  • make web services appear as virtual tables or virtual functions

  • send rows to SAP HANA

  • And some minor tasks: initializing an http client, json parsing


Here's a useful class diagram, the adapter base class a few methods that must be implemented to create a working adapter:



The two dotted lines represent class inheritance. So for every web service functionnalty there are three parts to write :

  1. Describe input parameters and result set format

  2. Make the Request

  3. Transform the output into a result set


Example with a file hosted on a public web server


Here's an example that for data source without copyright, a customer needed to fetch some financial information from company filings available publicly on the website of the SEC.

There is a file called master index that is updated regularly. Instead of scripting a download, transformation and import of the file, we'll make the content of the file available as a database table.

The content starts with 11 lines of header then the values are separated by pipes :








Description: Master Index of EDGAR Dissemination Feed
Last Data Received: June 27, 2018
Comments: webmaster@sec.gov
Anonymous FTP: ftp://ftp.sec.gov/edgar/
Cloud HTTP: https://www.sec.gov/Archives/




CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000045|NICHOLAS FINANCIAL INC|10-K|2018-06-27|edgar/data/1000045/0001193125-18-205637.txt
1000045|NICHOLAS FINANCIAL INC|3|2018-04-02|edgar/data/1000045/0001736213-18-000001.txt
1000045|NICHOLAS FINANCIAL INC|3|2018-06-27|edgar/data/1000045/0001357521-18-000003.txt
1000045|NICHOLAS FINANCIAL INC|4|2018-04-27|edgar/data/1000045/0001609591-18-000002.txt
1000045|NICHOLAS FINANCIAL INC|8-K|2018-04-04|edgar/data/1000045/000119312 5-18-106701.txt



We will call this webservice SED Edgar Master

Webservice Description


We implement the abstract class WebserviceConfig in SECEdgarMasterService. In this case, there is no input parameters, we simply define the webservice as a virtual table of 5 columns.
public class SECEdgarMasterService extends WebserviceConfig{

final static String DESC="Master Index of EDGAR Dissemination Feed";

public SECEdgarMasterService() {
super("SEC_EDGAR_MASTER",
DESC,
new SECEdgarMasterRequest(),
new SECEdgarMasterResponse());
}

@Override
public Metadata getMetadata() throws AdapterException {
TableMetadata tm=new TableMetadata();
tm.setName(name);
tm.addDescription(AdapterLang.ENGLISH, DESC);
ArrayList<Column> cols = new ArrayList<Column>();

cols.add(new Column("CIK", DataType.INTEGER));
cols.add(new Column("COMPANY_NAME", DataType.NVARCHAR, 128));
cols.add(new Column("FORM_TYPE", DataType.NVARCHAR, 20));
cols.add(new Column("DATE_FILED", DataType.NVARCHAR, 10));
cols.add(new Column("FILE_NAME", DataType.NVARCHAR, 128));

tm.setColumns(cols);
return tm;
}

@Override
public NodeType getNodeType() {
return NodeType.TABLE;
}

}

Making the Request


We implement the abstract class WebserviceRequestHandler in SECEdgarMasterRequest. It consists of opening the URL and returning the result as a CloseableHttpResponse

 
public class SECEdgarMasterRequest extends WebserviceRequestHandler{

final static String URL="https://www.sec.gov/Archives/edgar/full-index/master.idx";

@Override
public CloseableHttpResponse exec(Metadata func, HashMap<String, String> connProps)
throws AdapterException, ClientProtocolException, IOException, URISyntaxException {

CloseableHttpClient httpclient=WebserviceRequestHandler.getDefaultHttpClient();

HttpGet httpget = new HttpGet(URL);
CloseableHttpResponse resp = httpclient.execute(httpget);
logger.debug("HttpAuthRequest finished for "+URL);
return resp;
}

@Override
public String getURL(Metadata func) throws AdapterException {
return URL;
}

}

Formatting the Result


We implement the abstract class WebserviceResponseHandler in SECEdgarMasterResponse. It consumes the output of the http request line by line. After skipping the header, it builds a list of records and returns it. The WebserviceAdapter class will then handle sending the records to SAP HANA in bulk.

The inner class SECEdgarMasterRecord must implement the method "appendTo" that is called to send records to SAP HANA.
public class SECEdgarMasterResponse extends WebserviceResponseHandler {

static final int SKIP_LINES=12;
static final String FIELD_SEPARATOR="\\|";

@Override
public LinkedList<WebserviceResponseRecord> handleResponse(CloseableHttpResponse r, String url)
throws AdapterException, ParseException, IOException {

BufferedReader in = new BufferedReader(new InputStreamReader(r.getEntity().getContent()));
String line;
String[] fields;

LinkedList<WebserviceResponseRecord> res = new LinkedList<WebserviceResponseRecord>();

for(int i=0;i<SKIP_LINES;i++)
in.readLine();

while((line = in.readLine()) != null) {
fields=line.split(FIELD_SEPARATOR);
if(fields.length==5){
res.add(new SECEdgarMasterRecord(Integer.parseInt(fields[0]),
fields[1],
fields[2],
fields[3],
fields[4]
));
}
else {
logger.debug("strange row: "+line);
}

}
in.close();
return res;
}

/**
* Describes a tag
* @author I063909
*
*/
public static class SECEdgarMasterRecord extends WebserviceResponseRecord{

int cik;
String cName;
String fType;
String dt;
String fName;
public SECEdgarMasterRecord(int cik, String cName, String fType, String dt, String fName){
this.cik=cik;
this.cName=cName;
this.fType=fType;
this.dt=dt;
this.fName=fName;
}
@Override
public void appendTo(AdapterRowSet rows) throws AdapterException {
AdapterRow row=rows.newRow();
row.setColumnValue(0, cik);
row.setColumnValue(1, cName);
row.setColumnValue(2, fType);
row.setColumnValue(3, dt);
row.setColumnValue(4, fName);
}

}
}

The results


And that's it for coding ! Then we build the adapter, deploy the jar at an exsiting data provisioning agent as a custom adapter, and register the adapter in SAP HANA. When browsing the remote source associated with the adapter, we can see 4 webservices exposed as virtual tables, including our example one.



We then create a virtual table pointing to the SEC_EDGAR_MASTER table, and we can start to explore the content of the table.



Fetching the first row of the virtual table takes 3 seconds, and fetching all 227 000 rows takes 3.2 seconds.

Conclusion


After abstracting the low level methods into a few classes, developing database interface for web services was fairly easy. We present the data using only SAP HANA and a Data Provisioning Agent, no ETL, no scheduling and no flat file imports.

In the future, I will likely post another adapter, a more advanced one using input parameters.

All source files are on github
6 Comments