It’s been four months since I published my last post on this topic. As promised, here’s the next post, with some incremental progress on this endeavor. Let’s see how a simple UIMA annotator can be deployed to the cloud as a REST service.
Information Extraction
The first and foremost challenge in implementing any unstructured data analytics is information extraction. That’s where the Apache UIMA framework comes in handy. It’s a Java-based open source framework for developing components that extract information from a variety of unstructured data (text, voice, etc.).
UIMA can analyze large volumes of text and extract information based on custom rules written as annotators. A fairly large community contributes to creating and enhancing annotators that can identify words, nouns, phrases, sentiments, etc. You can download these annotators and process their output as your requirements dictate. The UIMA framework is also flexible about feeding the output of one annotator into the input of another. So, you can use a whitespace annotator to tag each word in the text and then feed these words to regular expression annotators that detect email addresses, URLs, phone numbers, ZIP codes, etc. Such chains of annotators are called aggregate analysis engines.
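To make the chaining idea concrete, here is a minimal sketch in plain Java. It deliberately does not use the UIMA API: the class `AnnotatorChainSketch`, the `Token` type, and the email pattern are all hypothetical names made up for illustration. The first stage tags each whitespace-separated word with its character offsets, and the second stage runs a regular expression over those tokens, which is roughly the division of labor an aggregate analysis engine sets up between a whitespace annotator and a regular expression annotator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: plain Java, NOT the UIMA API. All names are hypothetical.
class AnnotatorChainSketch {

    // A token with character offsets, similar in spirit to a UIMA annotation.
    static final class Token {
        final int begin;
        final int end;
        final String text;

        Token(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Simplified email pattern, an assumption made for this sketch.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Stage 1: tag every run of non-whitespace characters as a token.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\S+").matcher(text);
        while (matcher.find()) {
            tokens.add(new Token(matcher.start(), matcher.end(), matcher.group()));
        }
        return tokens;
    }

    // Stage 2: consume the tokens from stage 1 and keep those that are email addresses.
    static List<Token> findEmails(List<Token> tokens) {
        List<Token> emails = new ArrayList<>();
        for (Token token : tokens) {
            if (EMAIL.matcher(token.text).matches()) {
                emails.add(token);
            }
        }
        return emails;
    }

    public static void main(String[] args) {
        String text = "Contact john.smith@example.com or call later";
        for (Token email : findEmails(tokenize(text))) {
            System.out.println(email.begin + "-" + email.end + ": " + email.text);
        }
    }
}
```

Because the second stage only sees whole tokens, an address followed directly by punctuation (for example a trailing comma) would not match here; a real regular expression annotator would typically be configured to handle that.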
So where does SAP NetWeaver Cloud come into the picture? Because it is based on open standards, it’s fairly simple to run an annotator on the cloud as a REST service. In this post I will show how to deploy the Whitespace Tokenizer annotator on the cloud as a REST service.
Broad outline of how this works:
Initial Setup:
Creating a PEAR file:
A PEAR file is the standard package format for UIMA components that are meant to be distributed and reused. The UIMA Eclipse plugin lets you add the UIMA nature to your project and create a PEAR file for your annotator using the PEAR generation wizard. For complete documentation on the PEAR generator, please refer to http://uima.apache.org/d/uimaj-2.3.1/tools.html#ugr.tools.pear.packager
On completion of the wizard, the PEAR file for the UIMA project is generated in the target directory you specified.
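A PEAR file is a zip archive, and its installation descriptor lives at `metadata/install.xml`, so a quick sanity check of the generated package can be done with nothing but the JDK. The sketch below is one way to do that; the class name `PearInspector` and the default file name `WhitespaceTokenizer.pear` are assumptions for illustration, not something the wizard produces under that name.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for inspecting a generated PEAR file; uses only the JDK.
class PearInspector {

    // Location of the installation descriptor inside a PEAR package.
    static final String INSTALL_DESCRIPTOR = "metadata/install.xml";

    // Returns the names of all entries in the archive.
    static List<String> listEntries(String pearPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(pearPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // True if the entry list contains the installation descriptor.
    static boolean hasInstallDescriptor(Collection<String> entryNames) {
        return entryNames.contains(INSTALL_DESCRIPTOR);
    }

    public static void main(String[] args) throws IOException {
        // Assumed file name; pass the real path as the first argument.
        String pearPath = args.length > 0 ? args[0] : "WhitespaceTokenizer.pear";
        List<String> names = listEntries(pearPath);
        for (String name : names) {
            System.out.println(name);
        }
        System.out.println("Install descriptor present: " + hasInstallDescriptor(names));
    }
}
```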
Adding UIMA PEAR to SAP Netweaver Cloud Application:
If everything goes well, you will be able to deploy this web project to the local server and to the cloud. The application will look something like this:
This page also has the following information about the service:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
No description provided
In order to use this service, a POST- or GET-request should be sent to the server with the following URL:
http://localhost:8080/WSTRest/
The following request parameters are expected:
POST request should be sent to use the service
Possible values:
GET request should be sent to obtain information about the service
Possible values:
If XML or inline-XML output is requested, it will contain the tags listed below. The XSD-definition of the output in XML-format can be downloaded here.
String text = "Hello Mr. John Smith !";
String parameters = "text=" + URLEncoder.encode(text, "UTF-8") + "&mode=inline";
URL url = new URL("http://localhost:8080/WSTRest/");
URLConnection connection = url.openConnection();
connection.setDoOutput(true);
OutputStreamWriter writer = new OutputStreamWriter(connection.getOutputStream());
writer.write(parameters);
writer.flush();
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
It neatly points out how to consume the service programmatically from another component or application.
For now, we will test the service with the simple HTML form that comes with it:
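The snippet shown on the service page is a fragment, so here is a self-contained variant that could be compiled and run as is. It is a sketch under a few assumptions: the service is reachable at the local URL from the page, it accepts the `text` and `mode` parameters used there, and the class name `WstRestClient` and its helper methods are my own. It also sets the request method and content type explicitly and closes the streams, which the page's fragment leaves implicit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client class; only the URL and parameter names come from the service page.
class WstRestClient {

    // Builds the form-encoded request body, e.g. text=...&mode=inline.
    static String buildParameters(String text, String mode) {
        try {
            return "text=" + URLEncoder.encode(text, "UTF-8")
                    + "&mode=" + URLEncoder.encode(mode, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Sends the text to the service with a POST request and returns the response body.
    static String post(String serviceUrl, String text, String mode) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(serviceUrl).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        try {
            try (Writer writer = new OutputStreamWriter(
                    connection.getOutputStream(), StandardCharsets.UTF_8)) {
                writer.write(buildParameters(text, mode));
            }
            StringBuilder response = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    connection.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    response.append(line).append('\n');
                }
            }
            return response.toString();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes the web project is deployed on the local server.
        System.out.println(post("http://localhost:8080/WSTRest/",
                "Hello Mr. John Smith !", "inline"));
    }
}
```

Keeping the parameter encoding in its own method makes it easy to check the request body without a running server.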
Output of the query:
You can try out this service on cloud at:
https://wstrests0007950666trial.nwtrial.ondemand.com/WSTRest/?mode=form
Next Steps:
The whitespace tokenizer forms the basis of any complex UIMA annotation engine. Many out-of-the-box annotators are part of the UIMA sandbox, such as the Dictionary Annotator, the Concept Mapper annotator, the Snowball annotator, and the Hidden Markov Model Tagger annotator. It will be interesting to see them working on the cloud, with their output saved to HANA for analytics.