
My previous blog post covered how to deploy a UIMA annotator on HCP as a REST service. Since then I have been playing around with various UIMA annotators that do some wonderful stuff extracting information from unstructured data. I started with Concept Mapper, which finds concepts in the source text by comparing it against a concept dictionary loaded in memory. It helps identify and enrich the concepts you are interested in. It has been widely used in the medical field to analyze medical records and patient histories, where medical terms and their properties are well documented (National Library of Medicine).
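To make the dictionary idea concrete, here is a toy sketch of a dictionary lookup in JavaScript. It is not the Concept Mapper implementation (the real annotator works on tokens rather than raw substrings); the function name `findConcepts`, the dictionary shape and the sample terms are all made up for illustration.

```javascript
// Toy dictionary lookup, NOT the UIMA Concept Mapper algorithm.
// dictionary: plain object mapping a surface term to a concept label (hypothetical shape).
function findConcepts(text, dictionary) {
  const lowered = text.toLowerCase();
  const hits = [];
  for (const [term, concept] of Object.entries(dictionary)) {
    const needle = term.toLowerCase();
    if (needle === "") continue; // an empty term would match everywhere
    let from = 0;
    let at;
    // Collect every case-insensitive occurrence of the term
    while ((at = lowered.indexOf(needle, from)) !== -1) {
      hits.push({ concept: concept, begin: at, end: at + needle.length });
      from = at + needle.length;
    }
  }
  // Report hits in the order they appear in the text
  return hits.sort((a, b) => a.begin - b.begin);
}

// Made-up sample dictionary and text
const sampleDictionary = {
  "connectivity service": "CloudConnectivity",
  "document service": "DocumentService"
};
const sampleText = "The Connectivity Service and the document service.";
console.log(findConcepts(sampleText, sampleDictionary));
```

Because this matches plain substrings, a short term can also fire inside a longer word, which is one reason a real dictionary annotator matches on tokens instead.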

 

I thought hard about an application for Concept Mapper running on HCP that could demonstrate its strength. I started with the SAP HCP glossary as the source of dictionary terms and tried finding named entities in some blog text. The idea was to classify blogs based on glossary terms: for example, find all the blogs related to cloud connectivity services, all the blogs related to document services, and so on. However, I was not very happy with the dictionary based on glossary terms; the results Concept Mapper produced with it had many problems. I had to abandon this idea for a newer one: finding person names in text.

 

The NameFinder service is implemented as a REST service running on NetWeaver Cloud. You can try it at:

https://namefinders0007950666trial.nwtrial.ondemand.com/uima-simple-server-concept/?mode=form

 

Enter some sample text with person names in it. Here is some content taken from a wiki.

namefinder 1.JPG

Submit the text to the NameFinder annotator. You will see the result marked up with <NameAnnotation> XML tags.

 

namefinder 2.JPG
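If you want the names themselves rather than the marked-up text, they can be pulled out of the response. The sketch below assumes the inline result wraps each detected name in a `<NameAnnotation>` element; the function name `extractNames` and the sample string are hypothetical. A regular expression is used only to keep the example short. In a browser, a DOM parser is the more robust choice.

```javascript
// Pull the covered text out of every <NameAnnotation> element in an inline XML string.
// Assumption: the service wraps each detected name in a <NameAnnotation ...> element.
function extractNames(inlineXml) {
  const pattern = /<NameAnnotation\b[^>]*>([\s\S]*?)<\/NameAnnotation>/g;
  const names = [];
  let match;
  while ((match = pattern.exec(inlineXml)) !== null) {
    // Drop any nested tags and collapse line breaks and repeated spaces
    const text = match[1].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    if (text !== "") names.push(text);
  }
  return names;
}

// Made-up sample response, not actual service output
const sampleResponse =
  "<result><NameAnnotation>Ada Lovelace</NameAnnotation> worked with " +
  '<NameAnnotation begin="31">Charles\n Babbage</NameAnnotation>.</result>';
console.log(extractNames(sampleResponse));
```

Note that XML entities such as `&amp;` are left undecoded here, which is another reason to prefer a real XML parser for anything beyond a quick experiment.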

The NameFinder annotator is based on the OpenNLP toolkit, which uses machine learning to analyze natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.

 

This REST service can be easily consumed with the following SAPUI5 JavaScript code, which shows the NameAnnotation results in a table:

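              // Model that will hold the XML returned by the NameFinder service
              var oModel = new sap.ui.model.xml.XMLModel();

              $.ajax({
                  url: "http://localhost:8080/uima-simple-server-concept/",
                  type: "POST",
                  // InputText holds the text to analyze; encode it so "&" and "=" survive
                  data: "text=" + encodeURIComponent(InputText) + "&mode=inline",
                  dataType: "xml",
                  success: function (xml) {
                      // With dataType "xml", jQuery passes the parsed XML document
                      oModel.setData(xml);
                      var oTable = sap.ui.getCore().byId("cTable");
                      oTable.setModel(oModel);
                      oTable.bindRows("/NameAnnotation");
                  }
              });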

 

The next challenge was to find a good source of unstructured data that could be programmatically fed to the NameFinder REST API. The ‘scnReader’ project by Tom Van Doorslaer, which is documented in this blog, came in handy. Thank you, Tom, for your wonderful work here. In no time, I was able to import this SAPUI5 project into Eclipse and add my own code to read RSS feed items, pass them to the NameFinder REST API, and show the names in a table.
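Feed items usually carry HTML, so the text needs a little preparation before it goes to the service. Here is a minimal sketch, assuming each feed item has already been read into an object with `title` and `description` strings (hypothetical field names, modeled on the RSS elements of the same name). The `text` and `mode=inline` parameters are the ones used in the AJAX call earlier in this post; `toPlainText` and `buildRequestBody` are made-up helper names.

```javascript
// Strip markup from a feed item's HTML so only readable text is sent for analysis.
function toPlainText(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Build the form-encoded POST body for one feed item.
// Assumption: item has string properties "title" and "description".
function buildRequestBody(item) {
  const text = toPlainText(item.title + ". " + item.description);
  return "text=" + encodeURIComponent(text) + "&mode=inline";
}

// Made-up feed item for illustration
const sampleItem = {
  title: "Hello",
  description: "<p>Meet <b>Ada Lovelace</b> today</p>"
};
console.log(buildRequestBody(sampleItem));
```

The resulting string can be passed as the `data` option of the `$.ajax` call, one request per feed item.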

 

I added a couple of input fields to the SCN reader to take any RSS feed link and the number of items to fetch.

scnReader 0.JPG

As a sample, I used the NetWeaver Cloud Developer Center RSS feed for blogs.

 

RSS link.JPG

On entering the RSS feed link and submitting, it fetches the 10 most recent items and shows them in the table.

scnReader 1.JPG

 

Now we have a list of blogs with their content that can be sent to the NameFinder REST service for analysis. On clicking ‘Get Names’, the following result is shown in the ‘Identified English Names’ table:

scnReader 2.JPG

On scrolling down:

names scroll1.jpg

Scrolling down further:

name scroll2.jpg

names scroll3.jpg

In total, 18 names are identified from the 10 most recent blog posts in the Cloud community. With some more JavaScript, the number of occurrences of each name can be calculated for comparison. With some effort, these results can be persisted into a database (HANA?) along with additional information such as blog category (tags), blog link, results from Concept Mapper, and date and time information, to come up with a real-world application.
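Counting occurrences could look like the sketch below. It assumes the names have already been collected into a plain array of strings; `countNames` and the sample input are made up for illustration and are not taken from the actual run.

```javascript
// Count how often each name occurs, ignoring case and extra whitespace.
function countNames(names) {
  const counts = new Map();
  for (const raw of names) {
    const name = raw.trim().replace(/\s+/g, " ");
    if (name === "") continue;
    const key = name.toLowerCase();
    // Keep the first spelling seen as the display name
    const entry = counts.get(key) || { name: name, count: 0 };
    entry.count += 1;
    counts.set(key, entry);
  }
  // Most frequent first; ties are broken alphabetically
  return Array.from(counts.values()).sort(
    (a, b) => b.count - a.count || a.name.localeCompare(b.name)
  );
}

// Made-up sample input
const ranked = countNames(["Alan Turing", "Ada Lovelace", " ada  lovelace ", "Ada Lovelace"]);
console.log(ranked);
```

The sorted array can be fed to a table model directly, or stored together with the blog link and tags if the results are persisted.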

 

So now you know who is famous in the SAP NetWeaver Cloud community. Some of the NameAnnotations are not person names. These mis-hits are due to the machine learning algorithm and the name finder model. The models can be enriched and retrained using the OpenNLP library to improve accuracy.


5 Comments


  1. L. van Hengel

    Hi Rahul,

    Nice blog and great example of using Apache UIMA. You also used a good source of unstructured data in your application. I will check out your previous blogs as well; interesting stuff.

    Cheers,

    Leo

    1. Rahul Aware Post author

      Thanks Tom. Your work saved me a lot of effort that I would otherwise have spent figuring out how to access text data from blog posts using SAPUI5. Also, I loved the way you structured your application according to the MVC paradigm. It was easy to extend.

  2. Chris Paine

    Hi Rahul,

    A very interesting blog. Have you found many “real world” applications for UIMA using Neo?

    I have to admit, it’s something like HANA for me, something I want to learn more about, but am having a little difficulty imagining where it would play a part in the HCM space (perhaps analysis of performance review comments… I don’t know!)

    And love that Joanna Chan got in there so many times 😆 , although how she could score above Cookies, and Roy Fielding I don’t know 😉

    Thanks for sharing!

    1. Rahul Aware Post author

      Hello Chris,

      I am also sailing in the same boat. It’s just that I am trying to decipher this treasure map and maintain the right direction. There are many jackpots filled with gold coins on the map in this direction. They also call it Big Data!

      But as of now, it’s pretty cloudy!

      regards,

      🙂

