In this weblog series I want to show how to build an expert system for automated monitoring. For the sake of simplicity I decided to sketch how to classify messages of the Business Application Log of an SAP system. In Enter Semantics: An Expert System for Automated Monitoring – Part 2, I showed that ontologies can be used for error analysis because of their capability to perform classifications. Now I show how wikis can be used to maintain knowledge databases.

What’s the problem with Semantic Technology?

People who advocate semantic technology often claim that it will lead to something called “Web 3.0” – an internet that is accessible to both humans and computers. This would enable fast search and mashups of different applications. Sceptics of the Web 3.0 approach usually ask where all those cool new semantic killer apps are. The answer to this question is easy: today there are only a few apps using semantic technology, but there is a constant trend to empower existing apps with metadata – think of standards like RSS, UDDI and so on. In my opinion the reason is simple: today’s Web 2.0 apps are so cool and easy to use that their users don’t need semantic technology at the moment. Today we have everything we need: mashups of different applications, lightweight data exchange formats like JSON, AJAX frameworks, powerful search engines like Google and so on. So I think the main problem of semantic technologies is that their predecessor technologies are far from reaching their limits – in fact they are still evolving. The problem of “Web 3.0” is that “Web 2.0” is so powerful and still expanding.

Perhaps semantic wikis are a good example:

  • In my opinion they are complicated to use, in contrast to common wikis, which can be used by anyone.
  • Semantic wikis are great if you want to do more than categorization – but if the structure of the metadata gets more complicated (think of relational data models), maintenance in a semantic wiki gets complicated, too.

I think that today most semantic technologies (think of semantic wikis or tools like Protégé) are not that easy to use compared to their Web 2.0 “counterparts”.

The key to success is data extraction

I wasn’t very successful at putting the data of the knowledge database into a semantic wiki using its special semantic features. So I decided to keep the knowledge base as text in a table element:

[Screenshot: wiki containing a knowledge base]

As a consequence I can easily keep and maintain the knowledge database in a wiki – together with lots of additional information: best practices for what to do when an error occurs, descriptions of additional information and so on. So the wiki contains formal as well as informal knowledge. The semantic features are an add-on to existing and already very mature knowledge management tools.

And this could be the way to success: if we find a way to extract the data from a wiki into a knowledge database, we can perform fast search operations and give users a starting point – and perhaps a direct link to the solution of a problem.
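As a toy illustration of such a lookup, the following sketch matches an incoming BAL message against a knowledge base extracted from the wiki. The list-of-lists shape and the field order are assumptions taken from the example table later in this post; the message itself is hypothetical:

```groovy
// knowledge base as a list of [object, subobject, msgClass, msgNumber, type] rows,
// as extracted from the wiki table shown later in this post
def conditions = [['FICA', 'FPG1', '>U', '425', 'E'],
                  ['FICA', 'FPG1', 'F5', '808', 'E']]

// a hypothetical incoming BAL message
def message = [object: 'FICA', subobject: 'FPG1',
               msgClass: 'F5', msgNumber: '808', type: 'E']

// look for a row of the knowledge base that matches the message exactly
def hit = conditions.find { row ->
    row == [message.object, message.subobject,
            message.msgClass, message.msgNumber, message.type]
}
println(hit ? 'known error - see wiki page' : 'unknown error')
```

A real implementation would of course index the conditions instead of scanning them linearly, but the point is that once the data is out of the wiki, such a lookup is trivial.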

Wiki extraction the Groovy way

At the moment I don’t know of any data mining technique that can analyse narrative texts, but it is very easy to extract content from a wiki: you just have to perform an HTTP POST request against your wiki, which can be done easily as follows.

def url = new URL('http://localhost/mediawiki/index.php/Special:Export')
def conn = url.openConnection()
conn.doOutput = true
Writer wr = new OutputStreamWriter(conn.outputStream)
// post the title of the page to export ("pages" is the parameter of Special:Export)
wr.write('pages=Error_in_Transfer_of_FI-CA_Totals_Records_to_General_Ledger')
wr.close()

In this case the result is the content of a wiki page about a certain error situation dealing with Error_in_Transfer_of_FI-CA_Totals_Records_to_General_Ledger, which is indeed the title of the wiki page. The result is an XML document containing the narrative text in an XML element text, which is a child element of /mediawiki/page/revision. Parsing the XML document, extracting the text and replacing linefeeds takes only two lines of Groovy:

def mediawiki = new XmlParser().parseText(conn.content.text)
def text = mediawiki.page.revision.text.text().replaceAll(/\n/, '')
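For reference, the export XML has roughly the following shape – trimmed to the elements used here, and with the namespace version being an assumption (it depends on your MediaWiki release):

```xml
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Error_in_Transfer_of_FI-CA_Totals_Records_to_General_Ledger</title>
    <revision>
      <text xml:space="preserve">{| cellspacing="1" ... |}</text>
    </revision>
  </page>
</mediawiki>
```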

Part of this text is a table containing the error classification in MediaWiki markup:

{| cellspacing="1"
|+ BAL Protocol
|-
| BAL Object<br>
| BAL Subobject<br>
| Message Class<br>
| Message Number<br>
| Type<br>
|-
| FICA || FPG1 || >U || 425 || E
|-
| FICA || FPG1 || F5 || 808 || E
|}

Extracting the content of the wiki table with the header “BAL Protocol” into a list of lists is a little bit more complicated, so I decided to define a function containing the implementation:

def GetConditionsFromWikiPage( wikitext ){
      def result = []
      def tables = wikitext.split(/\{\|/)
      tables.each{ table ->
            if (table =~ /BAL Protocol/){
                  def lines = table.split(/\|-/)
                  lines.eachWithIndex{ line, i ->
                        // segment 0 is the table header, segment 1 the column
                        // titles; the data rows start at index 2
                        if (i >= 2 && line =~ /(\|.*){4}/){
                              result << line.split(/\|\|/).collect{ it.replaceAll(/[|}]/, '').trim() }
                        }
                  }
            }
      }
      return result
}
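To see the extraction in action, here is a small self-contained sketch that runs the same logic against an inline copy of the example table (the parsing logic is repeated as a closure so the snippet can run on its own):

```groovy
// sample wiki markup as it appears inside the text element of the export
def wikitext = '''{| cellspacing="1"
|+ BAL Protocol
|-
| BAL Object<br>
| BAL Subobject<br>
| Message Class<br>
| Message Number<br>
| Type<br>
|-
| FICA || FPG1 || >U || 425 || E
|-
| FICA || FPG1 || F5 || 808 || E
|}'''

def getConditions = { text ->
    def result = []
    text.split(/\{\|/).each { table ->
        if (table =~ /BAL Protocol/) {
            table.split(/\|-/).eachWithIndex { line, i ->
                // segment 0 is the table header, segment 1 the column titles;
                // the data rows start at index 2
                if (i >= 2 && line =~ /(\|.*){4}/) {
                    result << line.split(/\|\|/).collect { it.replaceAll(/[|}]/, '').trim() }
                }
            }
        }
    }
    result
}

def conditions = getConditions(wikitext)
println conditions
```

Running this yields one list per data row of the table, with the surrounding markup stripped – exactly the structure needed for the knowledge database.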

I don’t want to go into details here because I would have to explain constructs like regular expressions and closures. But in my opinion Groovy has so much syntactic sugar that you should give it a try. For me it was really surprising how much fun consuming HTTP and working with XML can be in Groovy compared to ABAP and – much worse – Java.


Data extraction from wikis is nothing new – in fact I already blogged about it in Semantic Web Technologies Part 1 – SPARQL. In my opinion those techniques could lead to next-generation knowledge management techniques. Of course the Groovy script above is just a quick hack; we’ll need more advanced techniques to keep structured data in wikis and to extract it.

If you want to give Groovy a try, I suggest installing the Groovy shell or adding Groovy to your Eclipse installation. There are a lot of introductions to Groovy on the web that make a good starting point.
