Former Member

TREX and its application on SDN

Ever wondered how results are retrieved when you search for something on SDN? The answer lies in TREX, the Search and Classification Engine. TREX is a technical component of SAP NetWeaver that provides a wide range of functions for intelligent search, retrieval and classification of textual documents. It can perform these functions on any file that contains text, for example HTML, XML, TXT, RTF, DOC (Microsoft Word) and PPT (Microsoft PowerPoint). It also supports numerous languages, such as Japanese and Hebrew. In this blog, I will discuss the basic search functions of TREX and how SDN uses them for searching. Let us take these functions one by one:

1. Exact Search:

This returns documents that contain exactly the search term or phrase you entered. Suppose you are gathering information on the Code Inspector of the ABAP Workbench, and your search phrase is “code inspector” (including the quotation marks). The search then returns only documents that contain the phrase code inspector; it won't return documents that contain only one of the two words. Below is a snapshot of this scenario: [Screenshot: exact search for “code inspector” on SDN]
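To make the idea concrete, here is a minimal Python sketch of phrase matching. It is not how TREX is implemented internally; it assumes case-insensitive matching on word tokens, and the names `tokenize`, `contains_phrase` and `exact_search` as well as the sample documents are hypothetical. In this sketch a phrase matches only when its words appear side by side and in order, so a document that mentions both words apart from each other (d3) is not returned either.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (quotes are dropped)."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(document: str, phrase: str) -> bool:
    """True if the words of `phrase` appear consecutively, in order, in `document`."""
    doc_tokens = tokenize(document)
    phrase_tokens = tokenize(phrase)
    n = len(phrase_tokens)
    if n == 0:
        return False
    # Slide a window of n tokens over the document and compare.
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def exact_search(documents: dict[str, str], phrase: str) -> list[str]:
    """Return the ids of all documents that contain the exact phrase."""
    return [doc_id for doc_id, text in documents.items()
            if contains_phrase(text, phrase)]


docs = {
    "d1": "Using the Code Inspector in the ABAP Workbench",
    "d2": "ABAP code optimization tips",
    "d3": "The inspector found an error in the code",
}
print(exact_search(docs, '"code inspector"'))  # ['d1']
```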

2. Boolean Search:

You can use the logical operators AND, OR and NOT to build more complex queries from individual terms. To distinguish these operators from search terms, we write them in capital letters. The search terms between the Boolean operators can be single words, word groups or phrases. Now suppose we want documents that contain the word code but not inspector. The query will then look like this: [Screenshot: Boolean query for documents containing code but not inspector] We can also nest logical expressions in a query, for example: code AND ( inspector OR optimizer ). Do not forget to put a space before and after each parenthesis.

3. Wildcard Search:

This is the ordinary wildcard search we have been using since the early days of computing. It lets us enter placeholders, also known as wildcard characters, in the query. As a refresher, ‘?’ stands for a single character and ‘*’ for a sequence of characters. For example, if our search query is co*, the engine returns all documents that contain words such as code, company or, for that matter, any word starting with co.
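A common way to implement these placeholders is to translate the pattern into a regular expression. The Python sketch below does that, assuming that a pattern must match a whole word and that matching is case-insensitive; the function names and sample documents are hypothetical, and TREX may handle wildcards differently internally.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate '?' (exactly one character) and '*' (any sequence, possibly
    empty) into a regular expression; every other character is matched literally."""
    parts = []
    for ch in pattern.lower():
        if ch == "?":
            parts.append(r"\w")
        elif ch == "*":
            parts.append(r"\w*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def wildcard_search(documents: dict[str, str], pattern: str) -> list[str]:
    """Return ids of documents with at least one word matching the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [doc_id for doc_id, text in documents.items()
            if any(regex.fullmatch(w) for w in re.findall(r"\w+", text.lower()))]


docs = {
    "d1": "The company released new code",
    "d2": "Contact the support team",
    "d3": "Call us today",
}
print(wildcard_search(docs, "co*"))   # ['d1', 'd2']
print(wildcard_search(docs, "c?ll"))  # ['d3']
```

Using `fullmatch` on each word keeps co* from matching words that merely contain co somewhere in the middle.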

4. Fuzzy Search – More interesting:

Fuzzy search adds flexibility and power to our query: it also matches words with a similar spelling. Which words count as similar is decided by an “editing distance”, that is, the number of character modifications needed to transform a word into the search term. Adjusting this setting makes the search more or less fuzzy. A fuzzy value of 1 means an exact match and lower values allow more differences; in practice, a value between 0.8 and 1 works best. For example, if we allow an editing distance of 1 and search for organization, the results include documents containing organization as well as organisation. This feature is also used to implement “Do you mean”. Here is a snapshot: [Screenshot: “Do you mean” suggestion on SDN]
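The editing distance described above is usually the Levenshtein distance. The Python sketch below computes it and, as an assumption for illustration only, turns it into a 0 to 1 similarity score (1 minus the distance divided by the longer word's length) so that a threshold such as 0.8 can be applied. TREX's actual scoring formula is not described here, and `fuzzy_matches`, `did_you_mean` and the vocabulary are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation of the distance to a 0..1 score (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def fuzzy_matches(term: str, vocabulary: list[str], threshold: float = 0.8) -> list[str]:
    """All vocabulary words whose similarity to `term` reaches the threshold."""
    term = term.lower()
    return [w for w in vocabulary if similarity(term, w.lower()) >= threshold]


def did_you_mean(term: str, vocabulary: list[str], threshold: float = 0.8):
    """Most similar vocabulary word above the threshold, or None."""
    candidates = fuzzy_matches(term, vocabulary, threshold)
    return max(candidates, key=lambda w: similarity(term.lower(), w.lower()), default=None)


vocabulary = ["organization", "organisation", "organ", "inspector"]
print(edit_distance("organization", "organisation"))  # 1
print(fuzzy_matches("organization", vocabulary))      # ['organization', 'organisation']
print(did_you_mean("inspecter", vocabulary))          # inspector
```

With this normalisation, organization and organisation score 1 - 1/12 ≈ 0.92, which passes a 0.8 threshold, while organ (distance 7) scores about 0.42 and is rejected.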

5. Linguistic Search – Most interesting:

This feature reduces the search term to its word stem and then searches for all variants of that stem. A very good example (taken from the TREX documentation): a search for mice also finds mouse, and a search for either mouse or catch can find a document containing the sentence “The traps caught the mice”.
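Irregular forms such as mice or caught cannot be produced by simply cutting off suffixes, so this kind of search needs language-specific knowledge. The toy Python sketch below uses a small, hand-written lemma table as a stand-in for that knowledge; the table, the function names and the sample documents are hypothetical, and a real engine would rely on full dictionaries for each supported language.

```python
import re

# Hypothetical, hand-written lemma table. It only covers the words in this example.
LEMMAS = {
    "mice": "mouse",
    "caught": "catch",
    "catches": "catch",
    "catching": "catch",
    "traps": "trap",
}


def base_form(word: str) -> str:
    """Map a word to its base form; unknown words are returned lower-cased."""
    word = word.lower()
    return LEMMAS.get(word, word)


def linguistic_search(documents: dict[str, str], term: str) -> list[str]:
    """Return ids of documents containing any variant of the term's base form."""
    target = base_form(term)
    return [doc_id for doc_id, text in documents.items()
            if target in {base_form(w) for w in re.findall(r"\w+", text)}]


docs = {
    "d1": "The traps caught the mice",
    "d2": "A mouse is a small rodent",
    "d3": "Keyboard and trackpad",
}
print(linguistic_search(docs, "mouse"))  # ['d1', 'd2']
print(linguistic_search(docs, "catch"))  # ['d1']
print(linguistic_search(docs, "mice"))   # ['d1', 'd2']
```

Both the query and the document words are reduced to their base form, which is why searching for mice also finds the document that only says mouse.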

      5 Comments
      Former Member
      This blog gives good insight into the strength of TREX. We all use TREX, but only the first two variants. Fuzzy and linguistic search is particularly interesting.
      Former Member
      Blog Post Author
      Yeah, but fuzzy and linguistic search is done internally.
      Former Member
      This is indeed a simple & useful piece of info. We have installed the TREX server & done the necessary configuration. In the cFolders application (browser), we want to search documents/folders/collaborations using the TREX search engine. However, when I enter a word from the contents of a Word document as the search criterion, the search results do not include that document. Does this mean the search is not based on the TREX search engine? If so, how does one ensure that the search is based on TREX?

      A quick response here shall be highly appreciated.

      Rgds
      Deepak Umrankar

      Former Member
      good
      Former Member
      Now I can see the thread titles without all the RE:'s !!!!!!!!!!