Former Member
Ever wondered how results are retrieved when you search for something on SDN? The answer lies in TREX, the Search and Classification Engine. It is a technical component of SAP NetWeaver that provides solutions with a wide range of functions for intelligent search, retrieval, and classification of textual documents. TREX can perform these functions on any file that contains text, for example HTML, XML, TXT, RTF, DOC (Microsoft Word), and PPT (Microsoft PowerPoint). It also supports numerous languages, including Japanese and Hebrew. In this blog, I will discuss the basic search functions of TREX and how they are used for searching on SDN. Let us take these functions one by one:
1. Exact Search:
This returns documents that contain exactly the search term or phrase that you entered. Suppose you are trying to gather information on the Code Inspector of the ABAP Workbench, and your search phrase is "code inspector" (including the quotation marks). The search returns only those documents that contain the phrase code inspector. It won't return documents that contain only one of the two words and not the other.
[Screenshot of the above scenario]
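As a rough illustration of what phrase matching means (this is not TREX's implementation, only a minimal sketch), the following Python snippet checks whether the words of a phrase appear consecutively in a document. The helper names `tokenize` and `contains_phrase` and the sample documents are made up for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(document, phrase):
    """Return True if the words of `phrase` appear consecutively in `document`."""
    doc_tokens = tokenize(document)
    phrase_tokens = tokenize(phrase)  # quotation marks are dropped by tokenize
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))

# Hypothetical sample documents, invented for this sketch.
docs = {
    "doc1": "The Code Inspector checks ABAP programs for performance issues.",
    "doc2": "An inspector reviewed the building code last week.",
    "doc3": "This code sample runs without errors.",
}

hits = [name for name, text in docs.items()
        if contains_phrase(text, '"code inspector"')]
print(hits)  # ['doc1']
```

Only doc1 is returned: doc2 contains both words, but not next to each other in that order, and doc3 contains only one of them.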
2. Boolean Search:
You can use the logical operators AND, OR, and NOT to build more complex search queries from individual terms. To distinguish these operators from search terms, write them in capital letters. The search terms between the Boolean operators can be single words, word groups, or phrases. Now suppose we want to find documents that contain the word code but not inspector. The query will then look like this:
[Screenshot of the query]
We can also use nested logical expressions as our query. For example: code AND ( inspector OR optimizer ). Do not forget to put a space before and after each parenthesis.
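To make the evaluation of such queries concrete, here is a small Python sketch of a Boolean query evaluator. It is not how TREX works internally. It assumes single-word search terms, treats NOT as a unary operator (so "code but not inspector" is written here as `code AND NOT inspector`, which may differ from the exact syntax TREX expects), and splits the query on spaces, which is why the parentheses need a space on each side. The function name `matches` is hypothetical.

```python
import re

def matches(query, document):
    """Evaluate a Boolean query (AND, OR, NOT, parentheses) against one document."""
    tokens = query.split()  # relies on a space before and after each parenthesis
    words = set(re.findall(r"[a-z0-9]+", document.lower()))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_or():
        nonlocal pos
        result = parse_and()
        while peek() == "OR":
            pos += 1
            right = parse_and()
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_not()
        while peek() == "AND":
            pos += 1
            right = parse_not()
            result = result and right
        return result

    def parse_not():
        nonlocal pos
        if peek() == "NOT":
            pos += 1
            return not parse_not()
        return parse_atom()

    def parse_atom():
        nonlocal pos
        token = peek()
        if token is None or token in ("AND", "OR", ")"):
            raise ValueError("unexpected end or operator in query")
        pos += 1
        if token == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        # Only uppercase AND/OR/NOT are operators; anything else is a search term.
        return token.lower() in words

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result

print(matches("code AND ( inspector OR optimizer )", "The code optimizer is fast"))  # True
print(matches("code AND NOT inspector", "The code inspector tool"))                  # False
```

Because only uppercase AND, OR, and NOT are recognized as operators, a lowercase "and" in the query is treated as an ordinary search term, which mirrors the capital-letter convention described above.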
3. Wildcard Search:
This is the normal wildcard search that we have been using since the early days of computing. It lets us enter placeholders, also known as wildcard characters, in the query. To refresh your memory: '?' stands for a single character and '*' for a sequence of characters. Example: if our search query is co*, the engine returns all documents that contain code, company, or any other word starting with co.
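The wildcard semantics can be tried out with Python's standard `fnmatch` module, which uses the same '?' and '*' conventions. This is only an illustration of the matching rules, not of how TREX processes wildcards; the function name `wildcard_hits` and the sample sentences are invented for this sketch. Note that `fnmatch` also interprets `[...]` character sets, which the post does not mention.

```python
import re
from fnmatch import fnmatchcase

def wildcard_hits(pattern, document):
    """Return the distinct words in `document` that match the wildcard `pattern`."""
    words = re.findall(r"[a-z0-9]+", document.lower())
    return sorted({w for w in words if fnmatchcase(w, pattern.lower())})

print(wildcard_hits("co*", "The company shipped new code to a customer"))
# ['code', 'company']
print(wildcard_hits("c?de", "code cede crude"))
# ['cede', 'code']
```

In the first call, customer is not returned because it starts with cu, not co. In the second call, crude is not returned because '?' stands for exactly one character.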
4. Fuzzy Search - More interesting
Fuzzy search gives our queries more flexibility and power: it also matches words whose spelling is similar to the search term. Which words count as similar is decided by an "editing distance", that is, the number of character modifications needed to transform a word into the search term. We can adjust this setting to make the search more or less fuzzy, but it is best to keep the fuzziness value between 0.8 and 1. For example, with an "editing distance" of 1, a search for organization returns documents containing organization as well as organisation. This feature is also used to implement "Did you mean".
[Screenshot of this feature]
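The idea of an editing distance can be illustrated with the classic Levenshtein distance, which counts single-character insertions, deletions, and substitutions. Whether TREX uses exactly this measure, and how it maps a distance to its fuzziness value, is not stated here, so treat the Python sketch below as an assumption-laden example. The names `edit_distance` and `fuzzy_hits` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits from a to b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

def fuzzy_hits(term, words, max_distance=1):
    """Return the words within `max_distance` edits of `term`."""
    return [w for w in words if edit_distance(term.lower(), w.lower()) <= max_distance]

print(edit_distance("organization", "organisation"))  # 1
print(fuzzy_hits("organization", ["organisation", "organization", "organic"]))
# ['organisation', 'organization']
```

The two spellings differ by a single substitution (z to s), so a maximum distance of 1 is enough to match both, while organic is too far away.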
5. Linguistic Search - Most interesting
This feature reduces the search term to its word stem and then searches for all variants of that stem. A very good example (taken from the TREX documentation): a search for mice also finds mouse. A search for either mouse or catch can find a document containing the sentence "The traps caught the mice".
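The principle can be sketched in Python with a tiny hand-written stem table. Real linguistic search relies on language-specific dictionaries and rules; the `STEMS` table below covers only the words from the example and is an assumption made for illustration, as are the names `stem` and `linguistic_match`.

```python
import re

# Toy lookup table, invented for this sketch; it covers only the example words.
STEMS = {"mice": "mouse", "caught": "catch", "traps": "trap"}

def stem(word):
    """Map a word to its stem, falling back to the lowercased word itself."""
    word = word.lower()
    return STEMS.get(word, word)

def linguistic_match(term, document):
    """Return True if the stem of `term` equals the stem of any word in `document`."""
    doc_stems = {stem(w) for w in re.findall(r"[a-z]+", document.lower())}
    return stem(term) in doc_stems

sentence = "The traps caught the mice"
print(linguistic_match("mouse", sentence))  # True
print(linguistic_match("catch", sentence))  # True
print(linguistic_match("cat", sentence))    # False
```

Both the query and the document words are reduced to stems before comparison, which is why mouse matches mice and catch matches caught, while cat matches nothing.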