Configuring SAP HANA-Based Search for MDG

former_member193049 · ‎05-05-2014

SAP MDG 7.0 introduces an additional search provider that is based on SAP HANA and can be used for search and deduplication. It has some advantages over the search providers available before:

Improved performance and more meaningful score

Powerful configuration with a lot of options to adopt the search behavior to your needs

Lower TCO if MDG runs on SAP HANA

This blog post will show you how your landscape will be simplified and reveal some of the advantages you will gain by using the SAP HANA fuzzy search functionality.

Prerequisites: Before you can take advantage of SAP HANA fuzzy search within MDG, you must prepare your system accordingly. The steps you will need to take for this are described in the article “Configuring SAP HANA-Based Search for MDG 7.0”, you will find a link at the end of this article.

At some point in time, you may find that the search behavior does not fit your or your end users’ expectations when they use the default settings. For example, additional, non-matching tokens may affect the score not as expected, or you want to deal with synonyms and stop words to improve the score to show what is expected for a particular field. For most of the cases, switches are available in the search rule set configuration, which can help tweak the search behavior to better meet the expectations.

In the following, I have chosen a simple example in order to illustrate how search rule sets in HANA are used and how you can change the configuration. In this example, I will deal with abbreviations and the consideration of tokens (matching tokens vs. non-matching).

A search for “L Croft” does not meet my expectations. It does not provide any results although I am quite sure that there is a change request in process that does deal with somebody whose last name is “Craft” and I am quite sure the first name started with an “L”:

Under the default configuration, the search algorithm does not take abbreviations for the field “MCNAME1” into account. This field is associated with the user interface field “Name1 / Last Name”.

UPDATE: for the Abbreviation Similarity, it is required to create a fulltext index on the related attribute. See "Tips" at the bottom of this page to see how to create fulltext indexes.

This is the case because the Abbreviation Similarity is currently set to zero:

I will therefore change the Abbreviation Similarity to 0.9:

This leads to a much better search result:

Now, I am getting curious, I would like to check what other “Laras” exists in my system. Unfortunately, the search result does not meet my expectations at all:

I have just seen a “Lara” out there, so why she is not considered in the search algorithm if I do not provide a last name? This might be related to the amount of tokens in the search. Let’s tell SAP HANA that the score of the best matching token should be emphasized. So how about changing the parameter Best Matching Token Weight to 0.9?

Great, I got my “Lara” back. I now even get a higher and more meaningful score. And I also found “Mara Croft ltd.”, which sounds quite similar to “Lara”. This means the search does pay more attention to the things I am looking for, and does get less disrupted by additional information in the database. This is great for explorative search.

For a meaningful duplicate check, I might want to only get results that consider additional information as a relevant differentiation. Accordingly, “non-matching tokens” should lower the score. This can be achieved with the parameter Consider Non-Matching Tokens.

By using SAP HANA as your database and search backend, you will also reduce your TCO, as you do not need to maintain an additional component like TREX or a third party search solution. In consistency with the HANA story, you do not even need to replicate your data in this setup, which simplifies the setup reasonably, considering replication and synchronization were a complex and sensitive topic before.

This was just a very simple example. There are plenty of additional parameters to tweak and tune the search behavior as desired. In case you got interested, you may want to have a look at the comprehensive chapter on fuzzy search available in the HANA Developer Guide:

http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf

Configuring SAP HANA-Based Search for MDG

http://help.sap.com/erp_mdg_addon70/helpdata/en/72/93f8516599a060e10000000a44176d/frameset.htm

HANA Search Developer Guide

see chapter 4.3.4.12 (Search Rules)

Tips

Debugging Search Rules:

To debug issues, use the HANA log information in addition to the MDG logging information. You can enable debug information for the relevant components via "Administration->Trace Configuration->Database Trace->edit->INDEXSERVER->Show All Components". The relevant components for search rules are:

searchrulesetexecution

searchrulesetruntime

deduplication (for search rules in batch mode)

Likely you should set them all to DEBUG during debugging.

Using Text Features:

Most likely, sooner or later you are looking to use the text capabilities of fuzzy search within search rule sets. To do so, it is technically required that you either search on a text like field (datatype TEXT or SHORTTEXT) or alternatively create a fulltext index. In MDG, typically you need to create those indexes on a set of tables, as the generated rule sets are searching on a pair of tables (Active- and Stagingdata). To find out which generated tables of your data model belong to which active table you can navigate to SE38 and execute USMD_DATA_MODEL, select your data model and run the program.You can find more information on this here.

When you have identified the related tables and decided on which columns you want to use text features, you have to:

create fulltext index on the related columns of both tables (active and staging)

add "Text Column Options" to the related columns of your search rule set

save and activate the search rule set again

ATTENTION: Adding an index to a table does usually have an impact on the cost of an insert. You can define the fulltext index with the "ASYNC" option, so that it does not burden that much the inserts into your active data tables.

Errors:

If you are face an error like the following during Search Rule execution:

Could not execute 'CALL SYS.EXECUTE_SEARCH_RULE_SET(.....

SAP DBTech JDBC: [2]: general error: Defined key columns are not sufficient to yield a unique result!

Most likely your search rule set has not been generated with the option "NonUniqueKeys" set to "true". This option is a property of the "View" definition within the search rule set. For more information refer the HANA Search Developer Guide.

Permissions:

If your search rules execution behaves differenty in HANA Studio compared to your ABAP environment, check the permissions and have a look into HANA log files as well (see above).

Performance improvements with indexes:

You can reasonable improve the overall performance of searches by defining a set of indexes which support the fuzzy search performance. Those indexes are:

fuzzy index (to lower the cost of fuzzy string comparisions)
Create this index type on all fields which are frequently searched with fuzzyness. This does reasonably reduce the CPU consumption.

BTree-index (to improve the join performance)

You should create one index per column for both types of indexes.

ATTENTION: Adding an index to a table does usually have an impact on the cost of an insert. You can define the fulltext index with the "ASYNC" option, so that it does not burden that much the inserts into your active data tables.

Improved Search and Duplicate check in MDG 7.0

Configuring SAP HANA-Based Search for MDG

HANA Search Developer Guide

Tips

Get Your SAP HANA Idea Incubator Badge Today!

SCN Mission - SAP HANA Quiz Challenge is now retired

Share your #HANAStory and Win