SAP MDG 7.0 introduces an additional search provider that is based on SAP HANA and can be used for search and deduplication. It has some advantages over the search providers available before:

 

  • Improved performance and more meaningful  score
  • Powerful configuration with a lot of options to adopt the search behavior to your needs
  • Lower TCO if MDG runs on SAP HANA

 

This blog post will show you how your landscape will be simplified and reveal some of the advantages you will gain by using the SAP HANA fuzzy search functionality.

 

Prerequisites: Before you can take advantage of SAP HANA fuzzy search within MDG, you must prepare your system accordingly. The steps you will need to take for this are described in the article “Configuring SAP HANA-Based Search for MDG 7.0”, you will find a link at the end of this article.

 

At some point in time, you may find that the search behavior does not fit your or your end users’ expectations when they use the default settings. For example, additional, non-matching tokens may affect the score not as expected, or you want to deal with synonyms and stop words to improve the score to show what is expected for a particular field. For most of the cases, switches are available in the search rule set configuration, which can help tweak the search behavior to better meet the expectations.

 

In the following, I have chosen a simple example in order to illustrate how search rule sets in HANA are used and how you can change the configuration. In this example, I will deal with abbreviations and the consideration of tokens (matching tokens vs. non-matching).

 

A search for “L Croft” does not meet my expectations. It does not provide any results although I am quite sure that there is a change request in process that does deal with somebody whose last name is “Craft” and I am quite sure the first name started with an “L”:

/wp-content/uploads/2014/05/screen1_486756.png

Under the default configuration, the search algorithm does not take abbreviations for the field “MCNAME1” into account. This field is associated with the user interface field “Name1 / Last Name”.

UPDATE: for the Abbreviation Similarity, it is required to create a fulltext index on the related attribute. See “Tips” at the bottom of this page to see how to create fulltext indexes.

This is the case because the Abbreviation Similarity is currently set to zero:

/wp-content/uploads/2014/05/screen2b_486757.png

I will therefore change the Abbreviation Similarity to 0.9:

/wp-content/uploads/2014/05/screen3b_486711.png

This leads to a much better search result:

/wp-content/uploads/2014/05/screen4b_486712.png

Now, I am getting curious, I would like to check what other “Laras” exists in my system. Unfortunately, the search result does not meet my expectations at all:

/wp-content/uploads/2014/05/screen5b_486785.png

I have just seen a “Lara” out there, so why she is not considered in the search algorithm if I do not provide a last name? This might be related to the amount of tokens in the search. Let’s tell SAP HANA that the score of the best matching token should be emphasized. So how about changing the parameter Best Matching Token Weight to 0.9?

/wp-content/uploads/2014/05/screen6b_486786.png

Great, I got my “Lara” back. I now even get a higher and more meaningful score. And I also found “Mara Croft ltd.”, which sounds quite similar to “Lara”. This means the search does pay more attention to the things I am looking for, and does get less disrupted by additional information in the database. This is great for explorative search.

 

For a meaningful duplicate check, I might want to only get results that consider additional information as a relevant differentiation. Accordingly, “non-matching tokens” should lower the score. This can be achieved with the parameter Consider Non-Matching Tokens.

 

/wp-content/uploads/2014/05/screen7b_486792.png

By using SAP HANA as your database and search backend, you will also reduce your TCO, as you do not need to maintain an additional component like TREX or a third party search solution. In consistency with the HANA story, you do not even need to replicate your data in this setup, which simplifies the setup reasonably, considering replication and synchronization were a complex and sensitive topic before.

 

This was just a very simple example. There are plenty of additional parameters to tweak and tune the search behavior as desired. In case you got interested, you may want to have a look at the comprehensive chapter on fuzzy search available in the HANA Developer Guide:

http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf

 

 

Configuring SAP HANA-Based Search for MDG

 

http://help.sap.com/erp_mdg_addon70/helpdata/en/72/93f8516599a060e10000000a44176d/frameset.htm

HANA Search Developer Guide

see chapter 4.3.4.12 (Search Rules)

Tips

Debugging Search Rules:

To debug issues, use the HANA log information in addition to the MDG logging information. You can enable debug information for the relevant components via “Administration->Trace Configuration->Database Trace->edit->INDEXSERVER->Show All Components”. The relevant components for search rules are:

  • searchrulesetexecution
  • searchrulesetruntime
  • deduplication (for search rules in batch mode)

Likely you should set them all to DEBUG during debugging.

Using Text Features:

Most likely, sooner or later you are looking to use the text capabilities of fuzzy search within search rule sets. To do so, it is technically required that you either search on a text like field (datatype TEXT or SHORTTEXT) or alternatively create a fulltext index. In MDG, typically you need to create those indexes on a set of tables, as the generated rule sets are searching on a pair of tables (Active- and Stagingdata). To find out which generated tables of your data model belong to which active table you can navigate to SE38 and execute USMD_DATA_MODEL, select your data model and run the program.You can find more information on this here.

When you have identified the related tables and decided on which columns you want to use text features, you have to:

  1. create fulltext index on the related columns of both tables (active and staging)
  2. add “Text Column Options” to the related columns of your search rule set
  3. save and activate the search rule set again

ATTENTION: Adding an index to a table does usually have an impact on the cost of an insert. You can define the fulltext index with the “ASYNC” option, so that it does not burden that much the inserts into your active data tables.

Errors:

If you are face an error like the following during Search Rule execution:

Could not execute ‘CALL SYS.EXECUTE_SEARCH_RULE_SET(…..
SAP DBTech JDBC: [2]: general error: Defined key columns are not sufficient to yield a unique result!

Most likely your search rule set has not been generated with the option “NonUniqueKeys” set to “true”. This option is a property of the “View” definition within the search rule set. For more information refer the HANA Search Developer Guide.

Permissions:

If your search rules execution behaves differenty in HANA Studio compared to your ABAP environment, check the permissions and have a look into HANA log files as well (see above).

Performance improvements with indexes:

You can reasonable improve the overall performance of searches by defining a set of indexes which support the fuzzy search performance. Those indexes are:

  • fuzzy index (to lower the cost of fuzzy string comparisions)
    Create this index type on all fields which are frequently searched with fuzzyness. This does reasonably reduce the CPU consumption.
  • BTree-index (to improve the join performance)

You should create one index per column for both types of indexes.

ATTENTION: Adding an index to a table does usually have an impact on the cost of an insert. You can define the fulltext index with the “ASYNC” option, so that it does not burden that much the inserts into your active data tables.

To report this post you need to login first.

8 Comments

You must be Logged on to comment or reply to a post.

  1. Nikhilesh Agarwal

    Hi Roland  Very Nice Document ,

    I have a one question This Improve Search and Duplicate Check in  MDG7.0 is working only BP data Model or any other data models like MM & Some other Custom Data Model.

    Can you please let me know…

    Thanks

    Nikhilesh

    (0) 
    1. Roland Gremmelspacher Post author

      Hi Nikhilesh,

      the funktionality is not limited only to the BP data model, but to some extend developed more focus on that one. However with MDG 7.0 just the business partner, the flex and the financal models have been supported. With MDG 7.0 SP2 also material has been included (including classification search) and we now support the full scope of data models.

      Best Regards,

      Roland

      (0) 
  2. Amanda Andrade

    Great posting. I’m facing a similar point, my users don’t approve the default configuration. If we put the tax number we get the result with a lot of hits, however only one has 100% of similarity.

    (0) 
    1. Roland Gremmelspacher Post author

      Hi Amanda,

      Thank you for your positive feedback.I am not sure if there is a question in your comment or if you just want to mention your experiences. Guessing you are looking for a configuration where you can search with tax-number and just get the 100% matches, you should consider introducing a rule with just that single attribute tax number and put it at the top of your configuration (the field should be configured similar to the id-rule of bpmatch1 default configuration for BP. If you or your users don’t want to see matches with less 100% you can consider the rule-cut option. However, take care to distinguish between the slightly different requirements for search and dupe-check. For search, in most cases it is fine to have a longer list but on the other hand also false positives are not an issue if they appear at the end of the list. For dupe-check, you typically want to see all the matches derived via different rules, to get a comprehensive view. You dont like to cut the list and “hide” possible duplicates just because there is another one matching with tax, DnB or something else.

      Best regards,

      Roland

      (0) 
      1. Amanda Andrade

        Hi Roland,

        Thanks a lot for your feedback. In fact I have an issue to configure the Customer Search Rule in MDG on HANA. I’m getting the error

        Error in execution of query: general error: sql syntax error: incorrect syntaxnear “”: line 1 col 762 (at pos 762) at qp_gram.y:50778 when I Execute the search in MDG or even If I do a select statement in HANA. I do not want take your time in this posting, I will open a discussion in SCN to fix it. All material that I have found is related to Material Search, so it is not our case.


        However your posting really helped me a lot.


        Thank you so much!


        Regards

        Amanda

        (0) 
  3. Vinay Pawate

    Roland,

    We are on MDG 8.0 on HANA d/b  and have a requirement to configure  duplicate check on material master class and characteristics.  Is this supported ?

    (0) 

Leave a Reply