Hello Folks,

In the previous blog, we saw how to use CGUL rules in HANA:

SAP HANA:  Using Custom dictionary with CGUL rules for Text Analytics

In this blog, we will look at a few more scenarios and examples of dynamic pattern search.

Scenario:

We have a table with data like the following:

39.PNG

The example data above contains emails related to purchases, and the entire email is stored in the table. What we want to capture using the text index is “Transaction ID: <alphanum>” (e.g. Transaction ID: 29606132P6328135B).

Solution :

Suppose you want to search for a Transaction ID or an invoice number. Instead of having your SQL queries scan the entire CLOB data to find them, we can store just the data of interest in the full-text index. This can be achieved using CGUL rules.

If we add an entry to the dictionary (i.e. the .hdbtextdict file), we will be able to search for the term “Transaction ID”, but the results will contain only that term itself, as shown below:

Dictionary Definition:

34.PNG

38.PNG

So there are 3 entries in the table where “Transaction ID” is mentioned, but what we really want is to capture the term “Transaction ID” along with its value. For this we can use the regular expression search capability of CGUL rules.

We can define the .hdbtextinclude file for this as described in the steps that follow.

What we want to find is the token sequence matching “Transaction ID”, together with the number that follows it.

The sample format looks like this:


Transaction ID: 29606132P6328135B




There is a space between “Transaction” (Token 1) and “ID” (Token 2), followed by “:” (Token 3) and then the alphanumeric pattern (Token 4).

So we are basically trying to match 4 tokens at once.
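To make the four-token view concrete, here is a minimal Python sketch of how the sample line splits into tokens. It is only an illustration: the `tokenize` helper and its splitting rule (runs of letters and digits form one token, every other non-space character is its own token) are assumptions made for this example, not the actual HANA tokenizer.

```python
import re

# Assumed splitting rule for illustration only (not the HANA tokenizer):
# a run of letters/digits is one token, any other non-space character
# is a token on its own.
TOKEN_RE = re.compile(r"[0-9A-Za-z]+|[^\s0-9A-Za-z]")

def tokenize(text):
    """Split text into a list of tokens using the assumed rule above."""
    return TOKEN_RE.findall(text)

tokens = tokenize("Transaction ID: 29606132P6328135B")
print(tokens)
# ['Transaction', 'ID', ':', '29606132P6328135B']
```

The four list entries correspond to Token 1 through Token 4 in the description above.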


So the .hdbtextinclude definition should cover the following steps:


1) Search for Token 1 : “Transaction” 

    <\p{ci}(TRANSACTION)>

This definition matches a token with the value “TRANSACTION”; \p{ci} makes the match case-insensitive.


2) Search for Token 2 : “ID”

<\p{ci}(ID)>

This definition matches a token with the value “ID”; \p{ci} makes the match case-insensitive.

3) Search for Token 3 : “:”

<(\:|#)?>

This definition matches a token with the value “:” or “#”; the ‘?’ makes this token optional.


4) Search for Token 4 : “Alphanumeric Pattern”

<[0-9A-Za-z]{1,50}>

This definition matches an alphanumeric token of 1 to 50 characters (the upper bound can be set to the maximum length of a transaction ID in the system, if it is known).

As we need to match all 4 tokens at once, we can group the above definitions together as shown below and activate the .hdbtextrule file.


37.PNG
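The grouping of the four token definitions can also be sketched outside HANA. The Python below is a rough approximation of the logic, not CGUL itself and not the rule from the screenshot: the tokenizer rule, `TOKEN_PATTERNS`, `match_at` and `find_transaction_ids` are all hypothetical names introduced for this example.

```python
import re

# Assumed tokenizer for illustration only (not the HANA tokenizer).
TOKEN_RE = re.compile(r"[0-9A-Za-z]+|[^\s0-9A-Za-z]")

# One (pattern, optional) pair per token, mirroring steps 1 to 4.
TOKEN_PATTERNS = [
    (re.compile(r"transaction", re.IGNORECASE), False),  # Token 1
    (re.compile(r"id", re.IGNORECASE), False),           # Token 2
    (re.compile(r":|#"), True),                          # Token 3, optional
    (re.compile(r"[0-9A-Za-z]{1,50}"), False),           # Token 4
]

def match_at(tokens, start):
    """Return the matched tokens if the sequence matches at `start`, else None."""
    pos = start
    for pattern, optional in TOKEN_PATTERNS:
        if pos < len(tokens) and pattern.fullmatch(tokens[pos]):
            pos += 1
        elif not optional:
            return None
    return tokens[start:pos]

def find_transaction_ids(text):
    """Collect every token sequence in `text` that matches all four steps."""
    tokens = TOKEN_RE.findall(text)
    hits = []
    for i in range(len(tokens)):
        matched = match_at(tokens, i)
        if matched:
            hits.append(matched)
    return hits

email = "Payment received. Transaction ID: 29606132P6328135B. Thank you!"
print(find_transaction_ids(email))
# [['Transaction', 'ID', ':', '29606132P6328135B']]
```

Because Token 3 is optional, text such as "transaction id 12345" would also match in this sketch, which reflects the intent of the ‘?’ in step 3.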


Once we activate the corresponding .hdbtextinclude file (as described in the previous blog linked above), we can see below that the CGUL rules have captured the information we wanted and stored it in the index:

35.PNG

This completes our scenario, where we wanted to search for a particular token and the number following it (Transaction ID: <alphanum>).

You can also query further on top of the index, as shown below:

36.PNG

I hope this blog is helpful for you.

Thanks for taking the time to read this. Do let me know your feedback.

Yours

Krishna Tangudu 🙂
