Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
cancel
Showing results for 
Search instead for 
Did you mean: 
prashujust4u
Participant
Hello Everyone! I am a SAP HANA Senior Consultant working closely with Enterprise HANA and Business Objects Reporting Solutions.
Today I would like to share an approach which one might take to identify patterns in each text using HANA Database Artifacts.

Requirement: Identify the Quarter mentioned in each Document Line Text and display KPIs against this derived quarter. This Document Line Text is maintained by the Business Users and it is difficult to determine in which format a Business User will maintain the Quarter in the Text.

Eg:


Examples of Quarter Formats in Document Line Text


 

Challenge:
Since the Quarter Format is not uniformly maintained in the text how will you identify the correct Quarter on each line?

Solution:
We will address this challenge by leveraging Regular Expression Functions in a HANA Table Function.
To understand the code you will need some basic understanding on the Syntax of Regular Expressions. You can refer the ‘Useful References’ section at the end of my blog to gain a quick understanding.

High Level Algorithm:
1. Look at the text maintained by the users and identify all possible patterns (manual step).
2. Create pattern matching templates via REGEX functions.
3. Match each pattern against the text and identify used pattern via REGEX functions.
4. Extract the used pattern and Transform to one common format (eg: 21Q1).
5. Display KPIs against this DERIVED_QTR (by joining the Table Function to a Calculation View).

Please Note:
The below code contains more REGEX functions than you might actually require for this scenario. The only reason I have added them is so that you can understand how to use them and hopefully this will help you in your particular scenario.

I have provided comments which will help you understand what each line of code is doing.

Table Function Code:
FUNCTION "ZSCHEMA"."ZTF_DERIVED_QTR" ( ) 
RETURNS TABLE
(
DOC_TYPE NVARCHAR(5),
DOC_NUMBER NVARCHAR(10),
DOC_ITEM NVARCHAR(4),
DOC_LINE_TEXT NVARCHAR(5000),
DOC_QTR_FORMAT NVARCHAR(10),
DOC_QTR_VALUE NVARCHAR(10),
DERIVED_QTR_VALUE NVARCHAR(10)
)
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN
RETURN

SELECT DOC_TYPE,DOC_NUMBER,DOC_ITEM,
--This is the field which contains the Quarter Format entered by the user
DOC_LINE_TEXT,
--Identify the Quarter Format/Pattern entered by the user on a given line and store it in DOC_QTR_FORMAT column
case
when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4]FY[1-2][0-9]' then 'QNFFYY' --eg:Q1FY21
when DOC_LINE_TEXT LIKE_REGEXPR '[1-4]FY[1-2][0-9]' then 'NFFYY' --eg:1FY21
when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4].[1-2][0-9]' then 'QNSYY' --eg:Q1'21
when DOC_LINE_TEXT LIKE_REGEXPR '[1-2][0-9].Q[1-4]' then 'YYSQN' --eg:21'Q1
when DOC_LINE_TEXT LIKE_REGEXPR '[1-2][0-9]Q[1-4]' then 'YYQN' --eg:21Q1
when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4][1-2][0-9]' then 'QNYY' --eg:Q121
when DOC_LINE_TEXT LIKE_REGEXPR '[1-4]Q[1-2][0-9]' then 'NQYY' --eg:1Q21
end as DOC_QTR_FORMAT,
--Locate the Quarter Format/Pattern and extract this Format/Pattern in DOC_QTR_VALUE column
case
when LOCATE_REGEXPR('Q[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('Q[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT)
when LOCATE_REGEXPR('[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT)
when LOCATE_REGEXPR('Q[1-4].[1-2][0-9]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('Q[1-4].[1-2][0-9]' in DOC_LINE_TEXT)
when LOCATE_REGEXPR('[1-2][0-9].Q[1-4]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('[1-2][0-9].Q[1-4]' in DOC_LINE_TEXT)
when LOCATE_REGEXPR('[1-2][0-9]Q[1-4]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('[1-2][0-9]Q[1-4]' in DOC_LINE_TEXT)
when LOCATE_REGEXPR('Q[1-4][1-2][0-9]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('Q[1-4][1-2][0-9]' in DOC_LINE_TEXT)
when LOCATE_REGEXPR('[1-4]Q[1-2][0-9]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('[1-4]Q[1-2][0-9]' in DOC_LINE_TEXT)
end as DOC_QTR_VALUE,
--Identify the Quarter Format/Pattern and Extract the Quarter Format in one Standard Format YYQN eg: 21Q1
case
when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4]FY[1-2][0-9]' then substr(SUBSTR_REGEXPR('Q[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT),5,2)||substr(SUBSTR_REGEXPR('Q[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT),1,2)
when DOC_LINE_TEXT LIKE_REGEXPR '[1-4]FY[1-2][0-9]' then substr(SUBSTR_REGEXPR('[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT),4,2)||'Q'||substr(SUBSTR_REGEXPR('[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT),1,1)
when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4].[1-2][0-9]' then substr(SUBSTR_REGEXPR('Q[1-4].[1-2][0-9]' in DOC_LINE_TEXT),4,2)||substr(SUBSTR_REGEXPR('Q[1-4].[1-2][0-9]' in DOC_LINE_TEXT),1,2)
when DOC_LINE_TEXT LIKE_REGEXPR '[1-2][0-9].Q[1-4]' then substr(SUBSTR_REGEXPR('[1-2][0-9].Q[1-4]' in DOC_LINE_TEXT),1,2)||substr(SUBSTR_REGEXPR('[1-2][0-9].Q[1-4]' in DOC_LINE_TEXT),4,2)
when DOC_LINE_TEXT LIKE_REGEXPR '[1-2][0-9]Q[1-4]' then SUBSTR_REGEXPR('[1-2][0-9]Q[1-4]' in DOC_LINE_TEXT)
when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4][1-2][0-9]' then substr(SUBSTR_REGEXPR('Q[1-4][1-2][0-9]' in DOC_LINE_TEXT),3,2)||substr(SUBSTR_REGEXPR('Q[1-4][1-2][0-9]' in DOC_LINE_TEXT),1,2)
when DOC_LINE_TEXT LIKE_REGEXPR '[1-4]Q[1-2][0-9]' then substr(SUBSTR_REGEXPR('[1-4]Q[1-2][0-9]' in DOC_LINE_TEXT),3,2)||'Q'||substr(SUBSTR_REGEXPR('[1-4]Q[1-2][0-9]' in DOC_LINE_TEXT),1,1)
end as DERIVED_QTR_VALUE
FROM
(
SELECT 'PR' AS DOC_TYPE,PR_NUMBER AS DOC_NUMBER,PR_ITEM AS DOC_ITEM,DESCRIPTION AS DOC_LINE_TEXT
FROM ZSCHEMA.PR_LINES
);

END;

 

Output of Table Function:


Table Function Output


You can then join this Table Function with a Graphical Calculation View and display KPIs against the Derived Quarter.

Conclusion:
Pattern Matching can easily be done by using Regular Expressions in HANA. Keep in mind though that the performance of the Calculation View will degrade in direct proportion to the number of patterns to be matched and volume of data against which the pattern needs to be matched.

Hope this blog will help you when you face a similar situation.
Feel free to ask any questions. Thank You and have a great day!

 

Useful References:
Regular Expression Syntax Tutorial: https://regexone.com/
Regular Expression Information:
https://en.wikipedia.org/wiki/Regular_expression
https://www.regular-expressions.info/
SAP Help Documentation:
https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/LATEST/en-US/a2f80e8ac8904c13959c69bfc3...
1 Comment
Labels in this area