Skip to Content
Technical Articles
Author's profile photo Prashanth Dsouza

Regular Expressions in SAP HANA

Hello Everyone! I am a SAP HANA Senior Consultant working closely with Enterprise HANA and Business Objects Reporting Solutions.
Today I would like to share an approach which one might take to identify patterns in each text using HANA Database Artifacts.

Requirement: Identify the Quarter mentioned in each Document Line Text and display KPIs against this derived quarter. This Document Line Text is maintained by the Business Users and it is difficult to determine in which format a Business User will maintain the Quarter in the Text.

Eg:

Examples%20of%20Quarter%20Formats%20in%20Document%20Line%20Text

Examples of Quarter Formats in Document Line Text

 

Challenge:
Since the Quarter Format is not uniformly maintained in the text how will you identify the correct Quarter on each line?

Solution:
We will address this challenge by leveraging Regular Expression Functions in a HANA Table Function.
To understand the code you will need some basic understanding on the Syntax of Regular Expressions. You can refer the ‘Useful References’ section at the end of my blog to gain a quick understanding.

High Level Algorithm:
1. Look at the text maintained by the users and identify all possible patterns (manual step).
2. Create pattern matching templates via REGEX functions.
3. Match each pattern against the text and identify used pattern via REGEX functions.
4. Extract the used pattern and Transform to one common format (eg: 21Q1).
5. Display KPIs against this DERIVED_QTR (by joining the Table Function to a Calculation View).

Please Note:
The below code contains more REGEX functions than you might actually require for this scenario. The only reason I have added them is so that you can understand how to use them and hopefully this will help you in your particular scenario.

I have provided comments which will help you understand what each line of code is doing.

Table Function Code:

FUNCTION "ZSCHEMA"."ZTF_DERIVED_QTR" ( ) 
	RETURNS TABLE
	(
		DOC_TYPE NVARCHAR(5),
		DOC_NUMBER NVARCHAR(10),
		DOC_ITEM NVARCHAR(4),
		DOC_LINE_TEXT NVARCHAR(5000),
		DOC_QTR_FORMAT NVARCHAR(10),
		DOC_QTR_VALUE NVARCHAR(10),
		DERIVED_QTR_VALUE NVARCHAR(10)
	)
	LANGUAGE SQLSCRIPT
	SQL SECURITY INVOKER AS
BEGIN
RETURN

SELECT DOC_TYPE,DOC_NUMBER,DOC_ITEM,
	--This is the field which contains the Quarter Format entered by the user
	DOC_LINE_TEXT,
	--Identify the Quarter Format/Pattern entered by the user on a given line and store it in DOC_QTR_FORMAT column
	case 
	when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4]FY[1-2][0-9]' then 'QNFFYY'	--eg:Q1FY21
	when DOC_LINE_TEXT LIKE_REGEXPR '[1-4]FY[1-2][0-9]' then 'NFFYY'	--eg:1FY21
	when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4].[1-2][0-9]' then 'QNSYY'	--eg:Q1'21
	when DOC_LINE_TEXT LIKE_REGEXPR '[1-2][0-9].Q[1-4]' then 'YYSQN'	--eg:21'Q1
	when DOC_LINE_TEXT LIKE_REGEXPR '[1-2][0-9]Q[1-4]' then 'YYQN'		--eg:21Q1
	when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4][1-2][0-9]' then 'QNYY'		--eg:Q121
	when DOC_LINE_TEXT LIKE_REGEXPR '[1-4]Q[1-2][0-9]' then 'NQYY'		--eg:1Q21
	end as DOC_QTR_FORMAT,
	--Locate the Quarter Format/Pattern and extract this Format/Pattern in DOC_QTR_VALUE column
	case
	when LOCATE_REGEXPR('Q[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('Q[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT)
	when LOCATE_REGEXPR('[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT)
	when LOCATE_REGEXPR('Q[1-4].[1-2][0-9]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('Q[1-4].[1-2][0-9]' in DOC_LINE_TEXT)
	when LOCATE_REGEXPR('[1-2][0-9].Q[1-4]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('[1-2][0-9].Q[1-4]' in DOC_LINE_TEXT)
	when LOCATE_REGEXPR('[1-2][0-9]Q[1-4]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('[1-2][0-9]Q[1-4]' in DOC_LINE_TEXT)
	when LOCATE_REGEXPR('Q[1-4][1-2][0-9]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('Q[1-4][1-2][0-9]' in DOC_LINE_TEXT)
	when LOCATE_REGEXPR('[1-4]Q[1-2][0-9]' in DOC_LINE_TEXT) > 1 then SUBSTR_REGEXPR('[1-4]Q[1-2][0-9]' in DOC_LINE_TEXT)
	end as DOC_QTR_VALUE,
	--Identify the Quarter Format/Pattern and Extract the Quarter Format in one Standard Format YYQN eg: 21Q1 
	case
	when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4]FY[1-2][0-9]' then substr(SUBSTR_REGEXPR('Q[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT),5,2)||substr(SUBSTR_REGEXPR('Q[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT),1,2)
	when DOC_LINE_TEXT LIKE_REGEXPR '[1-4]FY[1-2][0-9]' then substr(SUBSTR_REGEXPR('[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT),4,2)||'Q'||substr(SUBSTR_REGEXPR('[1-4]FY[1-2][0-9]' in DOC_LINE_TEXT),1,1)
	when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4].[1-2][0-9]' then substr(SUBSTR_REGEXPR('Q[1-4].[1-2][0-9]' in DOC_LINE_TEXT),4,2)||substr(SUBSTR_REGEXPR('Q[1-4].[1-2][0-9]' in DOC_LINE_TEXT),1,2)
	when DOC_LINE_TEXT LIKE_REGEXPR '[1-2][0-9].Q[1-4]' then substr(SUBSTR_REGEXPR('[1-2][0-9].Q[1-4]' in DOC_LINE_TEXT),1,2)||substr(SUBSTR_REGEXPR('[1-2][0-9].Q[1-4]' in DOC_LINE_TEXT),4,2)
	when DOC_LINE_TEXT LIKE_REGEXPR '[1-2][0-9]Q[1-4]' then SUBSTR_REGEXPR('[1-2][0-9]Q[1-4]' in DOC_LINE_TEXT)
	when DOC_LINE_TEXT LIKE_REGEXPR 'Q[1-4][1-2][0-9]' then substr(SUBSTR_REGEXPR('Q[1-4][1-2][0-9]' in DOC_LINE_TEXT),3,2)||substr(SUBSTR_REGEXPR('Q[1-4][1-2][0-9]' in DOC_LINE_TEXT),1,2)
	when DOC_LINE_TEXT LIKE_REGEXPR '[1-4]Q[1-2][0-9]' then substr(SUBSTR_REGEXPR('[1-4]Q[1-2][0-9]' in DOC_LINE_TEXT),3,2)||'Q'||substr(SUBSTR_REGEXPR('[1-4]Q[1-2][0-9]' in DOC_LINE_TEXT),1,1)
	end as DERIVED_QTR_VALUE
FROM
(
	SELECT 'PR' AS DOC_TYPE,PR_NUMBER AS DOC_NUMBER,PR_ITEM AS DOC_ITEM,DESCRIPTION AS DOC_LINE_TEXT
	FROM ZSCHEMA.PR_LINES
);

END;

 

Output of Table Function:

Table%20Function%20Output

Table Function Output

You can then join this Table Function with a Graphical Calculation View and display KPIs against the Derived Quarter.

Conclusion:
Pattern Matching can easily be done by using Regular Expressions in HANA. Keep in mind though that the performance of the Calculation View will degrade in direct proportion to the number of patterns to be matched and volume of data against which the pattern needs to be matched.

Hope this blog will help you when you face a similar situation.
Feel free to ask any questions. Thank You and have a great day!

 

Useful References:
Regular Expression Syntax Tutorial: https://regexone.com/
Regular Expression Information:
https://en.wikipedia.org/wiki/Regular_expression
https://www.regular-expressions.info/
SAP Help Documentation:
https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/LATEST/en-US/a2f80e8ac8904c13959c69bfc3058f19.html

Assigned tags

      1 Comment
      You must be Logged on to comment or reply to a post.
      Author's profile photo Rajarshi Muhuri
      Rajarshi Muhuri

      great post , I quickly understood the LIKE_REGEXPR and LOCATE_REGEXPR and used it in my work .. thanks