Split table column value into multiple rows using ...

KonradZaleski · ‎05-14-2019

Introduction

In my previous post (Split string into multiple rows using SQL in SAP HANA) I was demonstrating how to split variable string value into multiple lines. In this post I would like to focus on the similar scenario, but this time the challenge will be to do the same not for variable but for table column. SAP HANA enables STRING_AGG function for concatenating values from multiple lines into single string, however doing opposite is quite challenging.

Scenario

In my scenario I will use my test table containing contact person details. Its structure is as follows:

CREATE COLUMN TABLE CONTACTS

(

	ID INT,

	COUNTRY VARCHAR(2),

	FULL_NAME VARCHAR(100),

	PHONE_NUMBERS VARCHAR(200)

);

Existing CONTACTS table has PHONE_NUMBERS column which stores comma delimited numbers. The purpose is to display phone numbers as separate rows and keep other contact information as in the source table. Single phone number length may vary and count of phone numbers can be also different for each record.

Split column values into multiple lines

To implement the logic for splitting column values into multiple lines, following steps are required

Retrieve from the string values delimited by separators

Dynamically define maximum number of values for all concatenated strings

Generate multiple lines for each record

Distribute retrieved values to generated rows

Step 1. Retrieve from the string values delimited by separators

In this step I will use string function, which allows to distinguish values from comma separated string. For this purpose I will use SUBSTR_REGEXPR SQL function. This function allows to retrieve substring from specific string based on regular expression. It also allows to specify which occurrence of the matching substring we want to display.

Following expression allows to retrieve first occurrence of the string of any characters excluding commas

SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 1 )

Knowing that in my scenario there is up to 3 phone numbers concatenated within single value, let's add the expression for remaining numbers:

Query:

SELECT 

	"ID",

	"COUNTRY",

	"FULL_NAME",

	"PHONE_NUMBERS",

	SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 1) AS "PHONE_NUMBER1",

	SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 2) AS "PHONE_NUMBER2",

	SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 3) AS "PHONE_NUMBER3"

FROM 

	CONTACTS;

Result:

Step 2. Dynamically define maximum number of values for all concatenated strings

In this step we want to define what is the maximum number of phone number values in single string. For this purpose I will use OCCURRENCES_REGEXPR function to count number of separators within the string.

Then I will add +1, because number of commas is always less by 1 than the number of phone numbers in the string:

OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1

Now we have count of phone number occurrences for each string.

Query:

SELECT 

	"ID",

	"COUNTRY",

	"FULL_NAME",

	"PHONE_NUMBERS",

	OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 AS "OCCURRENCES_COUNT"

FROM 

	CONTACTS;

Result:

Finally I want to see the maximum value to know, how many lines I need to generate. This value will be assigned to the variable MAX_NR_OCCURRENCES and will be used in Step 3. For the purpose of creating variable, I used anonymous block:

Query:

DO

BEGIN



	DECLARE MAX_NR_OCCURRENCES INT;

	

	SELECT

		MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )

	INTO

		MAX_NR_OCCURRENCES

	FROM

	 	CONTACTS;



END

Step 3. Generate multiple lines for each record

For generating multiple lines for each record I will cross join CONTACT table with series of 3 records (because in my case there are max 3 phone numbers in string). To generate N records I used SERIES_GENERATE_INTEGER function. Variable defined in Step 2 will be used as input parameter for this function to define number of records to be generated:

Query:

DO

BEGIN



	DECLARE MAX_NR_OCCURRENCES INT;

	

	SELECT

		MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )

	INTO

		MAX_NR_OCCURRENCES

	FROM

	 	CONTACTS;



	SELECT * FROM SERIES_GENERATE_INTEGER(1,0, :MAX_NR_OCCURRENCES); 	



END

Result:

Now let's cross join the series with result set from Step 1. This way each record will be copied 3 times:

Query:

DO

BEGIN



	DECLARE MAX_NR_OCCURRENCES INT;

	

	SELECT

		MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )

	INTO

		MAX_NR_OCCURRENCES

	FROM

	 	CONTACTS;



	SELECT 

		CNT."ID",

		CNT."COUNTRY",

		CNT."FULL_NAME",

		CNT."PHONE_NUMBERS",

		SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 1) AS "PHONE_NUMBER1",

		SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 2) AS "PHONE_NUMBER2",

		SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 3) AS "PHONE_NUMBER3",

		SERIES."ELEMENT_NUMBER"

	FROM 

		CONTACTS CNT

		CROSS JOIN SERIES_GENERATE_INTEGER(1,0, :MAX_NR_OCCURRENCES) SERIES; 	



END

Result:

Step 4. Distribute retrieved values to generated rows

In this step we will apply final query adjustments, to move distinguished phone numbers from columns to consecutive rows. To achieve that we can use ELEMENT_NUMBER column from SERIES_GENERATE_INTEGER function, which returns consecutive numbers for each line within specific contact person. This column will be consumed by OCCURRENCE parameter. By having consecutive numbers in ELEMENT_NUMBER column we can dynamically substring separated values one by one, and display them in consecutive rows. We also need to remember that initially for each record we generated three lines, so at the end we also need to filter out empty rows (for these cases where there are less than 3 phone numbers in string)

Query:

DO

BEGIN



	DECLARE MAX_NR_OCCURRENCES INT;

	

	SELECT

		MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )

	INTO

		MAX_NR_OCCURRENCES

	FROM

	 	CONTACTS;



	SELECT 

		CNT."ID",

		CNT."COUNTRY",

		CNT."FULL_NAME",

		SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE SERIES."ELEMENT_NUMBER") AS "PHONE_NUMBER"		

	FROM 

		CONTACTS CNT

		CROSS JOIN SERIES_GENERATE_INTEGER(1,0, :MAX_NR_OCCURRENCES) SERIES 	

	WHERE

		SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE SERIES."ELEMENT_NUMBER") IS NOT NULL;

		

END

Result:

Summary

This post was to give proposed solution for splitting table column and generating multiple lines, but also to highlight the power of using string functions in combination with regular expressions.