Flatten Parent-Child Hierarchy into Level Hierarchy using HANA (2.0 & above) Hierarchy Functions in SQL
Knock knock! Anyone else also looking for handy illustrations of the hierarchy functions introduced with HANA 2.0? Well count me in then ;).
While trying hard not to write SQLs with recursive self joins to flatten a hierarchical data format presented in parent-child relationship, the hierarchy functions in HANA can be a saviour for sure. Let’s look into something easy to implement using pre-defined hierarchy functions available with HANA 2.0 & above.
As a starter, let’s assume we have a miniature article hierarchy structured as below :
The data in is presented as parent-child hierarchical relationship in database. Let’s have a look at the same.
I have also introduced validity range in the data, which is relevant for time dependent hierarchies. For simplification purpose I have kept same ranges for all data records. The field HLEVEL is for illustration purpose & is not used in actual use case in the following exercise, the levels are generated later by the hierarchy functions based on the tree relationship.
However, our desired intention is to flatten this data into level based tabular structure, while article numbers being the lowest level ie. MATNR. Therefore we should have total 5 hierarchy levels for each MATNR. Since some of the articles don’t have level 5 hierarchical information, invariably the number of levels with non-blank information will be 4 in those cases. We also have an orphan node (FishingGear) with 2 levels of sub-nodes which are not placed in the actual hierarchy ARTHIER_01, so it would also be interesting to see what happens to those articles when we flatten the structure. 😉
We are trying to achieve a level oriented hierarchical structure like below :
In SQL server or similar DBMS one can use recursive joins with CTE to do these operations efficiently. Thankfully with HANA 2.0, a set of predelivered Hierarchy Functions have been introduced which can be used to approach such requirements efficiently & avoiding a lot of manual self joins between child & parent nodes to flatten out the structure. All detailed hierarchy function related descriptions can be found on the mentioned hyperlink above.
Now coming back to our case, let’s try to do some scripts using some of the HANA hierarchy functions to get what we want 🙂
For beginning, let’s forget the time dependency part of the data. Let’s assume we only have PARENTID & NODEID as columns in our dataset. Let’s see first how the code syntax looks like & then I will jump into basic explanation part.
with hier as ( select * from hierarchy ( source ( select "NODEID" as "NODE_ID", "PARENTID" as "PARENT_ID" from "ART_HIER_PAR_CH" ) start where nodeid = 'ARTHIER_01' sibling order by node_id multiparent orphan root cycle breakup ) ) select distinct node_id as "MATNR", substr_before(hier_struc,'~') as "ARTHIERLV1", substr_before(substr_after(hier_struc,'~'),'~') as "ARTHIERLV2", substr_before(substr_after(substr_after(hier_struc,'~'),'~'),'~') as "ARTHIERLV3", substr_before(substr_after(substr_after(substr_after(hier_struc,'~'),'~'),'~'),'~') as "ARTHIERLV4", substr_before(substr_after(substr_after(substr_after(substr_after(hier_struc,'~'),'~'),'~'),'~'),'~') as "ARTHIERLV5" from ( select node_id, hier_struc from hierarchy_ancestors_aggregate ( source hier measures ( string_agg(node_id, '~') as hier_struc ) ) where hierarchy_tree_size = 1 order by node_id );
Let’s now break down the code in parts for few necessary explanations. Firstly, the script starts with statement ‘ with hier as () ‘ in order to use the result set from temporary table ‘hier’ in later parts of the code, which is known as common table expression (CTE).
Notice in the CTE, we have only used 2 columns : NODEID & PARENTID from our source table : ART_HIER_PAR_CH in order to generate detailed hierarchical information using the first hierarchy function. Let’s have a look what result set the CTE does populate.
If we just take the first part consisting the CTE out of the entire script, this is the responsible part which generates the first hierarchical dataset using HIERARCHY generator function.
with hier as ( select * from hierarchy ( source ( select "NODEID" as "NODE_ID", "PARENTID" as "PARENT_ID" from "ART_HIER_PAR_CH" ) start where nodeid = 'ARTHIER_01' sibling order by node_id multiparent orphan root cycle breakup ) ) select * from hier;
Two very important aspects of this part are as below :
- In the current version of the function, to use the hierarchy generator function like above, the source must have two columns namely : NODE_ID & PARENT_ID. Parent-child relational columns with other names can not be processed by the hierarchy function. This is the reason I have used aliases to make PARENTID as PARENT_ID & NODEID as NODE_ID in the script. Because NODEID & PARENTID are simply not accepted by the function & it will raise an error like : SAP DBTech JDBC: : Hierarchy error: Source format error: No NODE_ID column found in SOURCE. No PARENT_ID column found in SOURCE.
- It is very important to state where the hierarchy formation should start from. It is the part where I mentioned the root node of my hierarchy specifying nodeid = ‘ARTHIER_01’.
Let’s have a look at the result set from temporary source table ‘hier’ from the above script.
Most of the columns are self explanatory or you can read about them in help.sap.com. However, some of the noticeable columns which are relevant to our case, are
HIERARCHY_TREE_SIZE : This denotes how many levels or branches to traverse from bottom of the hierarchy to reach the node, for leaves which don’t have any child node the value is 1. Meaning these are the article numbers in our sample data set. This expression is used in later part of our script.
HIERARCHY_IS_CYCLE : This is relevant in case a node is pointing to another predecessor node (higher in the hierarchy order) in the hierarchy tree (the value becomes 1 in this case), this causes an endless recursion unless intervened with an explicit expression like one used in this code ie. ‘cycle breakup’ (this was not needed for our sample dataset, also this is the default behaviour of the function usage, I have used the expression just for demonstration purpose). This ensures no further traversal continues first after the cycle is closed (value 1 is overwritten with 0).
HIERARCHY_IS_ORPHAN : This denotes if a specific branch is orphan to the hierarchy, In our case the branch starting with FishingGear is orphan & that’s the reason the generated data shows 1 for the same branch.
At the end of the CTE, notice these following three expressions :
Multiparenthood is the situation when in a hierarchy a specific node belongs to multiple parent nodes. This can be handled in different ways using following expressions :
multiparent error : This expression will raise an error if a multiparental situation occurs.
multiparent leaves : This expression ensures that multiparents are allowed only for leaves (with no child/successor node), otherwise raises an error.
multiparent : This expression permits multiparenthood in the hierarchy, which is also the default behaviour of the function usage. I have used the expression just for demonstration purpose.
I have already mentioned about orphan nodes & we also have an example in our case. The expresions relevant to orphan node handling are as following :
orphan error : An error is raised if any orphan node is present in the dataset.
orphan root : The top most level in orphan branch would be considered as a separate root node. I have used this expression in my script, will see how the data appears in later segment of the code.
orphan ignore : Orphan nodes are not considered for hierarchical data generation, this is the default behaviour.
orphan adopt : The top most level orphan node will be adopted as a successor node to the previous root node in the structure.
With all this information above, let’s now jump into the second segment of the script. This part is basically reusing the result set from temporary table ‘hier’ with another hierarchy navigation function : HIERARCHY_ANCESTORS_AGGREGATE. The details on this function can be found in the mentioned hyperlink. I have used aggregate function : STRING_AGG in the measures specification. the delimiter used here is ‘~’ symbol. This gives us the entire hierarchy traversal from root node to bottom across the tree for each node. The alias column hier_struc using this function generates the hierarchy path for a node like : level1~level2~level3~level4~level5~level6 (given there is data present for each level for the lowest node (level 6 being article number in our context). Let’s have a look at the sample data for this column :
An important thing to notice here is that it delivers the hierarchical path for each node & not only for the leaf nodes. So, all oher nodes would also be counted as MATNR in our script if we did not handle the script to work only for the leaf nodes. This is ensured by the where clause in last select statement : where hierarchy_tree_size = 1. This tells the script to derive the path only for the leaf nodes, where hierarchy tree size is 1.
The last task for us is to split this generated string into several levels of the hierarchical representation we want to achieve. Here one thing we must know is the maximum number of levels we want in our final dataset. In my example, the maximum level is 6, which is the article number itself & I have that separately as column MATNR (NODE_ID) in my final dataset. So, I am interested in the rest 5 levels which give me the entire hierarchy in level format. I have used string functions : SUBSTR_BEFORE & SUBSTR_AFTER to derive that, there might be other alternatives to achieve the same too. There is one more thing to notice that I have intentionally left out the last part of the hierarchical path (no substr_after is used to get the last part of the path), this is because the last level for me is the article number & I am not using a separate level field for the same.
Finally it’s time to have a look at the final result set of the whole exercise 🙂
Bingo! Have we not desired the same? 😉
Let’s have a look at the differences among outputs if we change the expression responsible for orphan node handling in the CTE part. For now it is orphan root. So the top most node in the orphan branch is considered as level 1 (ARTHIERLV1).
Let’s change the expression to orphan error & see what happens.
with hier as ( select * from hierarchy ( source ( select "NODEID" as "NODE_ID", "PARENTID" as "PARENT_ID" from "ART_HIER_PAR_CH" ) start where nodeid = 'ARTHIER_01' sibling order by node_id multiparent orphan error cycle breakup ) )
If we change the orphan node handling in the CTE part like above, as expected it raises an error as : SAP DBTech JDBC: : Hierarchy error: The hierarchy source data contains orphans FishingGear -> FishingReels, FishingGear -> FishingReels but the hierarchy definition is configured to reject orphaned nodes.
Similarly, if we change the expression to orphan ignore, we simply don’t get the data relevant to the branch starting with FishingGear any longer in our result set.
For orphan adopt, we get the below result, where the branch starting with FishingGear is succeeded to root node ARTHIER_01 with FishingGear as level 2 node.
Nevertheless, before ending this blog post, I just wanted to add one more paragraph relevant to time dependent hierarchies. The reason I introduced validity range in my dataset is to demonstrate another (time dependent) hierarchy generator function : HIERARCHY_TEMPORAL where we can specify the validity period for the nodes within a hierarchy.
The syntax of the CTE part needs to be changed like below in order to cope with time dependency :
with hier as ( select * from hierarchy_temporal ( source ( select "NODEID" as "NODE_ID", "PARENTID" as "PARENT_ID", "DATEFROM" as "VALID_FROM", "DATETO" as "VALID_UNTIL" from "ART_HIER_PAR_CH" ) start where nodeid = 'ARTHIER_01' sibling order by node_id, valid_from valid from '20110101' until '20991231' multiparent orphan ignore cycle breakup ) )
In the current version of this function, few most important aspects of the syntax are :
- NODE_ID & PARENT_ID columns as mentioned earlier.
- Similar to NODE_ID & PARENT_ID, the function needs 2 more columns with exact names as VALID_FROM & VALID_UNTIL, no other column names are allowed instead. This is the reason I have used aliases in the above script to convert DATEFROM & DATETO to the permissible column names.
- The valid from expression can specify a validity period for the ranges explicitly.
- In current version of the function, we can not have any orphan specification explicitly. The default behaviour is to ignore if any orphan node exists, & this is why, the only supported expression for orphan node handling is orphan ignore (which is used in the script above for demonstration purpose).
Whoof! That’s the conclusion of the loooong blog post with all I had to share about my experience with using the HANA hierarchy functions so far. I truly hope this will help in understanding this efficient feature delivered by SAP a little bit more & I can already see many more complex use cases (fx. building HANA substitutes for segmented data sources for hierarchy info objects in BW/4HANA) can be sorted efficiently with help of these hierarchy functions in HANA.
Furthermore, I would love to know about other community members’ experiences with these functions & possible use cases they have used these in. Also, feel free to let know of any improvement points, secret features, use cases regarding the functions, so we all can cash in from each other 😉
Take care !