
This blog deals with building the dependency graph of one or more base objects, in order to maintain a list of objects ordered by their dependency on those base objects.

Creating DB artifacts frequently involves using other DB artifacts, e.g.:

  • Procedures may use tables, views, column views, other procedures
  • Column views use underlying column store tables/other column views/procedures

Thus we essentially establish a directed acyclic graph of objects and their dependencies. At any point in time, it can be useful to establish the hierarchy of any object in this graph to check its dependents at various levels. Establishing a hierarchy for such an object means traversing from its node, through its dependents, up to a node with in-degree one. Since the graph is directed and acyclic, there may be more than one such node with in-degree one, and hence more than one hierarchy for this particular object. In such a case, instead of maintaining a hierarchy, we can choose to maintain an ordered list of objects based on their dependencies. This essentially breaks the problem down to ensuring that a child object appears before its parent object when we traverse from the base object up to the “grandest” parent, i.e. the top-level node(s) with in-degree one.
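The ordering described above is exactly a topological sort of the dependency graph. As an illustrative sketch (not part of the original solution), Kahn's algorithm produces such an order without recursion; the graph here is hypothetical, with a table T used by procedures C and B, C called by B, and B called by A:

```python
from collections import deque

# Hypothetical dependency graph: each object maps to its direct dependents
# (the objects that use it). T is used by C and B; B calls C; A calls B.
used_by = {"T": ["C", "B"], "C": ["B"], "B": ["A"], "A": []}

def dependency_order(graph):
    """Kahn's algorithm: emit each object before everything that depends on it."""
    indegree = {n: 0 for n in graph}
    for node in graph:
        for dependent in graph[node]:
            indegree[dependent] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dependent in graph[node]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                queue.append(dependent)
    return order
```

Running `dependency_order(used_by)` gives T, C, B, A: each object comes out before everything that depends on it.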

Why is it necessary

1. Typically, in an environment such as HANA’s, we try to push most of the processing logic into the backend, which means there are a great many DB objects to take care of. In a developer environment with more than just a few developers, we frequently have the same DB object(s) being used by multiple developers, e.g. a table T being used by two procedures, each belonging to a different developer, John and Jim, and these procedures in turn being used to establish a complex graph of dependencies. John casually makes a change to the shared DB object without caring about its repercussions, say, he drops a column from the shared table that is being used by Jim’s procedures. That single change might invalidate the entire chain of objects created by Jim. Just like the procedures, Jim might have many other DB objects using this changed table that have now been rendered invalid; too many for him to remember by heart, and much worse if the change to the table has had a ripple effect across the whole circuit of dependencies established by this base table. Now, Jim has two problems:

  • remembering the list of objects that are using this changed table, e.g. Procedures X, Y and Z might be independently using Table T
  • the order in which these objects are used, e.g. Procedure A might call Procedure B, which calls Procedure C, which uses Table T; to make things more complex, Table T might be used by Procedure B directly as well

Now, John has a bit of homework to do to avoid making Jim’s life miserable again. Before making such a change, a utility like this dependency graph could have helped him establish the dependency graph for Table T, and he could then have taken a call on whether to go ahead with the change.

Again, all is not lost for Jim either, if he can establish the graph for this object and find out which object(s) need to be reworked.
2. Again, a lot of applications divide their consumption into design time and runtime. The runtime DB artifacts are created during activation, using the metamodel of the runtime stored as metadata. If changes to the runtime have been proposed in the design time since the last activation, the HANA system cannot tell you until the proposals are actually activated. The metamodel, however, does have a way of telling whether a modification/enhancement has been made post activation. So a change log is required to maintain the list of objects that have been changed, as well as the rest of the objects that have been logically invalidated because of the dependencies between them. The added advantage is that we do not have to regenerate the entire runtime; we can deal with the (directly or indirectly) affected objects only.

Proposed solution:

Consider the following graph of object dependencies:

Key features of this graph:

1. It is a directed acyclic graph

2. Leaf nodes are always tables

3. Non-leaf nodes can also be the base objects that trigger the chain of invalidation

Problem

Build the dependency graph for the table P.

Assumption(s):

1. The leaf nodes are tables. So tables are the lowest-level database artifacts that, when changed, invalidate the graph of objects

2. Cyclic dependencies are not considered, since the SAP HANA database does not support recursion/cyclic dependencies

3. Dynamic SQL is not taken care of

4. In terms of graph theory, there can be at most one base object on a particular path to the topmost node; that base object is the first vertex of the path.


Key idea(s)

1. Child object should get activated before its dependent(s) at any level in the graph.

2. We use Breadth First Search to prepare the relationship data between the nodes in the graph. We do not use Depth First Search, as it requires recursion.

Process:

1. Build parent-child relationship data between the nodes in the graph

Parent-Child List

Node | Parent Node | Level
-----|-------------|------
P    | N           | 1
P    | Z           | 1
N    | I           | 2
N    | J           | 2
Z    | V           | 2
I    | F           | 3
I    | J           | 3
J    | Y           | 3
J    | A           | 3
V    | U           | 3
F    | C           | 4
Y    | Z           | 4
Y    | X           | 4
Y    | W           | 4
A    | (null)      | 4
U    | (null)      | 4
C    | A           | 5
Z    | V           | 5
X    | V           | 5
W    | V           | 5
W    | U           | 5
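The BFS of step 1 can be sketched as follows; this is a language-agnostic illustration in Python, not the blog's SQLScript. The adjacency map encodes the example graph's edges exactly as listed in the Parent-Child List; unlike the procedure further below, this simplified version expands each node's dependents only once, so it does not repeat the (Z, V) edge at level 5:

```python
from collections import deque

# Edges of the example graph, taken from the Parent-Child List above:
# each object maps to its direct dependents (the objects that use it).
dependents = {
    "P": ["N", "Z"], "N": ["I", "J"], "Z": ["V"], "I": ["F", "J"],
    "J": ["Y", "A"], "V": ["U"], "F": ["C"], "Y": ["Z", "X", "W"],
    "C": ["A"], "X": ["V"], "W": ["V", "U"], "A": [], "U": [],
}

def build_relation(base):
    """BFS from the base object, emitting (node, parent_node, level) rows.

    A node with no dependents is a root of the hierarchy; following the
    blog's procedure, it gets a dummy row with a null parent.
    """
    rows, queue, seen = [], deque([(base, 1)]), set()
    while queue:
        node, level = queue.popleft()
        if node in seen:              # expand each node's dependents only once
            continue
        seen.add(node)
        parents = dependents.get(node, [])
        if not parents:
            rows.append((node, None, level))
        for parent in parents:
            rows.append((node, parent, level))
            queue.append((parent, level + 1))
    return rows

rows = build_relation("P")
```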

2. Prepare the first list of processed nodes from the above structure; we call it List A. We push the base object into the list and mark it Processed. For the rest of the list, we proceed as follows. For each Parent Node in the Parent-Child List prepared in step 1, we check whether it has any child objects that have not yet been included in List A. If there are such nodes, we include those child nodes first and mark them Unprocessed, and then include the current parent node, marked Processed.

Let's see how this goes.

     i. We push P into the list and mark it Processed. So List A looks like:

Node | Status (Processed or not)
-----|--------------------------
P    | Processed

      ii. We take the first Parent Node from the Parent-Child List, N, and check whether it is already included in List A. It is not, so we look for its children. P is the only child of N, and it has already been included. So we push N into the list and mark it Processed. List A now looks like:

Node | Status (Processed or not)
-----|--------------------------
P    | Processed
N    | Processed

     iii. We take the second Parent Node from the Parent-Child List, Z, and check whether it is already included in List A. It is not, so we look for its children: P and Y. P has already been included, but Y has not. So we push Y first and mark it UnProcessed, and then push Z and mark it Processed. List A now looks like:

Node | Status (Processed or not)
-----|--------------------------
P    | Processed
N    | Processed
Y    | UnProcessed
Z    | Processed

.. We continue this approach until we have iterated through all the elements in the Parent Node column of the Parent-Child List. At the end, List A looks like:

List A

Node | Status (Processed or not)
-----|--------------------------
P    | Processed
N    | Processed
Y    | UnProcessed
Z    | Processed
I    | Processed
J    | Processed
X    | UnProcessed
W    | UnProcessed
V    | Processed
F    | Processed
C    | UnProcessed
A    | Processed
U    | Processed

3. Now, from List A, we keep deriving subsequent versions by expanding only the unprocessed nodes, along the same lines, until all the nodes are processed.

List A Version 1

Node | Status (Processed or not)
-----|--------------------------
P    | Processed
N    | Processed
Y    | UnProcessed
Z    | Processed
I    | Processed
J    | Processed
X    | UnProcessed
W    | UnProcessed
V    | Processed
F    | Processed
C    | UnProcessed
A    | Processed
U    | Processed

List A Version 2

Node | Status (Processed or not)
-----|--------------------------
P    | Processed
N    | Processed
J    | UnProcessed
Y    | Processed
Z    | Processed
I    | Processed
X    | Processed
W    | Processed
V    | Processed
F    | Processed
C    | Processed
A    | Processed
U    | Processed

List A Version 3

Note that in Version 2, the unprocessed node J has children N and I (rows (N, J, 2) and (I, J, 3) in the Parent-Child List), and I is not yet in the new list when J is expanded. So I is pushed in first, marked UnProcessed, before J is marked Processed, and one more derivation is needed:

Node | Status (Processed or not)
-----|--------------------------
P    | Processed
N    | Processed
I    | UnProcessed
J    | Processed
Y    | Processed
Z    | Processed
X    | Processed
W    | Processed
V    | Processed
F    | Processed
C    | Processed
A    | Processed
U    | Processed

List A Version 4

In the final pass, I's only child N is already in the list, so I is simply marked Processed. All nodes are now processed, and every child appears before its dependents:

Node | Status (Processed or not)
-----|--------------------------
P    | Processed
N    | Processed
I    | Processed
J    | Processed
Y    | Processed
Z    | Processed
X    | Processed
W    | Processed
V    | Processed
F    | Processed
C    | Processed
A    | Processed
U    | Processed
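Steps 2 and 3 can be sketched end-to-end as follows; again a Python illustration of the idea the blog implements in SQLScript, with the `relation` rows being the step-1 output for the example graph. Note that during re-derivation this sketch must insert J's child I ahead of J, so I precedes J in the final order:

```python
# (node, parent_node, level) rows from step 1 for the example graph.
relation = [
    ("P", "N", 1), ("P", "Z", 1),
    ("N", "I", 2), ("N", "J", 2), ("Z", "V", 2),
    ("I", "F", 3), ("I", "J", 3), ("J", "Y", 3), ("J", "A", 3), ("V", "U", 3),
    ("F", "C", 4), ("Y", "Z", 4), ("Y", "X", 4), ("Y", "W", 4),
    ("A", None, 4), ("U", None, 4),
    ("C", "A", 5), ("X", "V", 5), ("W", "V", 5), ("W", "U", 5),
]

def order_objects(relation, base):
    children = {}                      # parent -> its child objects, in row order
    for node, parent, _ in relation:
        if parent is not None:
            children.setdefault(parent, []).append(node)

    def in_list(listing, n):
        return any(x == n for x, _ in listing)

    # Step 2: build List A, starting with the base object marked processed.
    listing = [(base, True)]
    for _, parent, _ in relation:
        if parent is None or in_list(listing, parent):
            continue
        for child in children.get(parent, []):
            if not in_list(listing, child):
                listing.append((child, False))   # unseen child enters unprocessed
        listing.append((parent, True))

    # Step 3: re-derive the list until no node is left unprocessed.
    while any(not done for _, done in listing):
        new = []
        for node, done in listing:
            if in_list(new, node):
                continue
            if not done:                         # expand an unprocessed node first
                for child in children.get(node, []):
                    if not in_list(new, child):
                        new.append((child, False))
            new.append((node, True))
        listing = new
    return [n for n, _ in listing]

order = order_objects(relation, "P")
```

The resulting order is P, N, I, J, Y, Z, X, W, V, F, C, A, U, in which every child appears before all of its dependents.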


Limitations:

  • Dynamic SQL is not covered – naturally, the dependencies of parent objects on the objects used inside dynamic SQL queries are not captured in the metadata views of the SYS schema; to capture such dependencies, one would have to resort to methods like parsing, which in turn runs against the very premise of dynamic SQL
  • Deletion of base objects – base objects, once deleted, no longer appear in the metadata views of HANA, which makes detection of the invalidated objects impossible; we might need a custom table to address such a scenario
  • The dependencies between attribute views and analytic/calculation views are not maintained in the HANA system, so one needs to maintain such dependencies separately

Support from HANA platform:

1. The HANA DB provides system views like PROCEDURES, VIEWS and OBJECT_DEPENDENCIES, which can be used to build the dependencies and check the valid/invalid status of the chain

Technical Implementation

The technical implementation involves querying the metadata views of the SYS schema, chiefly OBJECT_DEPENDENCIES.

Code Piece


set schema chlog;
drop table relation;
create global temporary table relation(node nvarchar(32), parent_node nvarchar(32), level int, object_no int, is_processed nvarchar(32));
drop table changelog_tab;
create column table changelog_tab(object nvarchar(32));
drop table gt_chlog1;
drop table gt_chlog2;
create  global temporary table gt_chlog1(node nvarchar(32),is_processed int,order_no int);
create  global temporary table gt_chlog2(node nvarchar(32),is_processed int,order_no int);
insert into changelog_tab values('P');
delete from relation;
drop table chlog.attr_view_relation;
create table chlog.attr_view_relation(an_view nvarchar(256),at_view nvarchar(256));
insert into chlog.attr_view_relation values('chlog/N','chlog/O');
insert into chlog.attr_view_relation values('chlog/E','chlog/G');
drop view object_dependency;
create view object_dependency as
select * from object_dependencies
 where dependency_type = 1
   and base_object_type in ('PROCEDURE','VIEW','TABLE')
   and dependent_object_type in ('PROCEDURE','VIEW','TABLE')
   and dependent_object_name not like '%/hier/%'
union
-- manually maintained attribute-view dependencies: the attribute view is the
-- base object, the analytic/calculation view using it is the dependent
select '_SYS_BIC', at_view, 'VIEW', '_SYS_BIC', an_view, 'VIEW', 1 from chlog.attr_view_relation;
drop procedure check_log;
create PROCEDURE check_log ( )
  LANGUAGE SQLSCRIPT
  SQL SECURITY INVOKER
  DEFAULT SCHEMA chlog
-- READS SQL DATA
AS
/*****************************
  Write your procedure logic
*****************************/
i int;
lv_object nvarchar(32);
lv_base_object_name nvarchar(32);
flag int;
lv_no_of_unprocessed int;
lv_objects_left_at_level int;
arr_parent nvarchar(32) array;
arr_node nvarchar(32) array;
arr_status int array;
arr_order_no int array;
lv_cnt int;
lv_cnt1 int;
current_node nvarchar(32);
current_order_no int;
current_status int;
cnt_unprocessed int;
lv_max_order_no int;
lv_max_at_level int:=0;
lv_is_exist int := 0;
lv_any_more int :=0;
begin
DECLARE EXIT HANDLER FOR SQLEXCEPTION SELECT ::SQL_ERROR_CODE, ::SQL_ERROR_MESSAGE FROM DUMMY;
select top 1 object into lv_object from changelog_tab;
i:=0;
lv_base_object_name:=lv_object;
flag:= 1;
truncate table relation;
insert into relation select base_object_name node, replace(dependent_object_name,'/proc','') parent_node, i+1 level,:lv_max_at_level + row_number() over () object_no ,null is_processed  from chlog.object_dependency where base_object_type in ('PROCEDURE','VIEW','TABLE') and dependent_object_type in ('PROCEDURE','VIEW','TABLE') and dependent_object_name not like '%/hier/%' and base_object_name in (select object from changelog_tab) and dependency_Type = 1;
i:=1;
while flag =1 do
--get maximum object_no for the current level
  select case when max(object_no) is null then 0 else max(object_no) end into lv_max_at_level from relation where level= i+1;
  lv_is_exist:=0;
--get 1st level dependents
  select count(*) into lv_is_exist  from chlog.object_dependency where base_object_type in ('PROCEDURE','VIEW','TABLE') and dependent_object_type in ('PROCEDURE','VIEW','TABLE') and dependent_object_name not like '%/hier/%' and base_object_name=lv_base_object_name and dependency_Type = 1;
  if lv_is_exist = 0 then
--if no such dependents exist then this is the root of the hierarchy, push a dummy record for this base object with null as the parent object
  insert into relation values(lv_base_object_name, null, i+1 ,:lv_max_at_level + 1,'X');
  else
--if dependents exist enter the first level dependents
  insert into relation select base_object_name node, replace(dependent_object_name,'/proc','') parent_node, i+1 level,:lv_max_at_level + row_number() over () object_no ,null is_processed  from chlog.object_dependency where base_object_type in ('PROCEDURE','VIEW','TABLE') and dependent_object_type in ('PROCEDURE','VIEW','TABLE') and dependent_object_name not like '%/hier/%' and base_object_name=lv_base_object_name and dependency_Type = 1;
  end if;
--mark this node present as parent node in other rows of the current snapshot processed
  update relation set is_processed= 'X' where parent_node = :lv_base_object_name;
--routine run to mark those parent nodes that have been newly added after their older instances have been already processed and had therefore been marked processed for that snapshot
  lt_parent =select parent_node from relation where is_processed='X';
  update relation set is_processed='X' where parent_node in (select parent_node from :lt_parent);
--check for any objects left unprocessed at current level
  select count(*) into lv_objects_left_at_level from relation where  parent_node is not null and level= :i and is_processed is null;
  if lv_objects_left_at_level > 0 then
--if left, then select the next unprocessed object at the current level
  select top 1 parent_node into lv_base_object_name from relation where level= :i and is_processed is null
  and parent_node not in (select distinct node from relation a where a.node is not null) and parent_node is not null  order by object_no;
  flag:=1;
  else
--if no, then check if this is the last level
-- if this is the last level then exit
-- if this is not the last level then increase the level counter i and start processing this new level in the next while loop iteration
  i:=:i+1;
  select count(*) into lv_any_more from relation where parent_node not in (select node from relation a where a.node is not null) and parent_node is not null;
  if lv_any_more >0 then--for last level
  select top 1 parent_node into lv_base_object_name from relation where level= :i and is_processed is null
  and parent_node not in (select distinct node from relation a where a.node is not null) and parent_node is not null  order by object_no;
  flag := 1;
  else
  flag:= 0;
  end if;
  end if;
end while;
select * from relation;
truncate table gt_chlog1;
truncate table gt_chlog2;
--preparing the 1st list of processed/unprocessed nodes (objects)
select count(*) into lv_cnt from relation where parent_node is not null;
lt_mid= select * from relation where parent_node is not null order by level,object_no;
arr_parent:=array_agg(:lt_mid.parent_node);
insert into gt_chlog1 select object,1,row_number() over () from changelog_tab;
--insert into gt_chlog1 values(:lv_object,1,1);
for i in 1..:lv_cnt do
  current_node:= :arr_parent[:i];
--check if the current node is already mentioned in the list processed/unprocessed, if yes proceed to the next node (iteration)
  select count(*) into lv_cnt1 from gt_chlog1 where node=:current_node;
  if :lv_cnt1 =1 then
  continue;
  end if;
  select max(order_no) into lv_max_order_no from gt_chlog1;
  insert into gt_chlog1 select node,0 is_processed,lv_max_order_no + row_number() over () order_no from relation where parent_node=:current_node and node not in (select node from gt_chlog1);
  select max(order_no) into lv_max_order_no from gt_chlog1;
  insert into gt_chlog1 values(current_node,1,lv_max_order_no + 1 );
end for;
select * from gt_chlog1;
--keep iterating till all the nodes are processed and are in the order where child comes before the parent
select count(*) into cnt_unprocessed from gt_chlog1 where is_processed= 0;
select count(*) into lv_cnt from gt_chlog1;
while cnt_unprocessed != 0 do
  for i in 1..:lv_cnt do
  select node,is_processed,order_no into current_node,current_status,current_order_no from gt_chlog1 where order_no = :i;
  select count(*) into lv_is_exist from gt_chlog2 where node=current_node;
  if lv_is_exist = 0 then
  select case when max(order_no) is null then 0 else max(order_no) end into lv_max_order_no from gt_chlog2;
  if :current_status = 1 then
  insert into gt_chlog2 values(:current_node,:current_status,lv_max_order_no+1);
  else
  insert into gt_chlog2 select node,0 is_processed,lv_max_order_no + row_number() over () order_no from relation where parent_node=:current_node and node not in (select node from gt_chlog2);
  select max(order_no) into lv_max_order_no from gt_chlog2;
  insert into gt_chlog2 values(:current_node,1,lv_max_order_no+1);
  end if;
  end if;
  end for;
  select count(*) into cnt_unprocessed from gt_chlog2 where is_processed= 0;
  select * from gt_chlog2;
  if cnt_unprocessed !=0 then
  truncate table gt_chlog1;
  insert into gt_chlog1 select * from gt_chlog2;
  truncate table gt_chlog2;
  end if;
end while;
END;
call chlog.check_log;
Output:

/wp-content/uploads/2014/08/output1_525840.png

We can have multiple objects pushed into changelog_tab, and the procedure will give us the correct order in which the objects depend on each other. Try the following input and run the procedure:


insert into changelog_tab values('P');
insert into changelog_tab values('T');
insert into changelog_tab values('K');
insert into changelog_tab values('D');
call chlog.check_log;
Output:

/wp-content/uploads/2014/08/output2_525841.png

Thank You.

-Sheel Pancholi   


5 Comments


  1. Balaji Govindan

    Hi Sheel,

    Great blog, combining concepts and code in an elaborated manner.

    Can you also please throw some light on how much HANA support this and its limitations ? is it that object_dependencies table in your code ?

    And if anyone wants to use this feature, do I just need this procedure and a changelog table ?

    Thanks,

    Balaji.

    1. Sheel Pancholi Post author

      Thanks Balaji 🙂 … means a lot…I have mentioned the limitations in the Limitations section just above the Technical Implementation section. I have added the support from HANA section now.

      If anyone wants to use the feature there are a couple of tables apart from the procedure and the changelog table. In all it would be:

      a. a Relation table to store the parent-child relationship

      b. a changelog table to store the changed object(s)

      c. two temporary tables to keep building the graph and refreshing the results at the end of each iteration

      d. the procedure to implement the algorithm

      Thanks 🙂

      Regards,

      Sheel

  2. Muthu Ranganathan

    Nice concept Sheel and thanks for sharing the same in a  blog to wider HANA community. The incremental objects activation is a key solution to a problem that helps in syncing design time and run time objects in a disciplined manner. Another important advantage is performance, as adding just an attribute can be done faster now with this dependency graph, vs. having to activate the entire thing once again.

    Thanks

    Muthu

