
HANA Data Lake Size and Row Monitoring

Are you looking for a programmatic way to monitor the size of your data lake objects? How about viewing the row counts of those objects as well?

As you may have seen from my previous blog posts, the SAP HANA Data Lake is based on SAP IQ (formerly Sybase IQ) technology. Some of the monitoring we have done over the years in an SAP IQ landscape can therefore be ported to work with the SAP HANA Data Lake.

The basic premise is simple: develop code that runs inside the data lake to capture the relevant statistics, and then expose that code via a stored procedure. There are (or will be, once the next evolution of the data lake is released in Q1 2021 per the roadmap) two ways to interface with the data lake. Since Q1 2020, the data lake has been exposed through SAP HANA Cloud; the next generation will add a native SQL endpoint, or interface. This blog will cover both interfaces.

First, the code. As mentioned, we must develop code that captures the relevant statistics: in this case, the size of each table and the number of rows it contains. This can be done with a SQL stored procedure created in the data lake.

If you will be exposing the data lake via SAP HANA Cloud, use this version of the code. The REMOTE_EXECUTE wrapper provides the syntax HANA needs to send the statements to the data lake for creation.

CALL SYSRDL#CG.REMOTE_EXECUTE('
drop procedure if exists sp_iqsizes;
create procedure sp_iqsizes()
begin
	-- result table: one row per user table
	declare local temporary table size_res (
		table_owner varchar(128)
		, table_name varchar(128)
		, sizeKB bigint
		, rowcount bigint
		) in SYSTEM;
	declare sizeKB bigint;
	declare rc bigint;
	declare o varchar(128);
	declare t varchar(128);
	declare ot varchar(255);
	declare blksz int;

	-- SYSIQINFO not exposed in HDL yet, but HDL uses 32KB blocks
	-- can revert to commented out line in next gen HDL
	--select first block_size/512/2 into blksz from sys.SYSIQINFO;
	set blksz=32;

	-- loop over all non-system, non-partition IQ tables
	FOR FORLOOP as FORCRSR CURSOR
		FOR select table_name, user_name table_owner
			from sys.systable, sys.sysuser
			where creator <> 3 and server_type = ''IQ''
				and table_type <> ''PARTITION'' and creator=user_id
	do
		set sizeKB=0;
		set rc=0;
		set o=table_owner;
		set t=table_name;
		set ot=o+''.''+t;

		-- size in KB = blocks used * block size in KB
		select convert(bigint, NBlocks * blksz) into sizeKB
			from dbo.sp_iqtablesize(ot);
		-- dynamic SQL to count the rows of the current table
		execute ( ''select count(*) into rc from ''||ot );
		insert into size_res
			values ( table_owner, table_name, sizeKB, rc );
	end for;

	select * from size_res;
end;

drop view if exists V_HDL_ALL_SIZES;
create view V_HDL_ALL_SIZES as select * from sp_iqsizes();

');
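Because the script begins with drop-if-exists statements, the entire CALL can safely be re-run whenever you change the procedure. The same wrapper can also be used to remove the objects again; for example:

CALL SYSRDL#CG.REMOTE_EXECUTE('
drop view if exists V_HDL_ALL_SIZES;
drop procedure if exists sp_iqsizes;
');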

 

If you will be using the SQL endpoint for the HANA Data Lake, use this version instead. The only differences are that the REMOTE_EXECUTE wrapper is gone and the single quotes no longer need to be escaped by doubling them.

drop procedure if exists sp_iqsizes;

create procedure sp_iqsizes()
begin
	-- result table: one row per user table
	declare local temporary table size_res (
		table_owner varchar(128)
		, table_name varchar(128)
		, sizeKB bigint
		, rowcount bigint
		) in SYSTEM;
	declare sizeKB bigint;
	declare rc bigint;
	declare o varchar(128);
	declare t varchar(128);
	declare ot varchar(255);
	declare blksz int;

	-- SYSIQINFO not exposed in HDL yet, but HDL uses 32KB blocks
	-- can revert to commented out line in next gen HDL
	--select first block_size/512/2 into blksz from sys.SYSIQINFO;
	set blksz=32;

	-- loop over all non-system, non-partition IQ tables
	FOR FORLOOP as FORCRSR CURSOR
		FOR select table_name, user_name table_owner
			from sys.systable, sys.sysuser
			where creator <> 3 and server_type = 'IQ'
				and table_type <> 'PARTITION' and creator=user_id
	do
		set sizeKB=0;
		set rc=0;
		set o=table_owner;
		set t=table_name;
		set ot=o+'.'+t;

		-- size in KB = blocks used * block size in KB
		select convert(bigint, NBlocks * blksz) into sizeKB
			from dbo.sp_iqtablesize(ot);
		-- dynamic SQL to count the rows of the current table
		execute ( 'select count(*) into rc from '||ot );
		insert into size_res
			values ( table_owner, table_name, sizeKB, rc );
	end for;

	select * from size_res;
end;

drop view if exists V_HDL_ALL_SIZES;

create view V_HDL_ALL_SIZES as select * from sp_iqsizes();

 

Regardless of which path you used, the net result is that the data lake now has a procedure called sp_iqsizes() and a view called V_HDL_ALL_SIZES. sp_iqsizes takes no parameters; when run, it collects the size in kilobytes (KB) and the row count of every non-system table in the data lake.
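If you are connected to the data lake directly (for example, through the SQL endpoint), you can sanity-check the new objects with something like:

call sp_iqsizes();

select * from V_HDL_ALL_SIZES order by sizeKB desc;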

The view is needed to expose the stored procedure's logic to a remote system, here SAP HANA Cloud, since a virtual table can be created over a remote table or view but not directly over a procedure.

If you wish to expose the data lake procedure and view to SAP HANA Cloud, this code can be used to do just that:

-- remove any previous mapping first
drop table HDL_ALL_SIZES;

CREATE VIRTUAL TABLE "HDL_ALL_SIZES"
    AT "SYSRDL#CG_SOURCE"."<NULL>"."SYSRDL#CG"."V_HDL_ALL_SIZES";

At this point, we have created the necessary data lake objects and mapped them into SAP HANA Cloud for programmatic use. To access the data from within HANA Cloud, you can run this command:

select * from HDL_ALL_SIZES;

I ran this via the SAP HANA Cloud Database Explorer; here is the output from my test system:

[Screenshot: HANA Cloud DBExplorer Output]
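From here, normal SQL applies on the HANA side. For example, to see the ten largest tables first (the mixed-case column names coming from the data lake typically need to be double-quoted in HANA):

SELECT "table_owner", "table_name", "sizeKB", "rowcount"
  FROM HDL_ALL_SIZES
 ORDER BY "sizeKB" DESC
 LIMIT 10;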

I hope this helps you monitor the space consumption of the objects in your data lake deployment programmatically. As an added benefit, if you are an existing SAP IQ user, the stored procedure will also work in an on-premises deployment of SAP IQ 16.
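As a closing thought, if you want to track growth over time, one option is to snapshot the virtual table into a local HANA table on a schedule. Here is a minimal sketch; the HDL_SIZE_HISTORY table is a hypothetical name and layout of my own, not part of the procedure above:

-- hypothetical history table for trending object growth
CREATE TABLE HDL_SIZE_HISTORY (
    captured_at TIMESTAMP,
    table_owner NVARCHAR(128),
    table_name  NVARCHAR(128),
    size_kb     BIGINT,
    row_count   BIGINT
);

-- run periodically (e.g. from a scheduled job) to capture a snapshot
INSERT INTO HDL_SIZE_HISTORY
SELECT CURRENT_TIMESTAMP, "table_owner", "table_name", "sizeKB", "rowcount"
  FROM HDL_ALL_SIZES;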
