
SAP HANA: Generic Procedure using Arrays to Load delta data into Tables

Hello Folks,

Problem Description:

The goal is to load data into SAP HANA tables using a single generic stored procedure that also handles delta loads. There are already multiple threads on loading data into HANA tables and on avoiding the usage of cursors, so here are my inputs on solving this problem using “Arrays”.

Note: This blog is intended to address the above-mentioned problem. I will be updating this document with additional workarounds for “real-time” issues in the coming weeks.
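
The core idea used below is simple: read the needed metadata into a table variable, collapse the relevant column into an array with ARRAY_AGG, and then walk that array by index, which avoids opening a cursor altogether. Here is a minimal sketch of that pattern (a fragment from inside a SQLScript procedure body; the schema and table names are only illustrative):

DECLARE names VARCHAR(256) ARRAY;
DECLARE cnt INTEGER;
DECLARE txt VARCHAR(5000);
-- read the needed metadata into a table variable
cols = SELECT column_name FROM TABLE_COLUMNS
       WHERE schema_name = 'KRISHNA' AND table_name = 'EMPLOYEE';
-- collapse the column into an array and count its entries
names := ARRAY_AGG(:cols.column_name);
SELECT count(*) INTO cnt FROM :cols;
-- walk the array by index (indexes start at 1) instead of looping over a cursor
txt := :names[1];
FOR i IN 1 .. :cnt - 1 DO
   txt := :txt || ',' || :names[:i + 1];
END FOR;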

Scenarios:

Delta Type = FULL:

You might want to do a “Full” update, meaning you delete the data from the target table and reload the entire data set from the source.
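
In effect, a FULL load means the procedure below builds and executes two statements of this shape (the schema, table and column names here are only illustrative):

DELETE FROM "SYSTEM"."EMPLOYEE";

INSERT INTO "SYSTEM"."EMPLOYEE" ("ID","NAME","SALARY")
SELECT "ID","NAME","SALARY" FROM "KRISHNA"."EMPLOYEE";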

Delta Type = DELTA:

You might want to load only the “Delta” based on the primary keys of the table, i.e. update a record if it already exists, otherwise insert it as a new record.
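
A DELTA load is built as a single UPSERT, which HANA matches against the target table's primary key: existing records are updated and new ones are inserted (again with illustrative names):

UPSERT "SYSTEM"."EMPLOYEE"
SELECT "ID","NAME","SALARY" FROM "KRISHNA"."EMPLOYEE";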

Also have a look at this blog, which gives an overview of how this procedure can be called from HDBSQL:

SAP HANA: Replicating Data into SAP HANA using HDBSQL

Here is the code:


CREATE PROCEDURE DATA_LOAD_USING_ARRAY(
     IN IN_SOURCE_SCHEMA VARCHAR(10),
     IN IN_SOURCE_TABLE  VARCHAR(100),
     IN IN_TARGET_SCHEMA VARCHAR(10),
     IN IN_TARGET_TABLE  VARCHAR(100),
     IN IN_DELTA_TYPE    VARCHAR(10))
LANGUAGE SQLSCRIPT SQL SECURITY INVOKER AS
BEGIN
   DECLARE i INT;
   DECLARE STRINGARRAY VARCHAR(256) ARRAY;  -- 256 so that longer column names are not truncated
   DECLARE TEXT VARCHAR(100000);
   DECLARE rowcount INTEGER;
   DECLARE primary_keys VARCHAR(5000)  := '';
   DECLARE column_list  VARCHAR(30000) := '';
   DECLARE DELETE_QUERY VARCHAR(30000) := '';
   DECLARE UPSERT_QUERY VARCHAR(30000) := '';
   DECLARE INSERT_QUERY VARCHAR(30000) := '';

   /* Get the primary key columns of the target table (the target is assumed to have a primary key) */
   pk_tab = SELECT column_name FROM CONSTRAINTS
            WHERE table_name  = UPPER(:IN_TARGET_TABLE)
              AND schema_name = UPPER(:IN_TARGET_SCHEMA)
              AND is_primary_key = 'TRUE';
   STRINGARRAY := ARRAY_AGG(:pk_tab.column_name);
   SELECT count(*) INTO rowcount FROM :pk_tab;  -- number of primary key columns
   -- Build the comma-separated primary key list
   TEXT := :STRINGARRAY[1];
   FOR i IN 1 .. :rowcount - 1 DO
      TEXT := :TEXT || ',' || :STRINGARRAY[:i + 1];
   END FOR;
   primary_keys := :TEXT;
   -- Quote each column name to handle names containing spaces
   SELECT '"' || REPLACE(:primary_keys, ',', '","') || '"' INTO primary_keys FROM DUMMY;
   -- Note: the UPSERT below relies on the primary key implicitly; primary_keys is kept for extensions

   /* Get all columns of the target table */
   col_tab = SELECT column_name FROM TABLE_COLUMNS
             WHERE table_name  = UPPER(:IN_TARGET_TABLE)
               AND schema_name = UPPER(:IN_TARGET_SCHEMA)
             ORDER BY position;  -- keep the columns in table order
   STRINGARRAY := ARRAY_AGG(:col_tab.column_name);
   SELECT count(*) INTO rowcount FROM :col_tab;
   -- Build the comma-separated column list
   TEXT := :STRINGARRAY[1];
   FOR i IN 1 .. :rowcount - 1 DO
      TEXT := :TEXT || ',' || :STRINGARRAY[:i + 1];
   END FOR;
   column_list := :TEXT;
   -- Quote each column name to handle names containing spaces
   SELECT '"' || REPLACE(:column_list, ',', '","') || '"' INTO column_list FROM DUMMY;

   IF :IN_DELTA_TYPE = 'FULL' THEN
      -- Delete all data from the target table
      DELETE_QUERY :=
         'DELETE FROM "' || IN_TARGET_SCHEMA || '"."' || IN_TARGET_TABLE || '"';
      -- Reload the target table completely from the source table
      INSERT_QUERY :=
         'INSERT INTO "' || IN_TARGET_SCHEMA || '"."' || IN_TARGET_TABLE || '" (' || :column_list || ')
          SELECT ' || :column_list || ' FROM "' || IN_SOURCE_SCHEMA || '"."' || IN_SOURCE_TABLE || '"';
      EXEC (:DELETE_QUERY);
      EXEC (:INSERT_QUERY);
   ELSEIF :IN_DELTA_TYPE = 'DELTA' THEN
      -- Update existing records and insert new ones based on the primary key
      UPSERT_QUERY :=
         'UPSERT "' || IN_TARGET_SCHEMA || '"."' || IN_TARGET_TABLE || '" SELECT ' || :column_list || '
          FROM "' || IN_SOURCE_SCHEMA || '"."' || IN_SOURCE_TABLE || '"';
      EXEC (:UPSERT_QUERY);
   END IF;
END;







Sample Call Statement:


CALL DATA_LOAD_USING_ARRAY ('KRISHNA','EMPLOYEE','SYSTEM','EMPLOYEE','FULL');
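
A delta load of the same table would use the same call with only the last parameter changed:

CALL DATA_LOAD_USING_ARRAY ('KRISHNA','EMPLOYEE','SYSTEM','EMPLOYEE','DELTA');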





Error handling and maintaining data consistency:

Please find the details in my other blog on how you can achieve consistency along with error handling:

SAP HANA: Workaround for using Commit & Rollback in exception handling block of a Stored Procedure

For feedback on whether the procedure succeeded or failed, and on how to control the “Commits”, you can find the relevant information in the above-mentioned blog.

Hence I didn’t add those details to this document, to keep it specific to the title. (If required, based on the comments, I will add them too.)

As mentioned at the start, I will be updating the above code whenever possible.


In the meantime I would love to have your comments, and to hear if there is any other, better way of doing it.

Yours,

Krishna Tangudu

      24 Comments

      Phani KV

      Hi Krishna,

      Very good document about full and delta.

      Thanks,

      Phani.

      Krishna Tangudu
      Blog Post Author

      Thanks Phani 🙂

      Regards,

      Krishna Tangudu

      Former Member

      Hi Krishna,

      Wonderful blog. But I have some questions here:

      Why did you use the ARRAY_AGG and UNNEST functions to get the count of columns in lines 20 and 21? You could have directly used

      SELECT count(*) into rowcount from :pk_tab right?

      Thanks,

      Akshay

      Krishna Tangudu
      Blog Post Author

      Thanks Akshay for your feedback.

      Yes, you are correct (redundant logic there); I have corrected the code now.

      Regards,

      Krishna Tangudu

      Former Member

      Hi Krishna,

      One more question to ask here: I suppose we can also implement this using cursors instead of arrays. Which one do you think will be optimal?

      Thanks,

      Akshay

      Krishna Tangudu
      Blog Post Author

      Hi Akshay,

      Hmmm, that is one question I have in my mind as well. The focus of this document was initially to avoid cursors, which was achieved using arrays as mentioned above, and also with table variables.

      Now, which is the best option (w.r.t. performance) and what should be followed is what I am also looking out for (hoping that someone will comment here with their experience).

      Regards,

      Krishna Tangudu

      Vivek Singh Bhoj

      Nice document Krishna 🙂

      Regards,

      Vivek

      Krishna Tangudu
      Blog Post Author

      Thanks Vivek 🙂

      Regards,

      Krishna Tangudu

      Former Member

      Nice document Krishna... Keep writing such nice notes... 🙂

      Krishna Tangudu
      Blog Post Author

      Thanks for your kind words KD

      Will give my best attempts to write 🙂

      Regards,

      Krishna Tangudu

      Raj Kumar Salla

      Krishna,

      I have not got the opportunity to test your code yet. I would like to know how the performance will be for a table with bulk data. Any add-on on this will be really useful. Keep up the good work.

      Regards

      Raj

      Krishna Tangudu
      Blog Post Author

      Hi Raj,

      Can you tell me what volumes you are speaking of, so that I can test accordingly?

      I have tested it with up to 70 million records for one table.

      Regards,

      Krishna Tangudu

      Raj Kumar Salla

      What was the time it took for 70m records?

      Regards

      Raj

      Krishna Tangudu
      Blog Post Author

      The total framework time, as I mentioned in the other blog SAP HANA: Replicating Data into SAP HANA using HDBSQL, is about 9-10 minutes to complete the load of that table. I didn't take the HANA-side timings specifically; I will take them and let you know.

      Regards,

      Krishna Tangudu

      Former Member

      Hi Krishna ,

         I am getting an error at line number 22. The error is "Feature is not supported" in HANA.

         While debugging I have seen that there is nothing in STRINGARRAY.

      Regards,

      Hartosh

      Krishna Tangudu
      Blog Post Author

      Hi Hartosh,

      Please comment with your call statement.

      Regards,

      Krishna Tangudu

      miyan aleem

      Hi Krishna,

      I am having one problem. In my use case I have to write 3 conditional queries in a procedure based on SESSION_USER. Whenever I write the query in the procedure and call it in an xsodata file, I am getting an error like "Invalid procedure or parameter list".

      Thanks

      Former Member

      CREATE PROCEDURE "DSS".sortMergeJoin_Arrays() LANGUAGE SQLSCRIPT AS

      BEGIN

            declare rstart bigint;

            declare rstop bigint;

            declare lstart bigint;

            declare lstop bigint;

            declare STARTS_ARR BIGINT ARRAY;

            declare ENDS_ARR BIGINT ARRAY;

            declare START_ARR BIGINT ARRAY;

            declare STOP_ARR BIGINT ARRAY;

            declare rlen bigint;

            declare llen bigint;

            declare li bigint;

            declare ri bigint;

           

          

            rightTable = SELECT "STARTS" FROM "DSS".REPEATMASK2;

            leftTable  = SELECT to_bigint("START") as "START" FROM "DSS".CLINVAR2;

           

            STARTS_ARR := ARRAY_AGG(:rightTable.STARTS);

            START_ARR := ARRAY_AGG(:leftTable."START");

            select count("STARTS") into rlen from :rightTable;

            select count("START") into llen from :leftTable;

               res = select '' as lstart , '' as lstop from dummy where 1<>1;

               li := 0;

               ri := 0;

      LOOP

      if(:ri > :rlen ) OR ( :li > :llen )  then

      break;

      end if;

      lstart := :START_ARR[:li];

      rstart := :STARTS_ARR[:ri];

      if (:rstart <= :lstart) and (:rstop >= :lstart)

      then

      res = (select * from :res)  union (select :lstart as lstart , :lstop as lstop from dummy);

      li := :li + 1;

      continue;

      end if;

      if (:rstart > :lstart)

      then

      li := :li + 1;

      continue;

      end if;

      if (:rstop < :lstart)

      then

      ri := :ri + 1;

      end if;

       

            

              

      end LOOP;

      select * from :res;

      END;

      Former Member

        Hi Hartosh,

      I worked on your problem. I think the mistake you were making is that you are setting the index to 0 before using it in the statement below:

      rstart := :STARTS_ARR[:ri];

      whereas array indexes start from 1 to 2^31.

      If there is a chance that the table does not contain data, it is better to handle that situation with exception handling (try/catch).

      Try changing the index and it will work.

      Hope this will help you



      Thanks

      Bhupender
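
      For illustration, a minimal sketch of the index fix described above (assuming both arrays hold at least one entry) would start the loop indexes at 1 before the LOOP:

      li := 1;   -- SQLScript array indexes start at 1, not 0
      ri := 1;
      ...
      lstart := :START_ARR[:li];
      rstart := :STARTS_ARR[:ri];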

      Former Member

      call "DSS".sortMergeJoin_Arrays()

      Former Member

      When I execute this call statement from the SQL console inside HANA Studio I get an "Arithmetic exception" error at the line lstart := :START_ARR[:li]; I think this is because START_ARR is empty or cannot be accessed like this. I have seen there is no value in START_ARR in debugging mode.

      Former Member

      Hi Krishna,

         Good effort.

         I have a doubt here: what is the use of the "primary_keys" variable in your logic?

         You maintained this variable but used it nowhere in the rest of the code.

      Thanks and Regards

      Nagababu

      Former Member

      Nice post Krishna! Thank you.

      Best Regards

      Venkat...

      Former Member

      Hi Krishna ,

      I am facing a typical performance-related issue with using a cursor in a stored procedure.

      Well, the business logic is such that it warrants the use of a cursor.

      I will try to explain the issue the best I can, hoping that you can help me with some pointers.

      The business requirement is to:

      1) Compare table A, which has expenses

      2) Compare table B, which has sales

      3) Compare expenses and sales belonging to a similar batch and then generate a few expense records

      -----------

      The logic is implemented as follows:

      Cursor on expenses

           cursor on sales

              process the expense + sales combination to generate additional expense records

              then insert data into a result table

          close cursor sales

      close cursor  expenses .

      -----------

      The above process is taking a lot of time in the environment to process even a set of 2k records.

      Any pointers on how to make the code run faster?

      Usage of cursors seems to be unavoidable at this point.

      Any pointers will help!