How to avoid NULL (?) values when using dimension attributes with fact table columns that contain blanks ( ” )
Purpose: Describe a method to address a common DW/BI problem of not having a matching row in a dimension for a given fact where the fact column is blank (” ) whitespace. In general, we want to avoid returning null attribute values for a given entry in a fact. Just as a side note – this issue is not specific to HANA and can (and does) need to be addressed in whatever database your solution may be implemented in. This simply describes an easy way to solve using HANA based components.
Interesting to note, in SAP BW, master data tables/InfoObjects always have an entry with a blank row. If I had to guess, this is to ensure this behavior does not occur!
Real World Problem: User observes that while querying a given model in HANA, there is $300 in total sales by “attribute 1”, with $100 of that falling into a null (“?”) “attribute 1”. When the user now implements a filter on “attribute 1”, the null value is dropped and the $100 “disappears”, which can cause some heartburn for the average user, “where did my sales go?!” The model is built with a fact and dimension, having a left outer join with a 1:n relationship.
Shouldn’t this join type never drop data in the fact? The answer is no, even in a left outer join, when you apply a filter on the right table an inner join is effectively executed. If you have a fact table that has blank ( ” ) values in it, and no matching blank ( ” ) value in the dimension, then the inner join drops the key figure value from the result.
Generally, any ECC tables replicated with SLT will have tables with columns that are non-nullable with default values of 0 (for decimal) or ( ” ) for NVARCHAR, therefore if there is no value in a given column it will always be stored internally as blank ( ” ) or whitespace.
Observing the “problem”
CREATE TABLE "MOLJUS02"."TEST_WHITESPACE_FACT" ("MATNR" NVARCHAR(18), "SALES" DECIMAL(15,2)); CREATE TABLE "MOLJUS02"."TEST_WHITESPACE_DIM" ("MATNR" NVARCHAR(18), "ATTR1" NVARCHAR(18)); INSERT INTO "MOLJUS02"."TEST_WHITESPACE_FACT" VALUES ('1234567', 100); INSERT INTO "MOLJUS02"."TEST_WHITESPACE_FACT" VALUES ('1238890', 100); INSERT INTO "MOLJUS02"."TEST_WHITESPACE_FACT" VALUES ('', 100); INSERT INTO "MOLJUS02"."TEST_WHITESPACE_DIM" VALUES ('1234567', 'BLUE'); INSERT INTO "MOLJUS02"."TEST_WHITESPACE_DIM" VALUES ('1238890', 'RED'); --Simulate a dim/fact query SELECT B."ATTR1", SUM(A."SALES") FROM "MOLJUS02"."TEST_WHITESPACE_FACT" A LEFT OUTER JOIN "MOLJUS02"."TEST_WHITESPACE_DIM" B ON (A."MATNR" = B."MATNR") GROUP BY B."ATTR1" --Simulate a dim/fact query SELECT B."ATTR1", SUM(A."SALES") FROM "MOLJUS02"."TEST_WHITESPACE_FACT" A LEFT OUTER JOIN "MOLJUS02"."TEST_WHITESPACE_DIM" B ON (A."MATNR" = B."MATNR") WHERE B."ATTR1" <> 'BLUE' GROUP BY B."ATTR1"
Result from first query
Result from second query – Where did my $100 go?! If I excluded Blue materials, I should still have $200 remaining!
Real World Solution: It’s actually quite simple – you always want to have a matching record in your dimension for any possibilities that may exist in the fact. In some applications including SAP ERP, you can have transactions that have no value for a given column. In my example above, the real scenario occurred with CO-PA when there were transactions with Sales values that had no MATNR (Material) assigned, only a ( ” ), so it is certainly possible that this can happen in a production environment.
To solve the above, we simply need a record in the dimension with a ( ” ) value in the key(s). This would avoid any chance of null values occurring when using an attribute.
INSERT INTO "MOLJUS02"."TEST_WHITESPACE_DIM" VALUES ('', '');
Now, lets run the same queries above again and observe the difference.
( ” ) instead of null is shown
No longer dropping any sales since we now have a matching dimension, my data is back!
Now, the above is all good and fine – the solution is straightforward you are saying. I can just insert a record into the dimension tables required in each environment required, right? My answer would be, that’s no fun – why would you manually do this when you can have HANA perform this itself? Let’s create a stored procedure that does this work for us! 🙂
We need to create a stored procedure that inserts a whitespace row into every table in a target schema. In my case, this was an SLT managed schema, so some elevated rights are needed. To insert into an SLT managed schema, you either need to havethe role <SCHEMA_NAME>_POWER_USER (which is created when an SLT connection is made) and create the procedure as “Run With” Invokers Rights OR assign the same role to user SYS_REPO and choose “Definer’s Rights” for the procedure.
– Procedure has input parameter to change which target schema to insert whitespace records into
– Read all tables of target schema that are not SLT or DD* tables, and also have non-nullable columns
– Omit any tables that have at least one non-nullable column, or if it has a nullable column and no default value also omit it.
– Only look at tables where the column in position 1 is a primary key
– Build a worklist of table/key to loop at
– Build a dynamic SQL statement to insert a whitespace record into each table in the schema.
Below is the source from SYS_BIC representation to show the parameter.
create procedure "_SYS_BIC"."sandbox.justin.insert_whitespace/INSERT_WHITESPACE_2" ( in process_schema NVARCHAR(12) ) language SQLSCRIPT sql security invoker default schema "DE2_DH2" as /********* Begin Procedure Script ************/ -- scalar variables i INTEGER; row_count INTEGER; statement NVARCHAR(256); BEGIN --Work list table_list = --List of tables in the target schema that match criteria --perform minus to remove any tables that have at least one --nullable field or a nullable field plus no default value SELECT DISTINCT "TABLE_NAME" FROM "SYS"."TABLE_COLUMNS" WHERE "SCHEMA_NAME" = :process_schema AND "IS_NULLABLE" <> 'TRUE' AND "DEFAULT_VALUE" is not null AND "TABLE_NAME" NOT LIKE 'RS%' AND "TABLE_NAME" NOT LIKE 'DD%' MINUS --List of tables that have at least one nullable column, or one --non nullable column that has no default value SELECT DISTINCT "TABLE_NAME" FROM "SYS"."TABLE_COLUMNS" WHERE "SCHEMA_NAME" = :process_schema AND ("IS_NULLABLE" = 'TRUE' OR ("IS_NULLABLE" = 'FALSE' AND "DEFAULT_VALUE" is null)) AND "TABLE_NAME" NOT LIKE 'RS%' AND "TABLE_NAME" NOT LIKE 'DD%'; --Use previous table list and find the column that is a primary key --and is in position 1 for each table, in order to build insert statement table_columns = SELECT A."TABLE_NAME", B."COLUMN_NAME" FROM :table_list A INNER JOIN "SYS"."INDEX_COLUMNS" B ON (A."TABLE_NAME" = B."TABLE_NAME") WHERE B."POSITION" = '1' AND B."CONSTRAINT" = 'PRIMARY KEY'; -- store dataset size in row_count variable SELECT COUNT(*) INTO row_count FROM :table_columns; --Loop over result set, building and executing an INSERT statement for the required tables FOR i IN 0 ..:row_count-1 DO SELECT 'INSERT INTO' || ' "' || :process_schema || '"."' || "TABLE_NAME" || '" ' || '("' || "COLUMN_NAME" || '") VALUES ('''')' INTO statement FROM :table_columns LIMIT 1 OFFSET :i; EXEC(:statement); END FOR; END; /********* End Procedure Script ************/