Hi folks,
In my recent post, Thinking in HANA - Part 2: Revisiting SCDs, I made reference to a challenge we currently face with slowly changing dimensions. Our solution goes a bit beyond the solution I referenced - populating dimension and fact tables with surrogate keys - so I thought I would share an interesting solution to what I presume is a relatively widespread problem.
Problem Statement
Solution
Solution steps 1 and 3 above have already been discussed in my previous post. Solution step 2 is the focus of this article. Note that if the tables were not consolidated/conformed into a single target table, 8 different Attribute Views would have to be created - one per SCD table - and each of these 8 views would have to be modeled 3 times against each employee in the fact data set, leading to 24 modeled views. That requires significantly more effort and maintenance (and more room for error) than modeling the same Attribute View three times. (Note that the Attribute Views in these cases should be created as "Derived" Attribute Views based on the first view, so that all changes to the view itself get propagated without needing to maintain 3 distinct views. The same Attribute View cannot itself be modeled more than once.)
In order to demonstrate the solution, examine the following data set (based on my previous posts). It shows two different time-dependent dimension tables.
This data is illustrated graphically below:
The desired target table and graphical representation are here below:
The solution is provided in the following SQL, with comments that highlight the logic. Please copy, inspect, and run the SQL to understand the solution.
Before posting the solution, it's worth noting some of its implications. The logic "explodes" time-dependent data to day-level granularity. In our case, this currently involves exploding one of the tables of 200,000 distinct records across 600 days, which results in a formidable intermediate result set of 120 million records. In combination with additional logic and tables, we've run into some out-of-memory errors, so we'll likely be implementing something along the lines of an iterative solution that runs on bite-size chunks of data, one at a time. For an example of such an approach, check out this article by justin.molenaur2.
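To give a feel for what such an iterative approach might look like, here is a minimal sketch in plain Python (not SQLScript). The function names, chunk size, and number of products are all illustrative assumptions on my part - the real implementation would run the consolidation logic per batch inside HANA.

```python
# Hypothetical sketch: process product IDs in fixed-size batches so the
# exploded day-level intermediate result never holds all rows at once.

def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def consolidate_chunk(prod_ids):
    # Placeholder for the SCD consolidation logic (explode to days,
    # full outer join, collapse back to validity ranges), run for a
    # subset of products only.
    return [(p, "consolidated") for p in prod_ids]

all_prod_ids = [f"P{n:06d}" for n in range(10)]  # stand-in for ~200,000 IDs
results = []
for batch in chunks(all_prod_ids, 4):
    results.extend(consolidate_chunk(batch))
```

Each batch produces a bounded intermediate result, trading one large memory spike for several small ones.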
DROP TABLE DIM_MATERIAL;
CREATE COLUMN TABLE DIM_MATERIAL
(
PROD_ID CHAR(1),
DATE_FROM DATE,
DATE_TO DATE,
MATERIAL NVARCHAR(20),
PRIMARY KEY (PROD_ID, DATE_FROM)
);
DROP TABLE DIM_SIZE;
CREATE COLUMN TABLE DIM_SIZE
(
PROD_ID CHAR(1),
DATE_FROM DATE,
DATE_TO DATE,
SIZE DECIMAL(12,2),
PRIMARY KEY (PROD_ID, DATE_FROM)
);
INSERT INTO DIM_MATERIAL VALUES ('A', '2014-01-01', '2014-01-08', 'Steel');
INSERT INTO DIM_MATERIAL VALUES ('A', '2014-01-14', '2014-01-24', 'Aluminum');
INSERT INTO DIM_MATERIAL VALUES ('A', '2014-01-25', '2014-01-31', 'Carbon Fiber');
INSERT INTO DIM_SIZE VALUES ('A', '2014-01-11', '2014-01-15', 10);
INSERT INTO DIM_SIZE VALUES ('A', '2014-01-16', '2014-01-19', 12);
DROP PROCEDURE CONSOLIDATE_SCD;
CREATE PROCEDURE CONSOLIDATE_SCD AS
BEGIN
-- expand DIM_MATERIAL to day-level granularity via the standard time table
dim_MATERIAL =
SELECT DIMC.PROD_ID, TDIM.DATE_SQL, DIMC.MATERIAL
FROM _SYS_BI.M_TIME_DIMENSION TDIM
INNER JOIN DIM_MATERIAL DIMC ON
TDIM.DATE_SQL BETWEEN DIMC.DATE_FROM AND DIMC.DATE_TO;
-- expand DIM_SIZE to day-level granularity
dim_SIZE =
SELECT DIMW.PROD_ID, TDIM.DATE_SQL, DIMW.SIZE
FROM _SYS_BI.M_TIME_DIMENSION TDIM
INNER JOIN DIM_SIZE DIMW ON
TDIM.DATE_SQL BETWEEN DIMW.DATE_FROM AND DIMW.DATE_TO;
-- 1) full outer join on the natural key fields + DATE_SQL
-- 2) COALESCE retrieves the key fields and DATE_SQL from whichever side is non-null
all_dims =
SELECT
COALESCE(MATERIAL.PROD_ID, SIZE.PROD_ID) AS PROD_ID,
COALESCE(MATERIAL.DATE_SQL, SIZE.DATE_SQL) AS DATE_SQL,
MATERIAL.MATERIAL,
SIZE.SIZE
FROM :dim_MATERIAL MATERIAL
FULL OUTER JOIN :dim_SIZE SIZE ON
MATERIAL.PROD_ID = SIZE.PROD_ID AND
MATERIAL.DATE_SQL = SIZE.DATE_SQL;
-- re-calculate validity dates per key + attribute combination
-- (assumes each combination covers one contiguous date range)
result =
SELECT
PROD_ID,
MIN(DATE_SQL) AS VALID_FROM,
MAX(DATE_SQL) AS VALID_TO,
MATERIAL,
SIZE
FROM
:all_dims
GROUP BY
PROD_ID,
MATERIAL,
SIZE;
SELECT *
FROM :result
ORDER BY VALID_FROM;
END;
CALL CONSOLIDATE_SCD;
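To make the procedure's logic easy to verify outside HANA, the three steps - explode to day granularity, full outer join, collapse back to validity ranges - can be reproduced in plain Python against the sample rows inserted above. The helper names here are mine, not part of the original procedure; it simply mimics what the SQLScript does.

```python
from datetime import date, timedelta

# Sample data matching the INSERT statements above.
DIM_MATERIAL = [
    ("A", date(2014, 1, 1), date(2014, 1, 8), "Steel"),
    ("A", date(2014, 1, 14), date(2014, 1, 24), "Aluminum"),
    ("A", date(2014, 1, 25), date(2014, 1, 31), "Carbon Fiber"),
]
DIM_SIZE = [
    ("A", date(2014, 1, 11), date(2014, 1, 15), 10),
    ("A", date(2014, 1, 16), date(2014, 1, 19), 12),
]

def explode(rows):
    """Expand (prod_id, date_from, date_to, value) rows to day granularity,
    mirroring the join against _SYS_BI.M_TIME_DIMENSION."""
    out = {}
    for prod_id, d_from, d_to, value in rows:
        d = d_from
        while d <= d_to:
            out[(prod_id, d)] = value
            d += timedelta(days=1)
    return out

material_by_day = explode(DIM_MATERIAL)
size_by_day = explode(DIM_SIZE)

# FULL OUTER JOIN on (prod_id, day): keep every day present on either side.
all_days = sorted(set(material_by_day) | set(size_by_day))
joined = [(p, d, material_by_day.get((p, d)), size_by_day.get((p, d)))
          for p, d in all_days]

# Re-derive validity ranges: MIN/MAX day per (prod_id, material, size).
ranges = {}
for p, d, mat, size in joined:
    key = (p, mat, size)
    lo, hi = ranges.get(key, (d, d))
    ranges[key] = (min(lo, d), max(hi, d))

result = sorted((lo, hi, p, mat, size)
                for (p, mat, size), (lo, hi) in ranges.items())
for row in result:
    print(row)
```

Running this produces six consolidated validity ranges for product A, including the gap days where only one of the two dimensions has a value (NULL/None on the other side) - the same shape the procedure's result set should have.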