Transpose Row-set into Column Name – Dynamic Table Creation in SAP HANA 2.0 SPS04
Data preparation stage is the foundation for any advance analytics project using ML or predictive algorithms etc. During Predictive scenario implementation, there could be a requirement where row data need to be transposed as column name, to generate more columns. These additional columns are further processed by the predictive algorithms such as clustering, classification and others to improve prediction score.
Traditionally in reporting, we happen to transpose row values as column value or vice versa for an instance flatten hierarchy. But for data preparation phase in this predictive use case, we will do transpose of rows into column name (instead as column value). For better understanding, let me demonstrate this data preparation phase by referring below hypothetical machinery data of automobile manufacturer (in figure 1). In this data, for any ChassisNo value, EngineCapacity and Horsepower have 3 distinct value set such as 1100,2200,3500 and 74,163,270 respectively.
(Figure 1 shows different variant of Chassis number)
Let’s first list down various facts related above data as follows:
- For any ChassisNo, combination of EngineCapacity and HorsePower represents different variant. (for example, CIN100001 have 3 different variants with combination of EngineCapacity and Horsepower)
- For all Chassis, there exist 3 similar variants (which will grow in future to N variants)
- Column RPM, BodyType and Mileage represent attributes across different variants
Each chassis has 3 variants, thus for any chassis we can see 3 rows but in future this data is expected to grow with new variants launching across all chassis.
Target representation is expected as shown in below figure2. Here all Chassis, data is represented into single row and column RPM/BodyType/Mileage get merged with transposed value of variants.
(Figure 2 – Target requirement)
Following transformation can be understood as follows:
- Merged value of column EngineCapacity and HorsePower like (1100_74, 2200_163 and 3500_270) is transposed and append into column name such as RPM_1100_74 to Mileage_3500_270
- Attribute RPM/BodyType/Mileage to displayed across these target columns
As we can see here target columns generating primarily based on variant data, and this data is expected to grow in future due to new variants, thus to handle this dynamic data we opt creating SQLScript write procedure which creates structure of target table, where column name will be built dynamically based on number of variants and its value.
PSEUDO – CODE
Extract column name and its datatype (like RPM,BodyType and Mileage) from base table using system table TABLES_COLUMNS in a variable. For example, RPM and Nvarchar(50).
Extract variant data that is (EngineCapacity and HorsePower) value for anyone ChassisNo, which can be used to create part of target column names of table. For example, 1100_74.
Loop all column extracted in Step 1 and store its column name concatenated by ‘_’ (like RPM_) and its datatypes in respective variable like concatenated by ‘,’ (like Nvarchar(50), ).
Inner loop for variant data extracted in Step 2 followed by concat operation as follows:
Concat (Column, Variant Data) that is RPM_1100_74
Concat (i_weight_tmp, DATATYPE) that is RPM_1100_74 Nvarchar(50),
This inner loop iterates for all variant for column RPM extracted in step –1 which then pass control to outer loop and similar iterations continues for other column like BodyType and Mileage.
Hold this dynamically generated string (comprising target column names) into a variable and perform below operation.
v_weight_f := Concat (‘ChassisNo Nvarchar(50),’ variable holding string of target column)
v_weight:= (‘CREATE table TargetChasis (‘||:v_weight_f||’)’)
With usage of EXEC clause (of Dynamic SQL), table can be created on-the-fly with all its metadata defined as per data in base table.
Detailed code used for creating table dynamically can be referred in figure 3 as follows:
(Figure 3. Sample code for creating table dynamically)
Upon execution of above procedure, column table is created successfully.
Note: According to existing system limitation mentioned in SAP Note 2154870 , a column table in SAP HANA database can accommodate maximum 64,000 column in column store versus 1000 in row store table, thus this dynamic creation of table will be valid for column count with in this range.
At this point, dynamically built table is ready now and can be inserted with data from RPM, BodyType and Mileage column.
For any similar predictive requirement, where row data required to be transposed as column name, this data preparation approach is one of solution which can dynamically build table (using HANA SQLScript procedure) based on data in base table. Data can be dynamically loaded in this table by another procedure and can be further consumed in different algorithm of Predictive Analysis Library (PAL)in HANA for advance analytics purposes.
Thanks & Regards,
LinkedIn : https://www.linkedin.com/in/ashish-jain-260bb517/
While this does demonstrate a technique, I wonder if there's a need for it. If more data is loaded into the source table you have to recreate a new reporting table? That doesn't fit any BI model I've ever seen, and all the BI and analytic tools I've used like tables with fixed structures.
Is this from a real business requirement?
Thanks a lot for taking time to review this blog. Yes, it’s a rare reporting scenario like flatten hierarchy in BW but a real business requirement implemented for predictive scenario. Above blog demonstrate a data preparation stage in which row value is transposed to column name and output of this transpose table is used in PAL algorithm to identifying key influencers column (which is combination of two column such as RPM_1100_74,MILEAGE_3500_270 or others)for Chassis performance in this use case.
Very clear steps .Thanks
Dear Ashish ,
Can you help me to achieve below requirement ?
Have a requirement to Transpose Row to Column, it would be great if someone share solution or workaround to achieve this using calculation view via graphical method or Dynamic SQL
Above can be achieved in calculation view by creating restricted column like VV001..VV025 , with base measure as AMOUNT and attribute restriction on VALUE_FIELD.
Thanks for your reply, in real time i may have more than 26 VALUE_FIELD and 5 Measure columns, this case i need create more calculated columns 26*5=130, Is there any other way to transpose dynamically ?
A very interesting blog with detailed steps to achieve a unique requirement.