Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
Introduction

Data preparation stage is the foundation for any advance analytics project using ML or predictive algorithms etc.  During Predictive scenario implementation, there could be a requirement where row data need to be transposed as column name, to generate more columns. These additional columns are further processed by the predictive algorithms such as clustering, classification and others to improve prediction score.

Traditionally in reporting, we happen to transpose row values as column value or vice versa for an instance flatten hierarchy. But for data preparation phase in this predictive use case, we will do transpose of rows into column name (instead as column value). For better understanding, let me demonstrate this data preparation phase by referring below hypothetical machinery data of automobile manufacturer (in figure 1). In this data, for any ChassisNo value, EngineCapacity and Horsepower have 3 distinct value set such as 1100,2200,3500 and 74,163,270 respectively.



(Figure 1 shows different variant of Chassis number)

Let’s first list down various facts related above data as follows:

  • For any ChassisNo, combination of EngineCapacity and HorsePower represents different variant. (for example, CIN100001 have 3 different variants with combination of EngineCapacity and Horsepower)

  • For all Chassis, there exist 3 similar variants (which will grow in future to N variants)

  • Column RPM, BodyType and Mileage represent attributes across different variants


Each chassis has 3 variants, thus for any chassis we can see 3 rows but in future this data is expected to grow with new variants launching across all chassis.

Requirement

Target representation is expected as shown in below figure2. Here all Chassis, data is represented into single row and column RPM/BodyType/Mileage get merged with transposed value of variants.


(Figure 2 – Target requirement)

Following transformation can be understood as follows:

  • Merged value of column EngineCapacity and HorsePower like (1100_74, 2200_163 and 3500_270) is transposed and append into column name such as RPM_1100_74 to Mileage_3500_270

  • Attribute RPM/BodyType/Mileage to displayed across these target columns


Solution

As we can see here target columns generating primarily based on variant data, and this data is expected to grow in future due to new variants, thus to handle this dynamic data we opt creating SQLScript write procedure which creates structure of target table, where column name will be built dynamically based on number of variants and its value.

PSEUDO CODE

Step 1

Extract column name and its datatype (like RPM,BodyType and Mileage) from base table using system table TABLES_COLUMNS in a variable. For example, RPM and Nvarchar(50).

Step 2

Extract variant data that is (EngineCapacity and HorsePower) value for anyone ChassisNo, which can be used to create part of target column names of table. For example, 1100_74.

Step 3

Loop all column extracted in Step 1 and store its column name concatenated by '_' (like RPM_) and its datatypes in respective variable like concatenated by ',' (like Nvarchar(50), ).

Step 4

Inner loop for variant data extracted in Step 2 followed by concat operation as follows:

Concat (Column, Variant Data) that is RPM_1100_74

Concat (i_weight_tmp, DATATYPE) that is RPM_1100_74 Nvarchar(50),

This inner loop iterates for all variant for column RPM extracted in step –1 which then pass control to outer loop and similar iterations continues for other column like BodyType and Mileage.

Step 5

Hold this dynamically generated string (comprising target column names) into a variable and perform below operation.

v_weight_f := Concat ('ChassisNo Nvarchar(50),' variable holding string of target column)

v_weight:= ('CREATE table TargetChasis ('||:v_weight_f||')')

EXEC v_weight

With usage of EXEC clause (of Dynamic SQL), table can be created on-the-fly with all its metadata defined as per data in base table.

Detailed code used for creating table dynamically can be referred in figure 3 as follows:

(Figure 3. Sample code for creating table dynamically)

Upon execution of above procedure, column table is created successfully.

Note: According to existing system limitation mentioned in SAP Note 2154870 , a column table in SAP HANA database can accommodate maximum 64,000 column in column store versus 1000 in row store table, thus this dynamic creation of table will be valid for column count with in this range.

At this point, dynamically built table is ready now and can be inserted with data from RPM, BodyType and Mileage column.

Conclusion

For any similar predictive requirement, where row data required to be transposed as column name, this data preparation approach is one of solution which can dynamically build table (using HANA SQLScript procedure) based on data in base table. Data can be dynamically loaded in this table by another procedure and can be further consumed in different algorithm of Predictive Analysis Library (PAL)in HANA for advance analytics purposes.

 

Thanks & Regards,

Ashish Jain

LinkedIn :  https://www.linkedin.com/in/ashish-jain-260bb517/
8 Comments