Skip to Content
Technical Articles
Author's profile photo Mark Gearhart

Using Point Characteristic Functions to Pivot Data in HANA

There are several articles in this blog space dealing with Table Pivoting, A recent article, universally titled Pivoting Data with SAP HANA, demonstrates a Python programming interface which uses Data Frames and the hana_ml machine learning library. However, this is not a method which can be used in the context of the SQL language, and therefore it is very limited in scope.

There are two other universally titled articles PIVOT Data in HANA and How to pivot dataset in SAP HANA which give an SQL solution, but they seem either too wordy or too unscientific for something that I think is really just a simple math problem.

There are also several older, archived articles which I feel are much too vague or overly procedural to be universally titled as the all-encompassing solution.

I recently ran across a table pivoting problem in my own application, and needed a solution in the form of an SQL select statement. If you are part of the older engineering crowd, then you may have run across this book which was published 25 years ago:

The authors devoted the entire book to a collection of essays on SQL programming techniques, many of which have been lost to history. This book presents an innovative methodology for SQL programming based on the use of Point Characteristic Functions.

Point characteristic functions are devices that allow engineers to encode conditional logic as scalar expressions within select, where, group by and set clauses. These techniques implement single statement solutions to problems that normally require multiple SQL statements. They also minimize the number of passes through underlying tables, and use SQL to obtain results not otherwise available within a relational implementation.

With an understanding of the old SQL-92 mechanics described in the book, I re-implemented their point characteristic functions using features available in the newer SQL used by HANA.

The secret to turning, or pivoting, long narrow tables into short, wide ones, lies in the proper encoding and use of these point characteristic functions. There are a wide variety of solutions enabled by these and other characteristic functions. The need for such pivoting occurs frequently in practice, primarily because, while the narrow table representation is better for data manipulation, the wide table form can be better for presenting data to reports.

We will get right to the problem I faced, and then explain the details. The application takes a set of Bank Deposit Rates for a number of months, stored in a long narrow table called DepositRates, and pivots the data to present the rate as columns by month:

Here is the input data

 


create table DepositRates (
id         int           not null,
month      int           not null,
rate       decimal(6,6)  null,
resetDate  date          null,
resetNote  nvarchar(50)  null);
 
create table Instruments (
id         int           not null,
name       nvarchar(8)   not null,
source     nvarchar(8)   not null);

 


insert into DepositRates values(1,1,0.008000,to_date('10/02/2019','MM/DD/YYYY'),'late reset');
insert into DepositRates values(1,2,0.025500,to_date('11/01/2019','MM/DD/YYYY'),NULL);
insert into DepositRates values(1,3,0.014100,to_date('11/29/2019','MM/DD/YYYY'),'early reset');
insert into DepositRates values(2,1,0.007115,to_date('10/01/2019','MM/DD/YYYY'),NULL);
insert into DepositRates values(2,2,0.007159,to_date('11/01/2019','MM/DD/YYYY'),NULL);
insert into DepositRates values(2,3,0.007135,to_date('12/01/2019','MM/DD/YYYY'),NULL);
insert into DepositRates values(3,1,0.022441,to_date('10/01/2019','MM/DD/YYYY'),NULL);
insert into DepositRates values(3,2,0.020109,NULL,'missing reset');
insert into DepositRates values(3,3,0.017743,to_date('12/01/2019','MM/DD/YYYY'),NULL);
 
insert into Instruments values(1,'HKDOND=','Reuters');
insert into Instruments values(2,'GBP3MD=','Euribor');
insert into Instruments values(3,'USD1MD=','Reuters');

 


select * from Instruments;
 
ID  NAME     SOURCE
--  -------  --------
 1  HKCOMD=  Reuters
 2  GBP3MD=  Euribor
 3  USD1MD=  Reuters

 


select * from DepositRates;
 
ID  MONTH  RATE      RESETDATE   RESETNOTE
--  -----  --------  ----------  ----------
 1      1  0.008000  2019-10-02  late reset
 1      2  0.025500  2019-11-01  NULL
 1      3  0.014100  2019-11-29  early reset
 2      1  0.007115  2019-10-01  NULL
 2      2  0.007159  2019-11-01  NULL
 2      3  0.007135  2019-12-01  NULL
 3      1  0.022441  2019-10-01  NULL
 3      2  0.020109  NULL        missing reset
 3      3  0.017743  2019-12-01  NULL

 

Here is the pivot query

 


select
  i.name "name",
  i.source "source",
  max(r.rate*(nullif((1-abs(sign(r.month-1))),0))) "current rate",
  max(r.rate*(nullif((1-abs(sign(r.month-2))),0))) "period-1 rate",
  max(r.rate*(nullif((1-abs(sign(r.month-3))),0))) "period-2 rate"
from DepositRates r, Instruments i
where r.id = i.id
group by i.name,i.source
order by i.name;

 

And here is the result

 


name     source   current rate  period-1 rate  period-2 rate
-------  -------  ------------  -------------  -------------
GBP3MD=  Euribor      0.007115       0.007159       0.007135
HKDOND=  Reuters      0.008000       0.025500       0.014100
USD1MD=  Reuters      0.022441       0.020109       0.017743

 

The mechanics

As you can see, we have pivoted the table by flipping the rates stored as rows for each instrument, into columns. We pivot by a numeric month and present a numeric rate. Note that the expression (nullif((1-abs(sign(month-1))),0)) acts as a point characteristic function for month = 1. Specifically, it returns 1 if the month is equal to 1 and returns NULL otherwise. The function sign() returns -1,0,+1 for negative numbers, zero and positive numbers respectively; the function abs() returns the absolute value of its argument; and the function nullif() returns NULL if the value is 0. Consequently, only one, the first, of the three rate values participating in the max() aggregate retains its original value, with the other two being reduced to NULL. Thus, the expression max(rate*(nullif((1-abs(sign(month-1))),0))) in effect simply returns the month 1 rate amount, precisely as required. The expression max(rate*(nullif((1-abs(sign(month-2))),0))) acts in a similar way, implementing a characteristic function for month 2.

Breaking up the query, we can see the return value for the point characteristic function as follows:


select
  id,
  nullif((1-abs(sign(month-1))),0) as v1,
  nullif((1-abs(sign(month-2))),0) as v2,
  nullif((1-abs(sign(month-3))),0) as v3
from DepositRates
order by id,month;

 


ID  V1    V2    V3
--  ----  ----  ----
 1     1  NULL  NULL
 1  NULL     1  NULL
 1  NULL  NULL     1
 2     1  NULL  NULL
 2  NULL     1  NULL
 2  NULL  NULL     1
 3     1  NULL  NULL
 3  NULL     1  NULL
 3  NULL  NULL     1

Then, we apply the function to the presentation attribute called rate by multiplying it by the function result. Here is the result:


select
  id,
  rate*nullif((1-abs(sign(month-1))),0) as v1,
  rate*nullif((1-abs(sign(month-2))),0) as v2,
  rate*nullif((1-abs(sign(month-3))),0) as v3
from DepositRates
order by id,month;

 


ID  V1        V2        V3
--  --------  --------  --------
 1  0.008000      NULL      NULL
 1      NULL  0.025500      NULL
 1      NULL      NULL  0.014100
 2  0.007115      NULL      NULL
 2      NULL  0.007159      NULL
 2      NULL      NULL  0.007135
 3  0.022441      NULL      NULL
 3      NULL  0.020109      NULL
 3      NULL      NULL  0.017743

When max() and group by are applied, you can probably see that the final aggregate result is returned as expected.

Other variations

In the above example, we pivoted by a numeric data type and likewise presented a numeric data type. If you imagine that other combinations are possible, your observation is correct. We can pivot and present by any combination of numeric, date or character data types.

Here is a date data type presentation of resetDate. The add_days() function is applied to a point characteristic function that returns either 1, which keeps resetDate, or NULL, which discards resetDate. We rely on the behavior of add_days() to return NULL if the date adder is NULL. I used add_days() but could have used some other date function as well:


select
  i.name "name",
  i.source "source",
  max(add_days(r.resetDate,1-nullif((1-abs(sign(r.month-1))),0))) "current reset",
  max(add_days(r.resetDate,1-nullif((1-abs(sign(r.month-2))),0))) "period-1 reset",
  max(add_days(r.resetDate,1-nullif((1-abs(sign(r.month-3))),0))) "period-2 reset"
from DepositRates r, Instruments i
where r.id = i.id
group by i.name,i.source
order by i.name;

 


name     source   current reset  period-1 reset  period-2 reset
-------  -------  -------------  --------------  --------------
GBP3MD=  Euribor  2019-10-01     2019-11-01      2019-12-01
HKDOND=  Reuters  2019-10-02     2019-11-01      2019-11-29
USD1MD=  Reuters  2019-10-01     NULL            2019-12-01

Here is a character data type presentation of resetNote which relies on the behavior of substring() to return NULL if the starting position is NULL:


select
  i.name "name",
  i.source "source",
  max(substring(r.resetNote,nullif((1-abs(sign(r.month-1))),0),length(r.resetNote))) "current note",
  max(substring(r.resetNote,nullif((1-abs(sign(r.month-2))),0),length(r.resetNote))) "period-1 note",
  max(substring(r.resetNote,nullif((1-abs(sign(r.month-3))),0),length(r.resetNote))) "period-2 note"
from DepositRates r, Instruments i
where r.id = i.id
group by i.name,i.source
order by i.name;

 


name     source   current note  period-1 note  period-2 note
-------  -------  ------------  -------------  -------------
GBP3MD=  Euribor  NULL          NULL           NULL
HKDOND=  Reuters  late reset    NULL           early reset
USD1MD=  Reuters  NULL          missing reset  NULL

The code also works properly when presented with missing data rows. If we assume that the row for instrument 1 for month 1 is missing, the max() aggregate corresponding to month 1 for instrument 1 would be comprised of two components, all of which, not belonging to month 1, would be NULL. The max() aggregate would then return NULL as the result, exactly as desired. The code also works properly if any values themselves are NULL, in exactly the same manner.

Folding, which is the inverse of pivoting, is also described in the book as an application of point characteristic functions. For now though, the immediate problem which I faced is now solved. If you have an interest in this subject, then I highly recommend finding a copy of the book.

Cheers.

Assigned Tags

      Be the first to leave a comment
      You must be Logged on to comment or reply to a post.