Technology Blogs by Members
maxnuf
This is a poor man's implementation for reading DF34_RAW datatypes in SQL.

Introduction


While building BI reports in native HANA for SAP's ACM module, we ran into a lot of tables that store decimal 128-bit floating-point numbers in the DF34_RAW format. As this is a native ABAP type, it cannot be read directly in SQL.
We were already building a lot of calculation views and reports in HANA, so we needed a way to read the values in these fields.
Discussions with SAP support did not lead to a solution, so I set out to find an alternative native SQL approach.

We've been using this approach for some time now, so I thought it would be good to share what we came up with.

Binary structure


When I started, I naively assumed that DF34_RAW has the same inner workings as the IEEE 754 decimal128 format (https://en.wikipedia.org/wiki/Decimal128_floating-point_format).

Although the structure looks similar, the representation is not.
What follows is my understanding of the format for our use case. Don't expect this to be complete or 100% correct.

Sign    Combination          Significand
1 bit   17 bits              110 bits
s       mmmmmmm mmmmmmmmmm   cccccccccc cccccccccc cccccccccc cccccccccc cccccccccc cccccccccc cccccccccc cccccccccc cccccccccc cccccccccc cccccccccc


Here:

  • s denotes the sign-bit

  • cccccccccc are 10-bit declets that can be read as plain binary integers with a maximum value of 999. (For the negative sign-bit, values are counted inversely, i.e. 999 - value.)

    Sign   Declet        Value
    1      0000000000    000
    1      0001101100    083
    1      1111100111    999
    0      0000000000    999
    0      1111100111    000


  • mmmmmmm mmmmmmmmmm contains both the most significant digit and the exponent. The 7-bit part gives the thousands and the 10-bit part is read like a declet, so together they form one integer (for example, 0111101 = 61 and 1011111101 = 765 combine to 61765).

    The last digit of this combined value is the most significant digit of the number. (For the negative sign-bit, it is 9 - value.)

    For the exponent, drop the last digit and subtract 6176 (since 128-bit decimal floating-point numbers support a range of ±0000000000000000000000000000000000×10^−6176 to ±9999999999999999999999999999999999×10^6111).
    For the negative sign-bit, the exponent works a bit differently.

    Sign   Combination          Int value   Exponent   MSD   Value
    1      0111101 1011111101   61765       0          5     5
    1      0111101 1100000011   61771       1          1     10
    1      0111101 1100001111   61783       2          3     300
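The reading above can be sketched in plain Python (an illustration only, not the actual HANA implementation; the function names are mine, and the 7-bit/10-bit split and last-digit reading are my inference from the example rows):

```python
def decode_declet(bits: str, sign: int) -> int:
    """Read a 10-bit declet as a plain binary integer (0-999).
    For the inverted case (sign bit 0 in the tables above), count backwards."""
    value = int(bits, 2)
    return value if sign == 1 else 999 - value

def decode_combination(bits: str, sign: int):
    """Split the 17-bit combination field into a 7-bit head (thousands)
    and a 10-bit tail read like a declet, combine them into one integer,
    then derive the exponent and the most significant digit (MSD)."""
    head, tail = int(bits[:7], 2), int(bits[7:], 2)
    combined = head * 1000 + tail        # e.g. 61 and 765 -> 61765
    msd = combined % 10                  # last digit: the MSD
    exponent = combined // 10 - 6176     # drop the last digit, subtract the bias
    if sign != 1:
        msd = 9 - msd                    # inverted case; the exponent also
                                         # differs, which this sketch ignores
    return exponent, msd

# Rows from the tables above:
assert decode_declet('1111100111', 1) == 999
assert decode_declet('0000000000', 0) == 999
assert decode_combination('01111011011111101', 1) == (0, 5)
assert decode_combination('01111011100001111', 1) == (2, 3)
```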



Challenges


To implement this in SQL however, we have two challenges:

  1. we need to individually select bit-ranges

  2. we need to convert those bit-ranges to integer values.


Challenge 1 can be overcome by masking bit fields with the BITAND function, converting the binary value to a hexadecimal VARCHAR via BINTOHEX, and then selecting ranges using SUBSTRING.
Since we work on hexadecimal characters, we can only select multiples of 4 bits at a time. The best we can do is select 12 bits instead of 10 for each declet, so some of our ranges are misaligned by 2 bits. This means we need to bit-shift some of the values (or simply divide them by 4).
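To see why the 2-bit misalignment reduces to a division by 4, here is a small Python illustration (hypothetical values; in HANA the same effect is achieved with BINTOHEX and SUBSTRING):

```python
# A declet holding 999 (binary 1111100111) lands 2 bits high inside a
# 12-bit, hex-aligned window (neighbouring bits assumed zero here).
declet = 0b1111100111                  # 999
window = declet << 2                   # the 12-bit window SUBSTRING would cover
hex_chars = format(window, '03X')      # 'F9C' - three hex characters
assert int(hex_chars, 16) // 4 == 999  # shifting right by 2 bits == dividing by 4
```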

Challenge 2 is a bit harder, since we now have hexadecimal strings instead of binary values, and no native conversion function exists. The discussion at https://archive.sap.com/discussions/thread/3652555 tried to solve this with a scalar UDF using a while loop. Although it works, the performance and especially the intermediate memory allocation are terrible. So instead I opted for something databases are good at: lookup tables.
For our purpose there are only 1000 distinct values anyway.
In our lookup table, we can also tackle the issue of bit-shifting.
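As a rough illustration of that idea (a Python dict standing in for a HANA lookup table; the actual implementation on GitHub may be shaped differently), the bit shift can be baked into the table by keying on every possible 12-bit window:

```python
# Key every 3-hex-character window and store the value of its top 10 bits,
# so the 2-bit shift is part of the table instead of computed per row.
# The 2 low-order bits belong to a neighbouring field and are simply ignored,
# which is why four different windows map to the same declet value.
lookup = {format(n, '03X'): n >> 2 for n in range(4096)}

assert lookup['F9C'] == 999
assert lookup['F9D'] == 999  # low bits differ, same declet
assert lookup['000'] == 0
```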

Implementation


Initially I wanted something modular and reusable, so I built several scalar UDF functions that could be called for any column containing DF34_RAW values. However, this turned out to be pretty bad for performance: each UDF allocated several MB of memory for every line item it processed, and for some large datasets I noticed several TB of memory being allocated and deallocated during execution. Rewriting the logic using lookup tables and simple left joins kept things straightforward and ensured reasonable performance.
The downside is that the logic needs to be implemented separately for each table and each source field.

In the end, my logic splits the DF34_RAW field into 14 separate fields, performs 13 joins against the lookup table, and finally assembles the result into a string that can be passed to the TO_DECIMAL function, like to_decimal(1.231e2).
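The final assembly step can be sketched like this (a hypothetical helper of my own; the exact string format in the GitHub implementation may differ):

```python
def to_decimal_literal(negative, msd, declets, exponent):
    """Concatenate the most significant digit and the eleven 3-digit declet
    values into a 34-digit mantissa, place the decimal point after the first
    digit, and append the exponent so the string can be fed to TO_DECIMAL."""
    digits = str(msd) + ''.join('%03d' % d for d in declets)
    sign = '-' if negative else ''
    return '%s%s.%se%d' % (sign, digits[0], digits[1:], exponent)

# The 1.231e2 example from the text (all remaining declets zero):
s = to_decimal_literal(False, 1, [231] + [0] * 10, 2)
assert s.startswith('1.231') and s.endswith('e2')
```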

Instead of cluttering this post with hard-to-format code, the full example can be found at: https://github.com/maxnuf/df34_raw_convesion.



I still hope that one day a native conversion function will be made available in HANA. In the meantime, I hope this implementation helps others who need to read DF34_RAW values using SQL.

I know the implementation is not perfect, so ideas and suggestions for improving it are welcome.