Skip to Content

Have you been using the ISNUMERIC() function of MySQL, MS SQL Server or PostgreSQL and wondered what to use in SAP HANA instead?
If so, this is the blog for you.

Just recently I’ve seen a customer using my old workaround (first published here) and then a few days later the related stackoverflow question got updated.

This topic is not dead

The new answer to the SO discussion took a popular approach to the IS_NUMERIC problem:

If the string can succesfully be converted into a numeric type, then the value is numeric (IS_NUMERIC returns 1 for TRUE), else it’s not (IS_NUMERIC returns 0 for FALSE).
This is a completely feasible approach (and used on other platforms as well, e.g. Oracle or PostgreSQL) and provides a result that allows to make the decision of whether or not the tested value can be turned into a number data type in HANA.

IT DEPENDS, as usual

The problem, or rather the limitation of this approach is, that numeric can mean different things in different circumstances.
Sometimes, one might want to accept ‘123.456,–‘ as numeric. Sometimes “$500.00 –” shoult be OK, and so on.
Checking the T-SQL definition of ISNUMERIC (here and here) shows that there are plenty other options.

This means nothing else that there are alternative approaches for this problem.
One of those alternatives could be to use regular expressions (regex), which have been supported by HANA for quite some time now (since SP10 if I’m not mistaken).
With tools like https://regex101.com/ one can comfortably work out a pattern that matches or doesn’t match the kind of data one is interested in.

It might well be, that in your application the question is not about whether a string or sub-string can pass IS_NUMERIC() but rather something more specific.
The customer that I mentioned earlier, for example, wanted to see if a string looks like an ICD10 code (one character followed by two digits, ‘B13’, ‘A00’, etc.). Snipping out the first character and putting the rest into an IS_NUMERIC() function is not optimal here.

Instead, the regex ‘^\D\d{2}\.’ can be used to capture the whole pattern.

Show me how it’s done

Let’s take a rather simple regex that matches any string that

  • is not a number between 0-9
  • is not a plus (+), minus (-), comma (,) or full stop (.)
  • is not an exponent ‘E’ (as in 1.23E02)

Such a regex is ‘[^0-9\-\.\,\+e]’ (check here) with flags for global, case-insensitive matching.

In SAP HANA SQL we can use this regex in the LOCATE_REGEXPR() function. If it doesn’t find any matching string, then the result is 0, else it will return the location of the first matching character.
Since we are only interested in those cases, where no matches where found, we can easily MAP this output to make up a IS_NUMERIC value:
1 = no unwanted characters found, it’s numeric
0 = unwanted characters found, it’s not numeric

MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                    FLAG 'i' 
                    IN <string to be searched>)
     , 0, 1
     , 0)

So, what’s better now?

Using the “try-the-conversion“-approach can only be done via a procedure or a scalar function. This brings some overhead with it for the function invocation which can be avoided, when using the regex directly in the SQL statement.

As an example, I ran a quick test against 10.000.000 unique values (all tests are done against HANA1 1.00.122 and HANA2 2.00.020).

Conversion-approach function

select 
       sum("IS_NUMERIC_CONV"(id_char) )
from 
       "DEVDUDE"."DIM10000000"
where 
       "IS_NUMERIC_CONV"(id_char) = 1
   and "IS_NUMERIC_CONV"(id_char) IS NOT NULL;

This ran for roughly 19.5 seconds with 8 CPU cores at 100% usage.
And this is the runtime when all values actually can be converted correctly. With the same number of non-numeric values, the runtime goes up to 2:45 mins.

Regex-approach pure SQL

select 
       SUM (IS_NUM_REGEX)
from 
     (select 
             MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                                 FLAG 'i' 
                                 IN id_char)
                  , 0, 1
                  , 0) IS_NUM_REGEX
       from 
            "DEVDUDE"."DIM10000000")
where 
     IS_NUM_REGEX = 1 
 and IS_NUM_REGEX IS NOT NULL;

This ran for roughly 8.6 seconds with 8 CPU cores at 100% usage. Same runtime for both convertible and non-convertible values, here.

Too good to be true

By now you maybe think: “Gee, that regex thing is a lot faster, but also really ugly code. Let’s put that into a function and have our cake and eat it too!“.

Off we go:

FUNCTION "DEVDUDE"."sandpit::IS_NUMERIC_REGEX" (
                 in stringToCheck nvarchar(5000)) 
         RETURNS isNumeric integer
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN
     isNumeric := MAP (locate_regexpr(START '[^0-9\-\.\,\+e]' 
                                      FLAG 'i' 
                                      IN :stringToCheck)
                       , 0, 1
                       , 0);
END;

A statement using this function takes roughly 30 seconds with 8 CPU cores at 100% usage.

No cake for us!

As long as all our input data is clean, the conversion-function approach works nicely but really becomes expensive, when there are many cases, for which the exception handling needs to be executed. The lesson here is probably that it’s not a great (performance) idea to make exception handling part of your main processing path.

A truly high performing solution would need to be implemented as a native HANA function, so for it needs to be carefully checked what exactly the job of IS_NUMERIC() should be in your application and how many matches/non-matches are expected.

There you go, now you know!

Cheers,

Lars


This blog was first published on LBREDDEMAN.ORG

To report this post you need to login first.

Be the first to leave a comment

You must be Logged on to comment or reply to a post.

Leave a Reply