SAP HANA SQL for bizy geeks: Episode #1: The importance of being TINY, Data Types and other beasts
SAP HANA SQL in itself is no rocket science. It strongly gravitates around standard SQL-92, with a couple of extensions here and there.
If I had to decide about my favorite SQL primitive data type, without giving it much more intellectual thought than I already did, I would choose the cute and light 8-bit store unsigned integer, wearing the not-really tiny, little pleonastic name “TINYINT” (please, note the freestyle usage of double quotes for non-identifiers).
If you do the math, 2 to the power of 8 is 256, that is, a TINYINT can assume the values from 0 to 255, which should be enough for columns containing integer data such as, for example, people’s ages.
I don’t think a Y2K-similar issue is expected for the next couple of hundred years regarding human longevity. Maybe the kids born today will have a chance to live 100-110 years on average (although I am not yet convinced about this statistical extrapolation), but I do not expect us to survive much longer than that, at least before a new age of revolutionary Hardware and Software Migration waves happens spontaneously. It would be interesting to do a global programmer’s statistics study (maybe there is already more than one) and calculate the percentage of developers taking the time to carefully analyze the minimum/maximum amount of storage required for the different data fields for a software application.
My hypothesis is that many developers will tend to take slightly “oversized” data types, just in case. I also tended to do that myself some years ago when I was a real developer, but then again, I was not a brilliant Wozniak. I must also admit that 10 to 15 years ago I was naive enough to think that the Club of Rome was exaggerating a bit in 1972.
During IT prehistory, people had to save data storage for necessity. Nowadays people may think they can afford to be profligate and extravagant with RAM, Disk and CPU power, because “capacity is going up and prices are going down” all the time.
There is no need to be extremely ecologically intelligent to see the catch in this reasoning, even though we are bombarded with this kind of statements from everywhere all the time.
And now I will temporarily get back to purely robotic-thinking, even though I am not finished with all I wanted to say about the topic above, and will keep developing this line of thought further in the near future.
Let me give you a very compact summary about the main SAP HANA SQL Data Types, just for the (robotic?) fun of it:
- For Date and Time the winners are:
- DATE (with range of 0001-01-01 to 9999-12-31)
- TIME (HH24-MI-SS)
- SECONDDATE (0001-01-01 00:00:01 to 9999-12-31 24:00:00)
- TIMESTAMP (0001-01-01 00:00:00.0000000 to 9999-12-31 23:59:59.9999999)
- For Numeric Types, we start with…
- TINYINT (8-bit unsigned integer)
- continue with SMALLINT (16-bit signed integer)
- INTEGER (32-bit signed integer)
- BIGINT (64-bit signed integer)
- DECIMAL (precision von 1 to 34, scale from -6000+ to +6000+). Example: 31416 x 10-4 has the precision 5 and the scale 4.
- SMALLDECIMAL (floating-point decimal supported only for Column Store with precision up to 16 and scale around 369)
- REAL (32-Bit floating point)
- DOUBLE (64-bit floating point)
- and we will finish with FLOAT(n), a 32-bit or 64-bit REAL, with n significant bits up to 53. Steve Jobs would never have dreamt about such variety of floating points 20+ years ago. A real orgy!
- For Character Strings we have the following:
- VARCHAR(n) for ASCII characters, with n up to 5000
- NVARCHAR(n) for Unicode character sets
- and ALPHANUM(n) with a much smaller maximum length “n” than for VARCHAR, that is, 127. My memory is playing a trick in my mind because my inner eye is seeing a non-variable-length “CHAR” type. At least there is a CHAR in other SQL versions, but I don’t find it in the SAP HANA documentation. Since my SQL proficiency is currently like my French, that is, not awfully bad, but not always fluent, unless I practice it live for a couple of weeks in a row, I cannot say why CHAR has been killed, IF it has been really killed. The reason may lie in some architecture efficiency thing, for good or bad.
- We have VARBINARY(n) for Binary Types.
- And we surely have Large Object (LOB) Types like…
- BLOB for large binary data
- CLOB for ASCII data
- and NCLOB for Unicode data. You are not allowed to do everything with LOBs, like for example, ORDER BY or GROUP BY or use them as PRIMARY KEY. To be honest, I would not use a YouTube Video as PRIMARY KEY, even if I could. But I might change my mind, like all wise people.
That was it about SAP HANA SQL Data Types on my side… And now I have to go. I need to recreate some users in the SAP HANA instance I will be using tomorrow for the next training, create a couple of variables for some Analytic Views, in order to show off a bit (although currently, this “variables business” with the SAP HANA Studio does not give you the greatest user experience in the world, to put it very diplomatically), and I also want to observe the behavior of my new royal Analytic Privileges, of course on the sly, and without cameras. Last but not least, I have to prepare lunch… collaboratively. I am hungry after so much discussion about storage and resources, but we like mediterranean cuisine, which tends to be lean(er) than others.
Co-Founder and COO