Calculation Engine (CE) Functions – Rest in Peace
I remember my initial encounters with HANA extremely vividly. One of them was a conversation with Paul Barker, who was an early HANA addict. We discussed building SQL models in HANA, and then the benefits of HANA Information Views. He was wide-eyed about the benefits of using HANA CE Functions, purporting insane performance increases.
What are CE Functions?
HANA Calculation Engine Functions (or CE Functions) are a mechanism introduced in HANA SPS02, which allow direct access to the HANA Column Store. They allow extremely precise control over how precise columns are projected, calculated and aggregated. In fact, they are a direct representation of how the original HANA Column Store modeled vector functions.
Why were CE Functions so fast?
They completely bypassed the SQL optimizer and meant that whatever you wanted to executed was run in the column store, and you always got the full performance of HANA. In the early days of HANA, SQL could be extremely inefficient and cause HANA to materialize large volumes of data in the SQL Engine, in row format. This caused big memory bubbles and 100-200x slower performance.
By comparison, CE Functions always ran in the column engine, and provided you were returning modest result sets, were highly efficient. They actually effectively compile into a language called L, which was HANA’s interim language, and is very similar to C++.
In 2011 and 2012, I very frequently coded high-performance aspects of a HANA solution in CE Functions.
What were the downsides to CE Functions?
CE Functions are part of the HANA SQLScript language, and as such they are an extension of ANSI SQL. That’s neat, but it also means that there aren’t many spare operators, and the syntax could get really funky. Sometimes you need a single quote (‘), sometimes double quotes (“). Sometimes double single quotes (”), or triple single quotes (”’). CE Functions are extremely picky!
What’s more, you have to specify every column, and with complex procedures, this means a lot of copy and paste, and activation of code. The editor wouldn’t give you much help, and the error messages were terse. In short, you have to be an expert SQL programmer and have a very good working knowledge of the HANA column store, to write good SQLScript with CE Functions.
But if you wanted performance, wow – this was HANA speed.
Along came the BW on HANA…
And so in 2012, along came “Project Orange”, or BW on HANA, with HANA SPS04. The HANA database had matured enough to work for SAP BW, but BW never used CE Functions. Instead, the BW team had their own proprietary way into HANA.
This soon turned out to be a limitation within BW, and the Calculation Scenario concept was born. Now, when you compile a BW object in BW 7.4, a HANA view is created, which can be accessed either via BW’s OLAP engine, or directly into the HANA database. The two are interchangeable, but CE Functions were way too restrictive for BW’s needs.
… shortly followed by Business Suite
Then in 2013 came Business Suite on HANA. The Business Suite makes extensive use of ANSI SQL via the OpenSQL interface, and it debuted with HANA SPS06. Due to the way the Business Suite is programmed, the HANA team had to make SQL much faster than it was before, and a ton of development went into the SQL Optimizer. As of HANA SPS06, there is often little different between SQL and HANA Information Views.
Let’s not forget HANA Live
HANA Live is programmed only in HANA Graphical Calculation Views. These had a habit of performing badly, if not designed very well, so the HANA development team put a ton of effort into making them perform great. They now compile into calculation scenarios, which are a XML representation of columnar projections and calculations.
What happened to CE Functions along the way?
Over half of SAP’s HANA customers use BW, and Business Suite is the fastest growing product by go-lives. Nether of these make any use of CE Functions and so they got extremely limited development attention and few new features.
Coupled with the fact that they are inflexible and hard to use, the other mechanisms – SQL, and Graphical Calculation Views, got all the attention. CE Functions didn’t get any further optimizations, and stayed the same since HANA SPS05. They didn’t get worse – the other options just got better and better.
As of HANA SPS07, we found that Graphical Calculation Views were always faster than CE Functions. In HANA SPS08, we find that SQL, Analytic Views and Graphical Calculation Views all have exactly the same explain plans in many cases. They all perform the same and it’s just a question of preference how you design things.
In fact, extremely complex views can be created as cascaded Calculation Views, and the view compiler will collapse and simplify the views and create a single optimized set of column projections, calculations, vector joins and unions as a single Calculation Scenario. This is truly the jewel in HANA’s crown.
CE Functions – Rest in Peace
The CE Function is an important part of HANA’s past, but there is no longer a reason to use them. They are less flexible, slower, and harder to program. We expect that in a forthcoming release of HANA, they will be deprecated and eventually de-supported.
The only time I use CE Functions is when the HANA SQL optimizer has a poor execution plan. Because CE Functions and SQL don’t share engines, it’s possible to control the HANA SQL optimizer using CE Functions. That’s an extremely unusual scenario these days.
Do note that the Scripted Calculation View still has its place – you can call out to Business Functions, Predictive Functions and other stored procedure objects in SQLScript. However the days of this are limited, because in HANA SPS09 it’s now possible to do some of this via the Graphical Modeler. I expect by HANA SPS10 or 11, the Scripted Calculation View will no longer be required.
So whilst they haven’t been officially deprecated yet, please celebrate their passing, and don’t use them any more.
Stellar post, John - thanks for sharing!
Although I disagree on the timeline of the history, I'd subscribe to the message:
Stop thinking CE-functions = fast, everything else = slow, folks.
Use SQL or graphical views and enjoy that your design has at least a chance of being intelligible to some later maintainer (remember, it maybe you π )
Ha, yeah I thought you might! Happy to correct them if there are factual errors. Some of the timeline is of course subjective.
Well, to be fair I think they don't change the point that you made very well.
Some random points:
Anyhow, since CE functions and the calculation scenario in total doesn't fit well with SQL relational algebra it's (folks might remember the confusing effects of count(*)) a really good choice to not use them if not required.
"HANA Live is programmed only in HANA Graphical Calculation Views"
I would add that most of the graphical calculation views delivered with SAP HANA live are configured to execute in the SQL Engine. So it would appear (just a guess) that SAP has found the SQL optimizer to be a better choice when de-normalizing ECC tables.
Yes, the OLAP engine works better with fewer, larger, tables with unions and some joins. The SQL engine works well with many complex joins. The HANA development team did a lot of optimization!
I won't really miss these - I never found them to perform any better than some of their modeling counterparts, and as you said - syntax was not easy!
Oh they definitely performed better in early versions of HANA. Go and install SPS02/03 if you don't believe me π
Oh, I do believe you - I wasn't around in the "good old days" to see it. By the time I was addicted to HANA, SPS04 was out and the playing field must have been leveled by that time π
In SPS04, there were still times when they could kill the graphical modeler, but it was already just 1 in 10 scenarios. I haven't seen a scenario since SPS06 where it made any difference at all.
Well, I also never had a benefit from CE.
To be true I never participate in simple dashboard or OLAP projects, so maybe that's the reason.
On SP02 the best usage we realized using CE is to "cut" the SQL optimizer forcing the materialization of some SQL or CE instructions before other executions.... Without this break HANA tried to create a big access plan that sometimes got memory exausted.
But this was old times and it was necessary on that stage to infuence the optimizer.
And why's that?
I still believe that having the option to maintain calc views as SQL or graphical views should be the developer's decision, given that both have similar performances.
Also, with AFL SDK, where one could define a completely custom signature pattern to access their custom functions, you'd want to keep both options available.
Other use case is non-SQLScript code, such as R scripts.
Best,
Henrique.
Sure. I expect to be able to call out to SQLScript or R from the graphical modeler, inline. The Script-based modeler didn't get much attention since SPS05 and already in every case the graphical modeler is faster. Plus the Script-based modeler is fiddly, in respect to input parameters and output columns.
But as you say for now, the Script modeler provides functionality not in the graphical modeler. Once parity has been reached, I think we will lose the Attribute View, Analytic View and the Script-based modeler.
One modeler to rule them all π
>> in every case the graphical modeler is faster <<
I don't believe this to be true. In many of my experiences I have seen the Scripted Calc View outperform the graphical. Plus maintenance on the graphical view is a huge pain.
If I sense that a requirement is getting complex in any way I toss the GCV and starting with my var_out = π
Me too. Also I prefer the keyboard and ctrl C + ctrl V than the mouse and drag and drop.
Scripts that generate scripts
But I agree that an single graph for modeling is a good destination for graph approach specially as on each SP HANA make internal changes that "understand" modeling patterns and make best paths.
In some point on some I expect that Studio also can in design time "understand" some bad pratices modeler is doing guiding him directly to a hint or help (like ATC does on ABAP).
This brings a good point: the Modeler folks need to work on usability. It has gotten better in SPS06-09 but there are still a lot of little things to come. Good news is I think we will see good stuff here in SPS10 and 11.
That's true. It was a pain use the graph interface and now it works very well.
As I work with a 200ms of latency to the servers one important thing here is Studio do less roundtrips to the server... well I'm getting to side tracked here π just remembering my whish list to christmas π
What I do see happening is the creation of "script components" within the graphical modeler, effectively extinguishing the requirement for a separate scripted modeler. In all practical terms, it will be the same.
Best,
Henrique.
Yes exactly. Hopefully when they do this, they will do automatic type checking so you don't have to fiddle with explicit typing. That part of the UI is quite meh.
Yes exactly. Hopefully when they do this, they will do automatic type checking so you don't have to fiddle with explicit typing. That part of the UI is quite meh.
One modeler != one modeling paradigm
I do dig the concept of one single modeler interface that can access different engines (calc, olap, sql) based on the type of query, but I still believe that keeping both choices in the calc views would be rather beneficial to the developers.
As Justin pointed out, there will always be complex requirements that you could either model as several layers of graphical views or a single scripted view. They should ultimately compile to a very similar (if not identical) calculation scenario, and have similar performances, but drag-and-dropping or coding should be a choice the developers could make at their own discretion.
The reality is that HANA, like any software product, has a limited amount of development resources. In the last 2 years it means that certain areas have had all the effort. With a company like SAP that is sales-led, it means that the areas that sell well get the effort - hence the effort on SQL optimization, enterprise readiness and the Graphical Modeler.
The principle of HANA was to separate intent from optimization, and the two table types (row/column) and multiple engines go some way to prevent that. I suspect we will see a single table structure for row and columnar tables that will automatically choose the best structure for its data, plus a single modeler that will be auto-optimize.
Whilst some developers like code and some like drag/drop, in the end, maintaining two modelers is costly and reduces overall agility of the development team. I'd accept a better graphical modeler over the choice between graphical and code.
Well, scripted calc views are basically procedures with a result view, and HANA won't stop supporting procs. π So it's likely we will continue having both graphical & sql, and that they converge on compiling into similar scenarios.
That idea of a single logical metadata definition and abstraction of the underneath physical store is interesting. Do you see a single table with row and column partitions? That already kinds of exist underneath the delta buffers today, but maybe it would be nice to open that to hardcore DBAs.
Another thing worth mentioning : my first hana training ever was at SAP UK with Paul Barker. π
Showing your age Henrique π
"Alas, poor Yorick! I knew him well." π
John : you mentioned that Scripted Calc views might be phased out ?
But Scripted calc views has features like loops and case statements that's sometimes hard to model in graphical views. finally I have been using a quite a bit of windows functions for some of the scenarios - and many such features like windows functions are not available in graphical views.
Yes, I believe they will be phased out - replaced with the ability to place SQLScript in the graphical calc view as a projection or aggregation node.
Hopefully this will allow for things like automatic type enumeration and other niceties.
I have one question please let me know still we are using CE Functions in SP012 or not
Hi Naveena,
They are still supported in HANA 1 SPS12 but they are not recommended to be used. As is discussed in the article as well, it is better to go with Graphical CV or CV using Table function (and write the SQLScript there).
Thank you.
Best Regards,
Anjali.
Thanks for reply,it helped a lot
Please i have one more question,what are the pre_requisites to write SQL Script?
I presume the Currency conversion using the Calculation engineΒ is the only choice available. How can you avoid this inside AMDP/ SQL?