Developer's Journal: First Steps into the SAP HANA...

thomas_jung · ‎02-23-2012

Introduction

A long time ago when I first started blogging on SDN, I used to write frequently in the style of a developer journal. I was working for a customer and therefore able to just share my experiences as I worked on projects and learned new techniques. My goal with this series of blog postings is to return to that style but with a new focus on a journey to explore the new and exciting world of SAP HANA.

At the beginning of the year, I moved to the SAP HANA Product Management team and I am responsible for the developer persona for SAP HANA. In particular I focus on tools and techniques developers will need for the upcoming wave of transactional style applications for SAP HANA.

I come from an ABAP developer background having worked primarily on ERP; therefore my first impressions are to draw correlations back to what I understand from the ABAP development environment and to begin to analyze how development with HANA changes so many of the assumptions and approaches that ABAP developers have.

Transition Closer to the Database

My first thought after a few days working with SAP HANA is that I needed to seriously brush up on my SQL skills. Of course I have plenty of experience with SQL, but as an ABAP developer we tend to shy away from deeper aspects of SQL in favor of processing the data on the application server in ABAP. For ABAP developers reading this, when was the last time you used a sub-query or even a join in ABAP? Or even a select sum? As ABAP developers, we are taught from early on to abstract the database as much as possible and we tend to trust the processing on the application server where we have total control instead of the "black box" of the dbms. This situation has only been compounded in recent years as we have a larger number of tools in ABAP which will generate the SQL for us.

This approach has served ABAP developers well for many years. Let's take the typical situation of loading supporting details from a foreign key table. In this case we want to load all flight details from SFLIGHT and also load the carrier details from SCARR. In ABAP we could of course write an inner join:

However many ABAP developers would take an alternative approach where they perform the join in memory on the application server via internal tables:

This approach can be especially beneficial when combined with the concept of ABAP table buffering. Keep in mind that I'm comparing developer design patterns here, not the actual technical merits of my specific examples. On my system the datasets weren't actually large enough to show any statistically relevant performance different between these two approaches.

Now if we put SAP HANA into the mixture, how would the developer approach change? In HANA the developer should strive to push more of the processing into the database, but the question might be why?

Much of the focus on HANA is that it is an in-memory database. I think it's pretty easy for most any developer to see the advantage of all your data being in fast memory as opposed to relatively slow disk based storage. However if this were the only advantage, we wouldn't see a huge difference between processing in ABAP. After all ABAP has full table buffering. Ignoring the cost of updates, if we were to buffer both SFLIGHT and SCARR our ABAP table loop join would be pretty fast, but it still wouldn't be as fast as HANA.

The other key points of HANA's architecture is that in addition to being in-memory; it is also designed for columnar storage and for parallel processing. In the ABAP table loop, each record in the table has to be processed sequentially one record at a time. The current version of ABAP statements such as these just aren't designed for parallel processing. Instead ABAP leverages multiple cores/CPUs by running different user sessions in separate work processes. HANA on the other hand has the potential to parallelize blocks of data within a single request. The fact that the data is all in memory only further supports this parallelization by making access from multiple CPUs more useful since data can be "fed" to the CPUs that much faster. After all parallization isn't useful if the CPUs spend most of their cycles waiting on data to process.

The other technical aspect at play is the columnar architecture of SAP HANA. When a table is stored columnar, all data for a single column is stored together in memory. Row storage (as even ABAP internal tables are processed), places data a row at time in memory.

This means that for the join condition the CARRID column in each table can be scanned faster because of the arrangement of data. Scans over unneeded data in memory doesn't have nearly the cost of performing the same operation on disk (because of the need to wait for platter rotation) but there is a cost all the same. Storing the data columnar reduces that cost when performing operations which scan one or more columns as well as optimizing compression routines.

For these reasons, developers (and especially ABAP developers) will need to begin to re-think their applications designs. Although SAP has made statements about having SAP HANA running as the database system for the ERP, to extract the maximum benefit of HANA we will also need to push more of the processing from ABAP down into the database. This will mean ABAP developers writing more SQL and interacting more often with the underlying database. The database will no longer be a "bit bucket" to be minimized and abstracted, but instead another tool in the developers' toolset to be fully leveraged. Even the developer tools for HANA and ABAP will move closer together (but that's a topic for another day).

With that change in direction in mind, I started reading some books on SQL this week. I want to grow my SQL skills beyond what is required in the typical ABAP environment as well as refresh my memory on things that can be done in SQL but perhaps I've not touched in a number of years. Right now I'm working through the O'Reilly Learning SQL 2nd Edition by Alan Beaulieu. I've found that I can study the SQL specification of HANA all day, but recreating exercises forces me to really use and think through the SQL usage. The book I'm currently studying actually lists all of its SQL examples formatted for MySQL. One of the more interesting aspects of this exercise has been adjusting these examples to run within SAP HANA and more importantly changing some of them to be better optimized for Columnar and In-Memory. I think I'm actually learning more by tweaking examples and seeing what happens than any other aspect.

What's Next

There's actually lots of aspects of HANA exploration that I can't talk about yet. While learning the basics and mapping ABAP development aspects onto a future that includes HANA, I also get to work with functionality which is still in early stages of development. That said, I will try and share as much as I can via this blog over time. Already in the next installment I would like to focus on my next task for exploration - SQLScript.