
In-Memory – is it really non-disruptive?

First off – I am thrilled beyond words. Hasso and Vishal totally got rock star status in my book with this announcement. As I tweeted earlier, it is also a bit of a case of bragging rights for me, since it closely matched the predictions in "So, what is next in Business Intelligence?", which I wrote a year ago 🙂

 

While this is excellent news – I don’t think this totally resonates with the “no disruption” theme that SAP leadership was touting all three days at SAPPHIRE NOW. This could totally be attributed to my lack of understanding of the whole idea. So as always – jump in and comment with your thoughts.

 

Here is the general idea I got from the keynotes. The new DB will get a snapshot of the old DB, and then get delta images as they happen. Apparently this takes only a few hours for the initial load and a few seconds for deltas. The new DB will be a columnar database – which to my mind is something like the account model in BI, where every column can serve as an index. I assume SAP has some cool software that will do a metadata mapping from the ABAP DDIC to the new model, and then somehow keep it in sync. I am not sure what happens to persistence after it moves to the new DB. I guess we can quickly reload everything from the old DB in a disaster recovery scenario to a new appliance.
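To make that mental model concrete, here is a rough sketch in Python of how I picture the snapshot-plus-delta flow. This is purely illustrative – all the names and structures are invented by me, not anything SAP has described:

# Toy model of "bulk snapshot first, then small deltas" (invented names,
# not SAP's actual mechanism).

def initial_snapshot(old_db_rows):
    """One-time bulk copy of the old DB into the new in-memory store (hours)."""
    return {row["key"]: dict(row) for row in old_db_rows}

def apply_deltas(new_store, deltas):
    """Apply the inserts/updates/deletes captured since the last sync (seconds)."""
    for change in deltas:
        if change["op"] == "delete":
            new_store.pop(change["key"], None)
        else:  # insert or update
            new_store[change["key"]] = dict(change["row"])
    return new_store

store = initial_snapshot([{"key": 1, "amount": 25}, {"key": 2, "amount": 50}])
store = apply_deltas(store, [{"op": "update", "key": 1, "row": {"key": 1, "amount": 75}}])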

 

Here are my questions.

 

1. For the delta to happen from the old DB to the new DB, some program has to read the data in the old DB. Wouldn’t this mean that all the constraints of the old DB – its data model, lower-quality hardware, etc. – apply while sending the deltas to the new DB? So, although the receiving system is super fast, wouldn’t this delta process take something more than a few seconds?

 

2. Would the application continue to use the ABAP DDIC? I am especially keen to understand if there are any changes to record locking in this new paradigm.

 

3. Are blade servers a mandatory thing for in-memory analytics? Blades are not cheap, and this is one reason many customers hold back on BWA and BOBJ Explorer. With the new Sybase acquisition – can SAP also do columnar DB with non-blade boxes?

 

4. I am also curious to find out how SAP thinks of harnessing the power of front end machines. Phones don’t have a terribly powerful processor now, but that is not to say that won’t change. So when your front end can do a lot – and most devices already have a lot of memory, what is SAP doing to harness that power? Wouldn’t it be a shame to waste it by only doing server side optimization for analytics?

 

5. When, at some point in the future, the new DB becomes the only DB for SAP – will SAP revise the ABAP DDIC to optimize it for the new world? My point is – the existing modelling theories were all built in a time when storage was costly and processors were slower. So when processors and memory become very cheap – for example, could applications use flat structures for all transaction data, like we do for reporting, and use normalized data only for master data, maybe?

 

6. For analytics – we typically need to combine data from multiple sources, including the virtual way that Data Federator provides. Now if one of the sources is columnar and others are row based – is it easy to combine this data? I remember the troubles of combining key figure based cubes with account model based cubes in Plan vs Actual reporting in our existing world. How does SAP solve this?

 

I have some more questions, but let me first get an idea of the ones above. Besides, my flight lands in 20 minutes and I am almost out of battery charge on my PC!

      25 Comments
      Sergio Ferrari
      Vijay, I like this blog and I agree with your considerations.

      Sergio

      Former Member (Blog Post Author)
      Thanks Sergio
      Bala Prabahar
      Vijay,

      I didn't get a chance to listen to the announcement. However, I thought I would make an attempt to answer(?) your questions with the little knowledge I have on this topic (my answers are based on BWA knowledge):

      1) Answer is YES. BWA is an example. In BWA, the rollup (delta) would take some time – not hours, but minutes – depending on the data model, aggregation level, data size, etc.

      2) I believe a columnar database is good for storing aggregated data. I don't believe it is good for storing "transactional" data. All R/3 ABAP programs are written for conventional relational (normalized) databases. I don't believe columnar databases would be in a normalized state (the idea is performance, not storage efficiency).

      3) Blade servers vs non-blade servers: I don't believe the blade server requirement is driven by technical reasons.

      4) This depends on business requirements. Would you like to process data every time, by every user, on the desktop just because desktops have processing power, or process data only once on the server and store it in the database?

      5) I don't believe the new DB (columnar DB) would ever become the only DB. Normalization is needed not just to save space but also to avoid inconsistencies in data. I would avoid data redundancy to minimize (or avoid) inconsistencies. I wouldn't store vendor information or item information in a purchase order table.

      6) In my opinion, all transactional databases would remain "non-columnar".

      I would love to hear other thoughts.

      Regards,
      Bala

      Former Member (Blog Post Author)
      Bala, here are my (mostly uneducated) positions on your replies 🙂

      1. Ok - cool, so we agree

      2. Not sure if I understood this right - I thought a major advantage of columnar databases is that there is no need to aggregate at the database level. With 2TB of RAM and 64 to 128 cores, why wouldn't you just aggregate on the fly?

      3. So then why blade servers?

      4. With phones - the limitation will be bandwidth. Why send data back and forth over networks, when you can send it once to a powerful phone and let it do all the crunching? Same for graphics - why not just use the phone's processing power to let the user choose how to display data?

      5. I did say to use normalized structures for master data and flattened ones for transactional data. So with this clarification - what do you think?

      6. Somehow I think most transactional DBs will move to columnar over time. But to save the investment in existing code - maybe have an abstraction layer like today's DDIC to serve the applications.

      Bala Prabahar
      Vijay,

      2. Columnar databases, by definition, store aggregated data (e.g. BWA stores data in a columnar DB). What do I mean by this?

      Let us consider this example:

      Table A stores transactional data (R/3) and contains:
      Customer ID, Item #, Item Amount

      Data:  1  1   $25   (1st record)
             2  10  $50   (2nd record)
             1  1   $75   (3rd record)
             3  4   $10   (4th record, etc.)

      Table B stores data in a cube in BW (just customer ID and total amount):

             1  $100
             2  $50
             3  $10

      Let us say customer IDs 1 and 2 are located in the great state of Georgia.

      If you are looking for information about GA customers, a BW system without BWA will read two records, 1 and 2.

      However, in a columnar database, one record will store all records for GA customers. Data in the columnar database would look like this:

          GA 1 $100 2 $50
          CA 3 $10

      So if you are looking for GA customers, we would read only one record in a columnar DB.

      Does this make sense?

      3) Marketing. Hardware vendors wanted to introduce a "utility type" method (plug-ins) of scalability: buy more hardware only when you need it. But with cloud computing, virtualization, etc., I don't know if this concept is still attractive.

      4) This appears to be an architecture/data modeling issue.

      5) I will give an example:

         Current design:
         A purchase order contains Order #, Item #, Vendor #, Qty, Purchase order amount.

      Yes, I saw your comment regarding normalized master data. Now, are you suggesting that we store all item details, vendor details, etc. in every purchase order? Or do I not understand what you mean by "flattened for transactional data"?

      6) Based on my example (2), do you still think "most transactional DB's will move into columnar over time"?

      Regards,
      Bala

      Former Member (Blog Post Author)
      Bala, I guess our understanding of columnar databases is fundamentally different.

      Here is my understanding - a columnar database holds all the data of a column together, so the serialization in your example will look like
      GA,GA,IL,2,1,3,10,1,4,$50,$75,$10 whereas in a row-based database it will look like GA,2,10,$50, GA,1,1,$75, IL,3,4,$10. Due to this, each column in a query can be read without reading through all the rows as we usually do. This also means that there is no real need for pre-aggregation like cubes, BW aggregates, etc. Loading should be able to make use of one thread per column - which should be really fast. However, I do have questions about how efficient it really is to uncompress this data to read it. A columnar database still needs an OLAP engine to do analytics - but everything is pre-tuned. You will not need indexes and so on for various queries, since each column serves as its own index.
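      To make this concrete, here is a tiny Python sketch of my mental model (toy code with made-up structures - not any actual SAP implementation) showing the same records laid out column-wise and aggregated on the fly:

# Toy illustration only: the three records from the example above, stored
# row-wise vs. column-wise (invented structures, not an SAP product).
rows = [
    {"state": "GA", "cust": 2, "item": 10, "amount": 50},
    {"state": "GA", "cust": 1, "item": 1,  "amount": 75},
    {"state": "IL", "cust": 3, "item": 4,  "amount": 10},
]

# Column store: one array per column, so all values of a column sit together.
columns = {name: [r[name] for r in rows] for name in ("state", "cust", "item", "amount")}

# "Aggregate on the fly": total amount for GA customers, scanning only the two
# columns involved - no pre-built cube or aggregate needed.
ga_total = sum(amount for state, amount in zip(columns["state"], columns["amount"])
               if state == "GA")
print(ga_total)  # 125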

      Ok - so does this match your understanding?

      Bala Prabahar
      Vijay,

      No, our understanding is different. I still believe my definition is correct. You may want to read this from the wiki (even though it is not an authoritative source, my thoughts/ideas are in line with this definition):
      http://en.wikipedia.org/wiki/Column-oriented_DBMS.

      If your understanding is correct, then one has to read two records to read values from two columns, 3 records for 3 columns, and so on, correct? How do you join them?

      Thanks,
      Bala

      Bala Prabahar
      Vijay,

      I believe wiki's definition is more in line with your definition.

      However, wiki and others suggest columnar databases are good for OLAP (not for OLTP) as I indicated.

      Thanks,
      Bala

      Former Member (Blog Post Author)
      Bala, thanks for clarifying quickly.

      So, my point is - why even have a separate OLTP and OLAP system, if you can use the columnar DB as your backend? If I understand the theory here correctly - there is no need that I can see. An OLAP engine on top of the columnar db should serve both OLTP and OLAP.

      Former Member
      Vijay, your point about having no need for separate OLTP and OLAP systems is in line with Hasso's keynote as the long-term strategy. At first, duplication of the data and ETL & data warehousing are used more as a safety net, or as steps in the overall surgical procedure. These steps serve current customers, in keeping with the "no disruption" theme.

      In the future, a separate data warehouse or OLAP system is no longer needed. Your analytical tools such as BO Explorer, Xcelsius, etc. would plug right into your ECC system(s).

      Former Member (Blog Post Author)
      Thanks for chiming in, Ed.

      Since you are a technical wizard - let me pose this question to you. I am fully sold on the write efficiency and compression when you put data in a columnar DB. But is it really read efficient? Will decompressing large quantities of this data into an application-readable form affect its scalability?

      I would assume that somehow this columnar data has to be represented in a relational form for applications to use it. And in my unsophisticated mind - I think there are limitations in this approach. I really wish I could talk to someone from SAP who can explain this in detail.

      Weird as it might sound - I also have the reverse question. A columnar DB probably makes sense to invest in for a large database. But what about the vast majority of SAP customers who don't have to talk in TB when dealing with data? Is there value in investing in a columnar DB for them?

      And finally - why should in-memory just be on blades? I could not find out whether Sybase IMDB needs blades - but I think it will be truly revolutionary if SAP can somehow make it work without blades.

      Bala Prabahar
      Vijay,

      I will try to answer your question indirectly. Recently I happened to read a white paper on Oracle's Advanced Compression in 11g. In 11g (which SAP started supporting a month ago), they have introduced a new clause for the CREATE TABLE... statement. The new clause is "COMPRESS FOR OLTP". Initially I was a bit shocked to see a "FOR OLTP" compress feature. However, it made sense after reading the entire white paper.

      I also believe they must have invested a lot of time and effort to develop the algorithm to support the "Compress for OLTP" feature. Theoretically I understand what they mean by "Compress for OLTP". I am currently testing this feature. The fact that they had to come up with a compression algorithm for OLTP tells me that OLTP and OLAP systems can't share one DB in the foreseeable future.

      In addition, I worked on my first data warehouse project in 1995, on Informix. Informix recommended two different configurations then, one for OLTP and another for OLAP. Even today, DB vendors recommend two different configurations.

      OLTP systems use B-tree indexes, whereas for a star join to work, Oracle expects bitmap indexes in data warehouses.

      The differences between OLTP and OLAP are too many, so I don't know if customers would be excited about moving to a columnar DB (due to cost) even if it became technically possible in the foreseeable future.

      Also, columnar DBs are not new; BWA has been using this for 3-4 years.

      Thanks,
      Bala

      Former Member
      Vijay,

      Thanks for an informative blog.

      I'm not sure about not needing an OLAP system because of a columnar DB. OLAP systems typically work with data consolidated from more than one source. Having the capacity to natively analyse data of transactional systems is nice, but what do you do when data comes from multiple sources? These sources could be in-house systems as well as business partners' systems. You need to store that data somewhere for the purpose of analysis, at least until the day when data storage becomes truly virtual and you can analyse data without caring where it resides.

      Thoughts???

      Former Member (Blog Post Author)
      That was my 6th bullet in the blog. I have that question too 🙂

      However, with tools like BOBJ data federator, it should be possible in theory to do virtual reporting today.

      Former Member
      Oops, my bad 🙂

      I heard this argument by Hasso about not needing a separate OLAP engine after columnar DBs, and wasn't really sure if he missed this scenario of data warehousing requirements.

      Data Federator should work fine for systems inside company firewalls. I don't think partners would allow us to federate data from their systems though... So we'll need separate data warehouses to store the data in and report on.

      Former Member
      Vijay, your understanding is correct. You guys may want to check out Dan McWeeney's excellent explanation from this Enterprise Geeks video that we shot at last year's Sapphire event after the Hasso keynote.
      Former Member (Blog Post Author)
      I thought I had checked out every podcast you guys ever did, Ed. But I was wrong - let me go watch this one. Thanks for pointing this out.
      Former Member
      2. correct - on the fly is the new name of the game

      3. don't you want to help your biggest customer(s)?

      4. mobiles will be the interface of choice for all reporting

      5. denormalize, denormalize, denormalize until you end up with 2 tables

      6. no savings from existing code - everything has to be rewritten.

      Darren Hague
      My understanding from watching the keynote is this: because all database access is via the ABAP SQL abstraction layer, this layer can write to NewDB at the same time as it writes to the old DB. That means that NewDB is always completely up to date - and in fact the old DB is acting simply as a backup. Throw in the fact that NewDB also persists its data to solid-state disk, and once you've got NewDB up and running there is no need to have the old DB around any more.
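      In rough pseudo-Python, the picture would be something like this (a toy sketch with invented names - not the actual ABAP SQL layer):

# Toy sketch of the dual-write idea (invented names, not the real ABAP SQL layer):
# the abstraction layer applies every change to both stores.
class DualWriteLayer:
    def __init__(self, old_db, new_db):
        self.old_db = old_db  # existing row-based DB, kept as the backup copy
        self.new_db = new_db  # in-memory columnar store

    def write(self, table, key, row):
        # Both writes happen for every change; a real system would need
        # transactional guarantees so the two stores never diverge.
        self.old_db.setdefault(table, {})[key] = row
        self.new_db.setdefault(table, {})[key] = row

old_db, new_db = {}, {}
layer = DualWriteLayer(old_db, new_db)
layer.write("orders", "0000001", {"amount": 100})  # lands in both stores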

      Does that make sense?

      Cheers,
      Darren

      Former Member (Blog Post Author)
      That should work in theory - but I have a follow-up question. The new DB should be able to write at tremendous speed, since each column can be handled by a thread of its own. But the old DB will need time to write row by row. So can locking be managed in a way that keeps these two in sync?

      Also, is the idea that the new DB has an interface to persistent storage? I would think that would be a waste. Why not just do a quick full load back into the new DB when there is a DR scenario?

      Sethu M
      Hi Vijay,

      My name is Sethu and I head product management/strategy for Data and Analytics at SAP; my team is responsible for the product management of the in-memory technology program (strategy and product).

      I would like to take this opportunity to thank you for your passionate blog and the questions in it. I have answered them below; please feel free to contact me if you want to discuss this further.

      1. For the delta to happen from the old DB to the new DB, some program has to read the data in the old DB. Wouldn't this mean that all the constraints of the old DB - its data model, lower-quality hardware, etc. - apply while sending the deltas to the new DB? So, although the receiving system is super fast, wouldn't this delta process take something more than a few seconds?

      There are several ways this side-by-side system can be kept up to date. We are looking at several options, with replication as the primary technology: replication that chases the log and replicates data from it as changes happen. Most of Wall Street uses this technology from Sybase and others, where the volume of transactions is at least an order of magnitude more than any ERP transaction volume. With this technology the replica is normally kept up to date within seconds (and definitely not minutes).
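      To illustrate the general idea of log-chasing replication (a toy Python sketch with invented names - not Sybase Replication Server or any actual SAP component):

# Toy sketch of log-chasing replication (invented names, illustration only):
# a reader tails the source system's transaction log and applies each entry
# to the in-memory replica as soon as it appears.
import time

def tail_log(transaction_log, start_position=0):
    """Yield log entries as they are appended after start_position."""
    position = start_position
    while True:
        while position < len(transaction_log):
            yield transaction_log[position]
            position += 1
        time.sleep(0.1)  # wait briefly for new entries

def replicate(transaction_log, replica):
    """Apply every log entry to the replica - lag stays in the range of seconds."""
    for entry in tail_log(transaction_log):
        replica[entry["key"]] = entry["value"]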

      2. Would the application continue to use the ABAP DDIC? I am especially keen to understand if there are any changes to record locking in this new paradigm.

      Existing applications will continue to use the ABAP DDIC. New/small applications will be able to choose their application server environment; the DDIC will definitely be an input for those applications, but not necessarily their own repository.

      3. Are blade servers a mandatory thing for in-memory analytics? Blades are not cheap, and this is one reason many customers hold back on BWA and BOBJ Explorer. With the new Sybase acquisition - can SAP also do columnar DB with non-blade boxes?

      The current plan is to partner with IBM/HP and others in the future on an appliance which uses blade servers. Nevertheless, the technology is not bound to blades; we are keeping all options for further delivery models open.

      4. I am also curious to find out how SAP thinks of harnessing the power of front end machines. Phones don't have a terribly powerful processor now, but that is not to say that won't change. So when your front end can do a lot - and most devices already have a lot of memory, what is SAP doing to harness that power? Wouldn't it be a shame to waste it by only doing server side optimization for analytics?

      Very interesting thought. In the analytical world there are aggregation, simulation and what-if analysis (all of which need global data access and can therefore only be done on the server side), and visualization. When systems are mobile-enabled, one of the major scenarios in which mobile devices will be used is visualization. Visual rendering is a computationally intensive process. In the future we will consider running a mobile version of the high-performance analytical engine.

      5. When, at some point in the future, the new DB becomes the only DB for SAP - will SAP revise the ABAP DDIC to optimize it for the new world? My point is - the existing modeling theories were all built in a time when storage was costly and processors were slower. So when processors and memory become very cheap - for example, could applications use flat structures for all transaction data, like we do for reporting, and use normalized data only for master data, maybe?

      Once again – very thoughtful. We definitely need to consider de-normalized data models with vertical and horizontal semantic partitioning. This will help in doing complex analytics with high-volume and trickle-feed data (this is done today in the world of web traffic analytics). Also, for extensibility reasons, we are carefully looking into new ways of modeling the data (i.e. a key-value pair storage layer).
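      As an aside, a key-value pair storage layer could look roughly like the following toy Python sketch (invented field names, purely to illustrate the extensibility point - not an actual SAP design):

# Toy sketch of a key-value pair storage layer (invented names, illustration only):
# each record is held as (record id, attribute, value) triples, so new attributes
# can be added later without changing any table structure.
order_attributes = [
    ("PO-1000", "vendor",   "V-42"),
    ("PO-1000", "amount",   250.0),
    ("PO-1000", "currency", "USD"),
    # An extension field added later needs no schema change:
    ("PO-1000", "delivery_priority", "high"),
]

def get_record(triples, record_id):
    """Reassemble one record from its key-value pairs."""
    return {attr: value for rid, attr, value in triples if rid == record_id}

print(get_record(order_attributes, "PO-1000"))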

      6. For analytics - we typically need to combine data from multiple sources, including the virtual way that Data Federator provides. Now if one of the sources is columnar and others are row based - is it easy to combine this data? I remember the troubles of combining key figure based cubes with account model based cubes in Plan vs Actual reporting in our existing world. How does SAP solve this?

      The optimizer is intelligent enough to create dynamic joins such that data in one format is reformatted into the other. It takes the reformatting cost, as well as the access and selectivity costs, into account. Of course, the option of replicating the data into the columnar database is always there as well.
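      As a toy illustration of what such a reformatting join could look like (invented Python structures - a real optimizer would also weigh the reformatting cost when choosing the plan):

# Toy sketch: join a columnar source with a row-based source (invented names).
columnar_actuals = {                       # column store: one list per column
    "cost_center": ["CC10", "CC20"],
    "actual":      [120.0, 80.0],
}
row_based_plan = [                         # row store: one dict per row
    {"cost_center": "CC10", "plan": 100.0},
    {"cost_center": "CC20", "plan": 90.0},
]

# Reformat the columnar side into rows, then join on cost center.
actual_rows = [dict(zip(columnar_actuals, values))
               for values in zip(*columnar_actuals.values())]
plan_by_cc = {r["cost_center"]: r["plan"] for r in row_based_plan}
plan_vs_actual = [{**r, "plan": plan_by_cc[r["cost_center"]]} for r in actual_rows]
print(plan_vs_actual)  # plan vs. actual per cost center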

      Best Regards,
      Sethu

      Former Member (Blog Post Author)
      Thanks Sethu - I really appreciate you taking the time to answer. I have just one follow-up question, on the topic of key-value pairs.

      Are you planning to implement this as a layer that is not visible to the application? Since it is hard to enforce referential integrity unless we explicitly program for it in the application, I am curious how this will get implemented in an app like ERP, where everything refers to everything else. The effort to rewrite all the apps to explicitly check for integrity is too large to be feasible, in my mind. So I am assuming you are thinking of somehow doing this behind the scenes, without the application needing to change. Could you explain your thoughts on this matter?

      Sethu M
      Correct. The underlying data model should be transparent to the application and the goal is not to force any application change.
      Sam Venkat
      Hi,
      Besides ECC, what other products can we expect to adopt the in-memory technology? In other words, what is the roadmap for NewDB?