The ACID House :  What is an ‘In Memory Database’?

In Memory Computing, and In Memory Databases in particular, are the buzzwords du jour among the IT vendor community.

But what do they mean? A quick search yields the following definition:

In-memory computing is the storage of information in the main random access memory (RAM) of dedicated servers rather than in complicated relational databases operating on comparatively slow disk drives. In-memory computing helps business customers, including retailers, banks and utilities, to quickly detect patterns, analyze massive data volumes on the fly, and perform their operations quickly. The drop in memory prices in the present market is a major factor contributing to the increasing popularity of in-memory computing technology. This has made in-memory computing economical among a wide variety of applications (http://www.techopedia.com/definition/28539/in-memory-computing)

Sounds new and different, right? No more creaky spinning platters, everything happening in memory: this sounds like a major technological change.

However, at least from the perspective of database systems, things are not so straightforward.

Looking at the leading Relational Database Management Systems, we can see that they are designed to complete operations in memory wherever possible, minimising writes to and reads from permanent storage (‘disk’). The more memory you give one of these systems, the less it reads from and writes to disk. Reads are cached for later use and writes are minimised, with only the minimum information being written to disk to meet the ACID requirements of a transaction.
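The effect of memory on reads can be sketched with a toy least-recently-used page cache. This is an illustration only: the `PageCache` class, the dict standing in for the disk and the workload are all assumptions made for the example, not the buffer manager of any real RDBMS.

```python
from collections import OrderedDict


class PageCache:
    """Tiny LRU read cache in front of a slow 'disk' (here just a dict)."""

    def __init__(self, disk, capacity):
        self.disk = disk            # stand-in for permanent storage
        self.capacity = capacity    # number of pages that fit in memory
        self.pages = OrderedDict()  # page_id -> contents, oldest first
        self.disk_reads = 0

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark as most recently used
            return self.pages[page_id]
        self.disk_reads += 1                  # cache miss: go to disk
        data = self.disk[page_id]
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used
        return data


def count_disk_reads(capacity, workload, disk):
    """Run a workload through a fresh cache and report the disk reads."""
    cache = PageCache(disk, capacity)
    for page_id in workload:
        cache.read(page_id)
    return cache.disk_reads


disk = {i: f"page-{i}" for i in range(100)}
# 20 hot pages, each visited five times in a repeating cycle.
workload = [i % 20 for i in range(100)]

small = count_disk_reads(capacity=5, workload=workload, disk=disk)
large = count_disk_reads(capacity=20, workload=workload, disk=disk)
print(small, large)  # prints: 100 20
```

With room for only 5 pages every access in this cyclic workload misses the cache; once all 20 hot pages fit, only the first pass touches the disk.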

So does the new batch of ‘In Memory Databases’ work differently from this? Well, yes and no. For example, SAP HANA has some very interesting storage structures which facilitate very fast processing on huge data sets, but this would seem to have more to do with the column store structures and parallel query execution than with being ‘In Memory’. It’s still an ACID compliant system which writes out transactions to ‘disk’, albeit SSD, a technology which can be used with any database system. Of course HANA loads all data into memory at start-up, but database systems (and database administrators) have been ‘warming the cache’ for decades.

So what happens if we become a little less ACID? This opens up opportunities for significant performance gains for OLTP systems. Many database systems offer options here. For example, SAP ASE offers ‘delayed commit’, which confirms a transaction before its write has reached disk, relaxing the Durability in ACID. Sounds like a bad idea? It depends. Of course, in the event of a server crash you risk losing transactions. That is not really suitable for a bank, but potentially very suitable for an online retailer who needs performance to avoid losing customers and can risk losing a handful of customer transactions in the rare event of a system failure.

It’s also worth looking at the mix of transactions you have on a system, using full durability only where the business requires it.   This can generate major performance benefits in busy OLTP systems.
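To make the trade-off concrete, here is a minimal sketch of a transaction log where each commit chooses between full durability and a delayed commit. The `CommitLog` class, its `flush_every` parameter and the simulated `crash()` are hypothetical names invented for this example; they do not reflect how ASE or any other product implements the feature.

```python
import os
import tempfile


class CommitLog:
    """Append-only transaction log with an optional delayed commit."""

    def __init__(self, path, flush_every=3):
        self.path = path
        self.flush_every = flush_every  # delayed commits are flushed in batches
        self.pending = []               # acknowledged but not yet on disk

    def commit(self, txn, durable=True):
        self.pending.append(txn)
        if durable or len(self.pending) >= self.flush_every:
            self.flush()
        return "ok"                     # the caller sees success either way

    def flush(self):
        with open(self.path, "a", encoding="utf-8") as f:
            for txn in self.pending:
                f.write(txn + "\n")
            f.flush()
            os.fsync(f.fileno())        # force the write to stable storage
        self.pending.clear()

    def crash(self):
        self.pending.clear()            # simulated crash: memory contents vanish

    def recover(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as tmp:
    log = CommitLog(os.path.join(tmp, "txn.log"))
    log.commit("payment-1", durable=True)      # must survive a crash
    log.commit("page-view-1", durable=False)   # acceptable to lose
    log.commit("page-view-2", durable=False)
    log.crash()
    survivors = log.recover()

print(survivors)  # prints: ['payment-1']
```

All three commits were acknowledged to the caller, but only the one that paid for a synchronous disk write survives the crash. That is the decision the business has to make per transaction type.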

Moving on from ‘a bit less ACID’, we can consider placing the entire database in memory without ever writing out to disk; that covers both transaction and data pages. SAP offers this with Sybase ASE In Memory databases. Major performance gains can be delivered without having to change a line of code.
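The consequence of never writing to disk is easy to demonstrate with SQLite, used here purely as a convenient stand-in (nothing below reflects how ASE In Memory databases are configured). The helper name `write_then_restart` is made up for this sketch; closing and reopening the connection plays the role of a server restart.

```python
import os
import sqlite3
import tempfile


def write_then_restart(target):
    """Insert one row, close the connection, reopen, and look for the table."""
    conn = sqlite3.connect(target)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute("INSERT INTO orders (item) VALUES ('book')")
    conn.commit()
    conn.close()                      # stand-in for a server restart

    conn = sqlite3.connect(target)
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE name = 'orders'"
    ).fetchone()
    conn.close()
    return row[0]                     # 1 if the table survived, 0 if not


# ":memory:" asks SQLite for a database that lives only in RAM.
in_memory_tables = write_then_restart(":memory:")

with tempfile.TemporaryDirectory() as tmp:
    on_disk_tables = write_then_restart(os.path.join(tmp, "shop.db"))

print(in_memory_tables, on_disk_tables)  # prints: 0 1
```

The application code is identical in both cases; only the storage target changes, and with it what survives a restart.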

In summary, database systems love memory. They always have. The more memory, the less reading from disk. Optimised storage structures (e.g. column stores with ‘direct access compression’, as in HANA) can allow far more data to reside in the same memory space, allowing queries across bigger data sets at in-memory speeds. Reducing writes to disk requires more consideration. Moving away from full ACID transactions by using delayed commits and relaxed-durability databases can yield real performance gains for OLTP systems.
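One common column-store compression technique is dictionary encoding, sketched below for a single column. This is a generic illustration under my own assumptions (function names, sample data) and makes no claim about HANA's internal format: each distinct value is stored once in a sorted dictionary, the column itself becomes a list of small integers, and a range predicate can be evaluated on those integers without touching the original strings.

```python
from bisect import bisect_left, bisect_right


def dictionary_encode(values):
    """Return (sorted dictionary, list of integer codes) for one column."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def range_select(dictionary, codes, low, high):
    """Row positions whose value lies in [low, high], using codes only."""
    # The dictionary is sorted, so a value range maps to a code range.
    first = bisect_left(dictionary, low)
    last = bisect_right(dictionary, high) - 1
    return [row for row, code in enumerate(codes) if first <= code <= last]


city = ["Leeds", "London", "Leeds", "York", "London", "Leeds", "Bath"]
dictionary, codes = dictionary_encode(city)
rows = range_select(dictionary, codes, "Leeds", "London")

print(dictionary)  # prints: ['Bath', 'Leeds', 'London', 'York']
print(codes)       # prints: [1, 2, 1, 3, 2, 1, 0]
print(rows)        # prints: [0, 1, 2, 4, 5]
```

Seven strings shrink to four dictionary entries plus seven small integers, and the saving grows with the number of repeated values in the column.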

Finally – replacing your spinning platters with SSD will benefit any database system.  Is that ‘In Memory’?


27 Comments


  1. Flavio Furlan

    Hi Chris,

    I’m not sure if I understood your point here, but the latency of memory is a fraction of that of disk access. I agree that SAP HANA’s gains are more about column organization, compression and parallelism, but running everything in memory multiplies those advantages.

    Another point: SAP HANA has a very robust recovery system to avoid data loss when there is a power shutdown. SAP HANA is fully in compliance with the ACID principles. There is no “delayed commit” here.

    I remember when I programmed in Clipper on a 386 PC, compiling and linking was a pain. Back then, I used a program called RAM Disk that created an in-memory drive. I then ran the compiling and linking in memory and the whole process was super fast compared to disk operation.

    That in-memory is faster than disk is obvious, but no other company has dared to run an entire DB in memory and guarantee 100% recovery. And SAP did.

    Take care!

    Furlan

    1. Chris Jones

      I was not suggesting HANA has a delayed commit.  I was just looking at what is an In Memory database.  To many people I talk to it suggests that you will lose data if the plug is pulled.  Not the case with HANA;  can be the case with other database products  – which actually might be closer to what people think an in memory database is – i.e. not Durable.

  2. John Appleby

    I’m not too sure what this blog is about. Guess you are an ASE guy, and ASE is a great database. A few comments though to explain a few things about HANA.


    HANA performs one order of magnitude better than ASE with its cache pinned in memory, for range selects. This is due to the combination of columnar storage and dictionary encoding. In practice this means a huge increase in the amount of throughput on the same hardware.

    Please don’t confuse HANA’s structures with being optimized for in-memory access with a cache – it’s the combination of in-memory and optimized structures which brings the huge benefits. For example, Sybase IQ is a columnar database, but it suffers from sucky individual insert performance. HANA fixes this because it uses optimized in-memory objects that allow MVCC for fast individual inserts, and a partition-level delta store.

    HANA is designed to be ACID, it’s designed for use with Enterprise OLTP/OLAP scenarios. If you give up some durability, you can get benefits with any database, but that’s not what HANA is about. Transactions are written off to disk in a log, this only causes a performance impact during bulk loads, which aren’t part of the HANA lexicon, because data happens in real-time.

    Also do note, that most HANA systems don’t use much SSD. HANA uses disk only for durability/backup, and that doesn’t require high performance disks, just sequential writes. Usually only the 500GB log volume is SSD, and some vendors just use disk for all of it.

    Hope this clarifies a few points for you.

    1. Chris Jones

      I’m an ASE, Oracle, SQL Server and HANA technologist.

      I agree that HANA is designed from the ground up to deal with the update issue, with MVCC, Delta Store etc. IQ has moved in this direction (it too now has a delta store) but I’ve yet to test this out.

      HANA is a great product. Its performance can be hundreds of times better than other systems. Its design allows for concurrent OLTP and Analytics.

      I wrote this blog to start a discussion about what an in memory database is.

      You do not have to look too hard to find confusion about this. Have a look at this article:

      http://www.computing.co.uk/ctg/feature/2371368/whats-holding-back-in-memory-databases#

      “While in-memory databases offer a significant leap in performance, they also have one big disadvantage that hasn’t yet been satisfactorily overcome, argues Gartner vice president and distinguished analyst Donald Feinberg, and which is hampering more widespread adoption.

      “Here’s the problem: if an in-memory database goes down you have all kinds of issues with recovery because memory is volatile – you lose everything. So applications have to be aware of that when they are doing ‘commit’; when they are doing ‘transaction control’. They have to understand how the in-memory database is working so that they can assure the consistency of a transaction,” says Feinberg. “And that’s just one example.”

      As we know, this does not apply to HANA. So is it ‘In Memory’? It reads from and writes to disk in pretty much the same way as other RDBMSs, i.e. write-ahead log, checkpoint process, etc.

      Finally, I know of one vendor of SSD storage who is seeing interest in SSD storage for all HANA data. HANA may only have to read from the (MAXDB?) data structure at startup, but if this takes 10 minutes then you have an issue. I have also heard of IQ/HANA side by side implementations with all hot data on HANA and hot and cold data on IQ. In the event of a HANA failure or restart, IQ takes over, as HANA has that tricky problem of pulling all that data off disk: 20 minutes of tapping your fingers...

      1. Tom Cenens

        Hi Chris

        In-memory is just a bad marketing term, so it doesn’t make sense to discuss it much further, in my opinion. It’s being thrown around like other bad marketing terms. Cloud, anyone?

        The technical blueprint (before SAP HANA was born) was named “SanssouciDB”, which suits it better.

        Best regards

        Tom

        1. Chris Jones

          Spot on Tom.   In Memory in the Cloud anyone?

          I once asked an SAP salesman to go off and try to do something on a computer that was not in memory. He looked confused.

        2. John Appleby

          Actually Tom, SanssouciDB was a research project by HPI.

          HANA was originally called NewDB, which I happen to quite like 🙂

          You are right, the IMDB moniker is confusing. HANA combines a lot of old and new concepts to make a new database design which is plain “better” IMHO.

          1. Gregory Misiorek

            i like those early days, when true to its name, Sanssouci was just a project. i have a feeling that soon enough i may be dragged into something resembling Grosssouci, which i once experienced when visiting Sanssouci’s Voltaire’s room.

            1. John Appleby

              Ah Greg, unfortunately this is Enterprise Software and it’s usually Avecsouc or Grossouci 🙂

              As a point of interest, SanssouciDB was named after the summer palace of Frederick the Great. It was designed to reflect a harmony between man and nature. It’s right near the University of Potsdam.

              In an interesting twist of fate, King of Prussia, just two miles away from SAP America HQ, is also named after Frederick the Great.

              On a personal note, I think it would have been more fitting had Hasso named it NeuesPalaisDB, but it doesn’t really have the same ring.

            1. Lars Breddemann

              Actually I strongly prefer something like HANA.

              In itself it doesn’t bear any meaning, but meanwhile is associated with this data processing platform we created.

              Similar to ORACLE or DB2 it does not put nominal restrictions on what it actually is.

              There’s no inherent association based on the word that would say “this is a SQL database that uses this and this technology”.

              To me it’s a nice short and easy to remember word that serves well as a placeholder for what we do with HANA.

              I recall many customers that talk in ways like “we store this data in oracle” or developers who say “this is fetched from HANA”. None of them is referring to the platform product but to the very instance of this platform used in their context. For them the label HANA works fine, too.

              Product naming is a difficult thing (and boy do we at SAP know that) but in this case I really think they’ve done it well.

              1. John Appleby

                You may be right. I’m definitely not trying to criticize the marketers, because I don’t know any better 🙂

                I think though that IMDB is a problem, because it means different things to different people. To many, it implies that durability isn’t present, because some IMDBs like Hazelcast have limited durability.

                1. Jelena Perfiljeva

                  “In memory” certainly is confusing pretty much to everyone who has not spent time studying this concept. When I signed up for the openHPI course a while back and was telling my husband about this new in-memory DB thing his first reaction was: “so, if it shuts down all the data is lost?”. You guys might chuckle at this but it is exactly what regular people (and he’s in IT, by the way) think first hand. When we mention IMDB to them immediately there is a lot of explaining to do on “where does the poop go”. Doesn’t seem like a good thing.

                  By the way, in Russian HANA (хана) is a slang word meaning “the end”, “breakage” or even “death” depending on the context. I’ve been enjoying HANA jokes on the Russian SAP forum for years already and I’m sure there are more to come, so – don’t change it. 🙂

                  1. Gregory Misiorek

                    in that case i vote for the Japanese word for ‘flower’ and the Moravian river. once i check out the town in Hawai’i i will let everyone know if it deserves the name, unless somebody has already done it.

      2. John Appleby

        Certainly, the in-memory moniker is confused. SAP wanted to differentiate their database as being different, when they created HANA. Certainly, HANA is different.

        I’m less certain whether IMDB, In Memory Computing, or all the other acronyms that the marketing folks made up were the right ones. Question is whether there is a better term that could have been used?

  3. Jay Thoden van Velzen

    Hi Chris,

    Interesting blog. You make some interesting points, but I do feel a bit that the focus on the technology alone may lead to missing the forest for the trees.

    Innovation is rarely so “innovative” that everything is new. It is often the combination of different things that make something new and compelling (i.e. the combination of everything in-memory with the disk mostly for recovery/start-up loading with the column store, and a number of other things in HANA, but you can also think of the iPad/tablet).

    Where in-memory really shines, though, is what it enables: being able to do things on demand, rather than in batch. This may seem minor, but can really change the way you work and structure your business processes. This becomes even more interesting in complex analytic processes or predictive models, which can be run in-memory as well. This enables things like on-demand recommendations based on what a user does on a website, or complex models that used to be run once a week and now can be run on demand. The key, then, is the business value that an in-memory database enables, not so much whether it is really something entirely new technologically.

    I would strongly disagree, though, that a less-ACID database for OLTP systems is ever acceptable. Really? Which online retailer would be OK with losing a number of transactions? The website needs to be responsive, yes, but not at the cost of losing transactions, let alone any referential integrity.

    1. Chris Jones

      Non ACID website transactions can be a very very good idea.

      Let’s say you’re an online bookseller. Customers are likely to go elsewhere if your site is slow, so you consider a non ACID approach to improve performance. Let’s say you choose delayed commit. You see a considerable performance gain; customers are happy and use your site.

      Now let’s say a database server crashes. You might lose a few seconds of transactions on part of your estate (you’re a big player, so you have a sharded system).

      This will be a very very rare event.

      You’ve lost transactions   – customer not charged/products not shipped.

      A score of unhappy customers maybe once a year

      Against a much bigger group of happy customers because you have a fast website

      1. Jay Thoden van Velzen

        Ideally, your database wouldn’t be slow in the first place, and your inventory would have to be pretty large (as well as your user base) for this to even be a problem.

        I guess it could be a consideration, but I’d hate to have to explain to a customer that to speed up their website, they’d have to take the risk to lose a number of transactions. We’d probably explore all other options first, no?

        1. Chris Jones

          I’m not out on a limb here, chaps. Have a read of this:

          http://www.dataversity.net/acid-vs-base-the-shifting-ph-of-database-transaction-processing/

          “The new pH of database transaction processing has allowed for more efficient vertical scaling at cost effective levels; checking the consistency of every single transaction at every moment of every change adds gargantuan costs to a system that has literally trillions of transactions occurring. The computing requirements are even more astronomical. Eventual consistency gave organizations such as Yahoo! and Google and Twitter and Amazon, plus thousands (if not millions) more the ability to interact with customers across the globe, continuously, with the necessary availability and partition tolerance, while keeping their costs down, their systems up, and their customers happy. Of course they would all like to have complete consistency all the time, but as Dan Pritchett discusses in his article “BASE: An Acid Alternative,” there has to be tradeoffs, and eventual consistency allowed for the effective development of systems that could deal with the exponential increase of data due to social networking, cloud computing and other Big Data projects”

          1. Ramana Krothapalli

            Thanks for pointing us to the blog “Base: An Acid Alternative”. For me, it is very educational. However, the way I understood the article seems to be slightly different. Right now, ACID compliance is achieved together by the database and application in one logical instance. This blog indicates how to get eventual consistency when the data and processing are distributed across multiple instances to achieve scalability. However, it still needs and relies on each individual database to be ACID compliant. Take the example in the blog and see what will happen if one of the databases says that it has committed a change but then loses it into thin air. There cannot be eventual consistency. It will be forever inconsistent.

            As you already know, SAP applications use update queues of type V1, V2 & V3 in a somewhat similar fashion. The end user thinks that the transaction is saved, but in reality it is in the queue. If the database says that it has saved the transaction information into the queue consistently across all the queue related tables, we are fine. However, if one table of the queue is updated and not the other, then the queue itself is inconsistent and we may never be able to commit the transaction eventually.

            Ramana

