In-Memory Computing – It’s Groundhog Day (all over again)!

In-memory computing is driving an inflection point in the IT industry quite similar to one we have seen in the past – hence the reference to one of my favorite Bill Murray movies, Groundhog Day. I was recently speaking with a member of my team, David P. Reed, about the implications of “real real-time analytics” for the IT industry, and specifically the need for large amounts of addressable computing memory. He shared a great story about his experiences from his time as Chief Scientist at both Software Arts (creators of VisiCalc) and Lotus (creators of 1-2-3). I thought I would share his story.

Back in the days of VisiCalc and Lotus 1-2-3, many thought the spreadsheet concept was trivial and not worth a serious person’s time. The thinking was that there were “better” analytic tools based on “real databases”. But it turned out that the “what-if” analytics and the instant “end-user programmability of large data sets” that spreadsheets provided were actually what made the PC revolution happen.

The whole point is that VisiCalc and then Lotus 1-2-3 could not have been done without fast access to “in-memory” data. Large spreadsheets were single-handedly the application that drove the need for more DRAM and, in effect, “sold DRAM” to PC users. Spreadsheets needed to have all of the data in DRAM in order to make real-time analytics, decision support, and “what-if-ing” work.

All of the major industry players at the time (Intel, Microsoft, IBM, and Apple) had to react. The need for larger spreadsheets caused Lotus, Intel, and Microsoft to work together on the Lotus-Intel-Microsoft Expanded Memory Specification; expanded memory boards were then sold with enough memory for Lotus 1-2-3 customers to build large, complex models. At the same time, Apple increased the memory capacity of the original 128K Macintosh to 512K in part to support Lotus.

The world of business changed. Large financial companies put spreadsheet-style modeling (with its ease of programming) right into the middle of their trading processes; real-time trading data was fed into spreadsheets, and the resulting trades were sent back to traders. Lotus also found ways to get large volumes of archival data onto PCs, becoming the first data publisher on optical drives and CD-ROMs.

Here at SAP, we are experiencing a story remarkably parallel to those early days. In-memory computing is going to drive the industry to great changes. Businesses are going to need systems with far more addressable memory, holding entire analytic “what-if” databases; and, more importantly, these need not be just off-line computations but an integral part of the decision-making process. For the first time, with an integrated software suite and in-memory data, one will be able to ask simple what-if questions and track the overall impact across an entire enterprise in a fraction of a second.

It’s Groundhog Day (all over again)!

20 Comments
  • … but a bit dangerous. VisiCalc and 1-2-3, where are they now? 😉

    Ike, what this SDN community has been waiting for is a look behind the nice curtains of the in-memory stories and a deeper dive into the very technologies that make HANA happen.

    Warm regards.
    -Vitaliy

    • VisiCalc did not make much money for employees or investors. 1-2-3, on the other hand, made a lot of money for investors and employees, and had a profound effect on the industry.

      As I said in an earlier post, the “in-memory computing” trend is a very compelling one, in my opinion. It’s based on two emerging trends: (1) the ability to reference large amounts of memory with many processing units and not a lot of overhead, and (2) advances in databases that decouple the requirements of relational algebra from the default storage layouts that have been used for 25 years to implement that algebra.

    • @Sygyzmundovych

      1-2-3 ideas are alive and kicking in Excel InPlace and they will be there for a while. I love Groundhog Day as it shows that with each turn things get better (faster and more relevant). I think with HANA we need a few more iterations before seeing its full potential. Only a few weeks left till Feb 2, but it will keep returning after 2011, too.

      @greg_not_so

  • When I read the title, I thought: “SAP HANA and Groundhog Day. Geez, someone writes a blog about yet another SAP HANA marketing announcement”. So when I read Vitaliy’s comment I had to smile. The whole blog WAS yet another marketing announcement. I wonder when the first non-Groundhog-Day announcement will arrive?
    • I’m not in marketing. This is not a marketing message. Building large memory machines and putting corporate data into those memories allows for very, very fast analytics-based decision support.

      These are facts. Not speculation.

  • I see more differences than similarities between the Lotus era and the HANA era. Here are two major differences:
    Lotus Era:
    1) Consumers tried to use everything they had (from a computing-resources standpoint they didn’t have much, but they used all of it) and ran into challenges. They asked for more. That drove IT companies to innovate.
    2) The CTO (or IT manager) had money. The CTO, not the CFO, was making IT decisions. So IT departments tried to do what was right. The IT department tried to optimize resource utilization even though it meant more spending in the short term.

    HANA era:

    1) The CFO makes IT decisions. This is a big difference. The CFO and everyone under him don’t want to discuss the details. Technology is too complicated for them. The CFO thinks IT people can’t see the big picture. As a result, I know of 100GB databases running on servers with 64GB of memory and 1.2TB databases running on servers with 16GB of memory (I’m not kidding) in the same corporation. Resource utilization optimization is a low priority because IT doesn’t have money. And convincing the CFO on resource utilization optimization (how many corporations have implemented ACC?) is almost an impossible task because:
       1) the CFO is not interested in the details, and
       2) there are hardware companies telling the CFO that buying a new server with 128GB of memory for the 1.2TB DB would be cheaper than swapping the servers.
    2) We’re not using everything we have (such as archiving, ACC, data modeling, ILM, and advanced DB technologies). Today, IT companies appear to tell the businesses what they need.
    HANA doesn’t make sense from the ‘brain’ (CTO) standpoint; since most decisions are made at the ‘heart’ (CFO) level, HANA looks really appealing.
    Technologies such as ACC, ILM, archiving, and advanced DB technologies would make sense from the CTO standpoint. But these topics are too complex. Who has time to listen in the HANA era?

    Thanks,
    Bala

  • Not a lot of information about In-Memory.  But a good story anyway.

    I agree with a lot of Bala’s points.  Except the CFO wants SAP to run faster too!  “Time is money.”  (Today seems to be my day for quotes.)

    Michelle

    • Hi Michelle,
      I agree that the CFO/CTO/everyone wants SAP to run faster. I guess the question is at what cost. If performance were really a critical, top-priority issue, then I don’t know why we haven’t been using what we have had for several years. We make decisions on the fly due to lack of money or time or both.
      SAP’s Adaptive Computing Controller (ACC) is, in my opinion, a great product for optimizing resources. I don’t hear much about it. A bird in the hand is worth two in the bush, isn’t it? ACC also fits SAP’s mantra: “Innovation without disruption”.

      Regards,
      Bala

    • The story was all about “in-memory”. The market for spreadsheets and PCs didn’t take off in the corporate environment until one could hold enough data in memory to make for instantaneous decision support. That was the whole point of the story.
  • Sorry, this is the typical marketing blast about performance and scalability being the one and only epochal obstacle to analysis. Admittedly, it is one. But let me offer another story:
    Imagine 10 people meet on a critical and controversial business decision, and 5 of them have prepared individually for the meeting using spreadsheet analytics powered by in-memory computing as you describe. How do you reckon this meeting will go? My bet is that they will argue heavily about why the others’ results are wrong. Likely they will defer the imminent decision until the data issue is settled, which probably translates into finding agreement on KPIs and other data-related SEMANTICS.
    This is a typical, real-world pattern in meetings, even with small-scale data.
    Regards
    Tim
  • Why is in-memory computing so important to analytics when we have technologies like solid-state drives (SSDs) that provide near-memory performance and will continue to drop in price over the next few years? If my BW cubes are stored on SSD, I would guess I would get performance very similar to in-memory computing but without the complexity. Maybe I’m missing a key component of in-memory computing?
    • Jon,

      From what I gather, it’s not only about hardware but also about the way data is stored and accessed in a program, i.e., in (transposed) columns rather than rows, with relational entities greatly “denormalized” (ideally to TWO tables).
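
      Roughly, the difference looks like this. A toy Python sketch of my own (just an illustration of row vs. transposed column layout, not HANA’s actual storage format or API):

      # Toy sketch only: the same logical "sales" table stored two ways.
      # Row layout: one record per row.
      rows = [
          {"region": "EMEA", "product": "A", "revenue": 120.0},
          {"region": "APJ",  "product": "B", "revenue":  80.0},
          {"region": "EMEA", "product": "B", "revenue":  95.0},
      ]

      # Column layout: one contiguous list per attribute ("transposed" rows).
      columns = {
          "region":  ["EMEA", "APJ", "EMEA"],
          "product": ["A", "B", "B"],
          "revenue": [120.0, 80.0, 95.0],
      }

      # An analytic scan (total revenue) touches every field of every row
      # in the row layout...
      total_from_rows = sum(r["revenue"] for r in rows)

      # ...but only one contiguous array in the column layout.
      total_from_columns = sum(columns["revenue"])

      assert total_from_rows == total_from_columns  # same relational answer

      The relational answer is the same either way; what changes is how much data an analytic scan has to touch.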

      Your SSD hardware question is very pertinent, though. Who is the leader here, EMC, in addition to IBM and HP?

      @greg_not_so

    • Jon, there is more to “in-memory” than reducing the I/O time to read data.
      1/ On the I/O numbers, it still takes roughly 5000 CPU cycles to access flash memory, and 100-400 cycles to access RAM.
      2/ But the issue for the SAP In-Memory Computing Engine (ICE) is that 100-400 cycles are still too long. Therefore ICE is “cache-aware”, i.e. it is built to reduce calls even to RAM and to keep as much useful data as possible in the local cache, where an access can take 20 CPU cycles or less (a rough sketch follows below).
      3/ There are more components adding to the “complexity” of in-memory computing than that, like push-down of application logic execution (pls see my earlier blog http://bit.ly/eUkzrO) or SQL operations on compressed columnar data structures as they are stored in memory.
      PS. I took the cycle numbers from one of the presentations without any verification. If they are wrong, the misleading was not intentional. But I still think the concept picture is right.
      PPS. There is still a need for data persistence in ICE, and SSDs are part of the HANA configuration.
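
      To make points 1/ and 2/ a bit more concrete, here is a back-of-the-envelope Python sketch. The cycle counts are the unverified numbers quoted above, and the dictionary-encoded column is my own toy illustration, not ICE’s actual data structures:

      # Back-of-the-envelope only: cycle counts are the unverified figures
      # quoted above; the dictionary-encoded column is a toy illustration.
      CYCLES_FLASH = 5000   # per access to flash/SSD
      CYCLES_RAM   = 250    # per access to DRAM (midpoint of 100-400)
      CYCLES_CACHE = 20     # per access to CPU cache

      N = 100_000_000       # values touched by one analytic scan
      for name, cycles in [("flash", CYCLES_FLASH),
                           ("RAM",   CYCLES_RAM),
                           ("cache", CYCLES_CACHE)]:
          print(f"scan served from {name:5s}: ~{N * cycles:,} CPU cycles")

      # Why compression plus a columnar layout helps keep data cache-resident:
      # dictionary encoding stores each distinct value once and the column
      # itself as small integer codes, so many more values fit per cache line.
      dictionary = ["EMEA", "APJ", "AMER"]   # distinct values, stored once
      codes = [0, 1, 0, 2, 0, 1]             # the column, as small integers

      # A predicate like region = 'EMEA' becomes a tight scan over integers,
      # evaluated directly on the compressed representation.
      emea = dictionary.index("EMEA")
      print("rows matching EMEA:", sum(1 for c in codes if c == emea))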
      • Hi,

        “push-down of application logic”? Is this already compressed or encrypted information :-)? Do you mean stored procedures and triggers?
        Citing http://en.wikipedia.org/wiki/Stored_procedure:
        “Stored procedures are used to consolidate and centralize logic that was originally implemented in applications. Extensive or complex processing that requires the execution of several SQL statements is moved into stored procedures, and all applications call the procedures”

        Some information on “SQL operations on compressed columnar data structures” from our friends at Sybase http://blogs.sybase.com/sybaseiq/2010/01/compression-v-enumeration/
        or “competitor” Vertica http://www.vertica.com/blog/19-why_verticas_compression_is_better#

        Kind regards,
        Martin

        • Martin, there are two things in your post. On application logic push-down: it is no secret that SAP applications today put very little workload on the database and do most of their processing in the application servers. I have heard estimates that only 5 to 20% of the processing in SAP ERP happens in the DB; the rest is in the app server. For BW the numbers may not be exactly the same, but far too much still happens in the app server.
          To give you an example: before HANA, we (SAP and HP) ran some tests of BW’s DSO activation, replacing its ABAP code with a few SQL statements running on HP’s MPP database. It was flying.
          HANA 1.5 underneath BW will not make it fly yet, unless many portions of the current ABAP code are pushed from the app server into ICE. And it does not need to be a stored procedure (although in most cases it will be) – it can be just a few SQL statements, like a MERGE for DSO activation.
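
          To illustrate the “a few SQL statements instead of an application loop” idea, here is a toy Python sketch of my own. SQLite stands in for the database and its upsert syntax stands in for MERGE; the table names are invented, and this is not the actual BW DSO activation code:

          # Toy sketch of push-down: replace a row-by-row application loop
          # with one set-based SQL statement. SQLite and its upsert syntax
          # stand in for a real MERGE; table names are invented and this is
          # NOT the actual BW DSO activation code.
          import sqlite3

          conn = sqlite3.connect(":memory:")
          conn.executescript("""
              CREATE TABLE active_data (doc_id INTEGER PRIMARY KEY, amount REAL);
              CREATE TABLE new_data    (doc_id INTEGER PRIMARY KEY, amount REAL);
              INSERT INTO active_data VALUES (1, 100.0), (2, 200.0);
              INSERT INTO new_data    VALUES (2, 250.0), (3, 300.0);
          """)

          # "App server" style: fetch every new row, decide insert-vs-update
          # in the application, and issue one statement per row.
          def activate_row_by_row(conn):
              for doc_id, amount in conn.execute("SELECT doc_id, amount FROM new_data"):
                  hit = conn.execute("SELECT 1 FROM active_data WHERE doc_id = ?",
                                     (doc_id,)).fetchone()
                  if hit:
                      conn.execute("UPDATE active_data SET amount = ? WHERE doc_id = ?",
                                   (amount, doc_id))
                  else:
                      conn.execute("INSERT INTO active_data VALUES (?, ?)",
                                   (doc_id, amount))

          # "Pushed down" style: one set-based statement does the same work
          # inside the database engine.
          def activate_set_based(conn):
              conn.execute("""
                  INSERT INTO active_data (doc_id, amount)
                  SELECT doc_id, amount FROM new_data WHERE true
                  ON CONFLICT(doc_id) DO UPDATE SET amount = excluded.amount
              """)

          activate_set_based(conn)
          print(conn.execute("SELECT * FROM active_data ORDER BY doc_id").fetchall())
          # -> [(1, 100.0), (2, 250.0), (3, 300.0)]

          The row-by-row version round-trips between the application and the database once per record; the pushed-down version lets the engine do the whole merge in one pass over the data.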
          • Hello Vitaliy,

            thanks for your reply. Personally I prefer a solution that pushes (part of) the code into the hardware. That’s something I dreamed about for some time, until I found out that one can already buy it… but that’s just because I worked with VHDL (a dataflow language for concurrent system design/implementation) before I joined SAP.

            Sorry, but I cannot believe that 80-90% of the time in ERP is spent on the app server. That contradicts what I learned at SAP.

            When you compare the performance of a standard BW against an accelerated BW, do you take into account the influence of different hardware configurations (CPU types, number of nodes, network and memory bandwidth)? But I assume you actually wanted to say that it’s easier to optimize SQL for MPP hardware than ABAP? OK, it’s a pity, but it seems as if an application rewrite is a must for in-memory. I also don’t know how the hardware requirements (CPU, memory, bandwidth) differ between the app and DB servers.

            Currently HANA is always in the news, but I also want to see Sybase ASE & NetWeaver in action, especially as Sybase ASE implements the T-SQL stored procedure language. There’s always room for improvement 🙂

            What’s left to say: I hope I will get an “in-application” database, just like the combination of Excel and the VertiPaq in-memory DB for PowerPivot. On the other hand, I also like the multi-tier hierarchy, as it allows optimization for the requirements of a certain tier. Yeah, I guess this topic will give R&D a boost