In-memory databases are becoming mainstream. All the major database vendors and many startups today offer in-memory database solutions that provide an order-of-magnitude performance increase over disk-based RDBMS.
Volatile memory is available for $6-7 per GB (as of November 2016), and the latest 8-socket servers can support up to 24TB of RAM, meaning that in-memory technology is within reach for most organizations and applications.
So why not use an in-memory database?
If you ask many industry experts, even in late 2016 in-memory databases are not for everyone. Here are three common use cases which are widely thought not to be suited for in-memory:
| Technical Scenario | Example | Believed Not Suited for In-Memory Because... |
| --- | --- | --- |
| 1. Small scale | Small business CRM system | Can run on a low-cost server with acceptable performance |
| 2. Very large scale (data volume) | Log analysis on large enterprise systems | Memory resources too expensive |
| 3. Non-mission-critical systems | Community forum on a corporate website | “Okay” performance is acceptable |
In all these cases, the simple answer is yes: a disk-based database will do the job. But if you think strategically about your data, and take into account a few little-known benefits of in-memory technologies, you might find that even in these cases, in-memory technology is in fact applicable. It can even change the game.
In the remainder of this article, we’ll take you on an intellectual journey to re-imagine some of the obvious reasons to “just run it on disk”.
Below, we examine each of these use cases and ask whether it really makes sense to run it on disk:
“For the vast majority of back-office applications at small-to-medium businesses (CRM, email, ecommerce, invoicing), a single modern server with a standard disk-based RDBMS will provide adequate performance. Moving to in-memory would require upgrading to a higher-end server with more memory, and introduces complexities in data management, without significant benefits.”
Complex data models can cause very high computational load as the system evolves – even in small scale systems. In-memory systems can help because they are strongly optimized for complex processing of data.
If you have a complex data model – a large number of tables, complex relations between tables, stored procedures, etc. – your system might be doing a lot of crunching even for simple queries. Data size and complexity are not static: Imagine a company that sells 100 products on an ecommerce site, and then, due to a business decision, starts selling 1000 products. Or instead of storing 10 features per product, they start storing 1000 features. You can’t really predict what will happen to your data or how fast that data will grow.
If you’re facing data complexity, at some point, even a low throughput system will outgrow its single machine with disk-based RDBMS. Performance will degrade, and you’ll have to partition or shard your database, or consider moving to NoSQL or distributed architectures. None of these options are easy, and you’ll have to figure them out knee-deep into a performance problem. If you’re building a green field application, you should consider building it from day one to withstand complexity in the future.
In any database system (whether disk-based or memory-based), after the data is loaded from the database, any calculations performed on the data occur in-memory. Pure in-memory systems store the data in-memory to begin with, so it is faster to retrieve the data for processing. But this alone doesn’t help us with complexity. Let’s focus on a different point: how fast can you process the data while it is in-memory?
Some databases provide performance optimizations for processing data in-memory, which can make complex data crunching much faster. For example, SAP HANA, our in-memory database, optimizes complex query performance using a columnar data model with “late materialization”: it keeps data in compressed form and performs as many operations as possible (joins, selections, filters and aggregates) directly on the compressed data, materializing results as late as possible and reducing the amount of processing that needs to be done. In addition, data is stored in cache-aware structures that make highly efficient use of modern CPUs, including fully multi-threaded processing that takes advantage of multiple cores. Due to these optimizations, SAP commonly sees performance increases of up to 10,000x for complex queries.
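To make the columnar and late-materialization ideas concrete, here is a toy Python sketch (not HANA's actual engine; all names and values are invented). Each column is stored as a dictionary of distinct values plus an array of small integer codes, the filter runs on the codes rather than the strings, and only the rows that survive the filter are ever touched:

```python
# Toy columnar store: each column is a dictionary plus an array of codes.
country_dict = ["DE", "FR", "US"]                # distinct values (the "dictionary")
country_codes = [0, 2, 2, 1, 0, 2]               # one small integer per row
revenue = [100, 250, 90, 300, 120, 80]           # numeric column, same row order

# Filter "country = 'US'" by comparing compact codes, not strings.
us_code = country_dict.index("US")
matching_rows = [i for i, c in enumerate(country_codes) if c == us_code]

# Late materialization: aggregate only the rows that survived the filter.
us_revenue = sum(revenue[i] for i in matching_rows)
print(matching_rows, us_revenue)   # [1, 2, 5] 420
```

A real engine does this with vectorized, cache-friendly scans over millions of rows, but the principle is the same: the expensive work happens on compressed codes, and full values are decoded as late as possible.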
To take another example, Microsoft SQL Server provides a feature called In-Memory OLTP, which improves the performance of transaction processing, data ingestion and data load, and is optimized for stored procedures and transient data scenarios. Microsoft says this improves performance substantially compared to regular Transact-SQL processing in SQL Server.
Our main point – in-memory databases not only allow you to retrieve data faster, they also help you process data faster because they use memory-aware performance optimizations and a multi-threaded architecture. If you are facing data complexity, there is very high value in optimizing the calculations you perform on the data.
Building a new small scale system on top of a database with strong optimization for complex data and queries, will pre-empt the problem of data complexity. Because optimized in-memory processing is several orders of magnitude faster, even as data complexity increases, performance will hardly be affected at all. Meaning your system will be much more able to handle complex data and analytical requirements.
Traditional analytics and reporting will slow your OLTP system down and eventually require a separate OLAP system. With an in-memory database you can add analytics, reporting and even a full OLAP system on the same database server with no reduction in performance.
In OLTP / transactional systems, analytics or reporting of some kind is almost always needed. But it’s usually implemented as an afterthought: First we build the transactional system, and later figure out how to generate the reports or metrics users might need.
If you rely on traditional approaches, analytics will quickly become a computationally-intensive task that requires you to add more hardware resources, and add more code to your application that extracts the insights. It will be harder to develop and harder to maintain.
At some point you might create a separate OLAP system to handle analytics, and then you need to maintain two systems, OLAP and OLTP, manage the transfer of data between them, and settle for “non-fresh” data on the OLAP system because it won’t be possible to load data from production and perform analysis in real time. Real time analytics is becoming a requirement for many common use cases.
Is there a way to get around this complexity and build systems from the ground up for analytics and reporting?
Surely the database can’t handle that, because analytics doesn’t happen in the database, right?
Well, it can, if you want it to. The common architectural decision is to implement analytics and reporting in the application tier – the application pulls data from the database, processes it (sometimes with the help of external libraries or systems), and generates the outputs users need. But there is another option.
You can run analytics and reporting tasks within the database as stored procedures – moving the processing load from the application tier to the database tier. In a traditional disk-based database, you wouldn’t do this, because the database was slow. But what if the database was 1000x faster?
The decision to do the analytics as part of the application was never really a conscious one; it stemmed from the limitations of database technology. If you use an in-memory database, it makes sense to move those computations to the database tier, because an in-memory database can take the load.
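A minimal sketch of the two approaches, using Python's built-in sqlite3 with an in-memory database as a stand-in for any in-memory engine (table and data are hypothetical). The first version pulls every row into the application tier and aggregates there; the second pushes the aggregation into the database, so only the finished result crosses the boundary:

```python
import sqlite3

# In-memory SQLite database as a stand-in for an in-memory engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 10.0), ("west", 20.0), ("east", 5.0)])

# Application-tier style: pull every row, aggregate in application code.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM orders"):
    totals[region] = totals.get(region, 0.0) + amount

# Database-tier style: push the aggregation into the engine;
# only the finished result leaves the database.
pushed = dict(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"))

print(totals == pushed, pushed)   # True {'east': 15.0, 'west': 20.0}
```

The results are identical; what changes is where the work happens and how much data moves between tiers. With a fast database tier, the second form scales far better as row counts grow.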
Advanced data manipulations such as supervised machine learning will tax your system resources and increase complexity in the application layer. Most in-memory databases can perform advanced analytics in-database, faster and with less application code.
Today there is mainstream use of machine learning, predictive analytics, and analysis of special types of data such as geospatial, time-series data, text analysis, graph data, etc. These types of advanced analytics are driven by complex algorithms and require heavyweight processing. They also typically require integrating third-party systems (like R) which help you crunch the numbers.
Advanced analytics, whether you have it now or plan to add it in the future, will be a big computational burden on your application tier. You’ll need to add hardware resources, write more code that “drives” the different systems to extract the insights, and suffer the high latency of having to pull the data from the database and “feed” it to those systems.
Many modern databases allow you to perform advanced analytics inside the database, using built-in stored procedure libraries or integrations between the database and external systems (instead of having to integrate your application with those systems). See how this is provided by Microsoft SQL Server, Oracle and SAP HANA.
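As a small illustration of "analytics inside the database", here is a sketch (schema and data invented) that fits a least-squares line entirely with SQL aggregates, so the database returns five numbers instead of every row. In-database analytics libraries do essentially this at much larger scale:

```python
import sqlite3

# Hypothetical table of (x, y) observations in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?)",
                 [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.0)])

# The database computes the sufficient statistics; the rows never
# leave the engine.
n, sx, sy, sxx, sxy = conn.execute(
    "SELECT COUNT(*), SUM(x), SUM(y), SUM(x*x), SUM(x*y) FROM points"
).fetchone()

# Closed-form least-squares slope and intercept from the aggregates.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
print(round(slope, 2), round(intercept, 2))   # 1.98 0.05
```

The same pattern, with vendor-specific libraries instead of hand-rolled aggregates, is what "in-database machine learning" offerings build on.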
“Memory is expensive and there’s only so much memory a server can hold. My system is __ Terabytes in size so I can’t afford to go in-memory. And even if I can, what happens when my data grows?”
Let’s say you have 4 TB of data. That doesn’t necessarily mean that if you go in-memory, you’ll need 4 TB of memory. It’s possible to compress the data and take up much less physical memory. So the compression ratio of the data is a significant factor in understanding how “big” your data really is.
Most in-memory databases compress your data before storing it in-memory. SAP HANA and Oracle 12c support in-memory compression (while it is not supported in Microsoft SQL Server 2016). According to SAP’s HANA Sizing Guide, the average compression ratio for HANA is 1:7 (excluding indexes), probably the highest in the industry. This means that 7 TB of data could require only 1 TB of RAM.
Consult with your database vendor what is likely to be the compression ratio for your specific application and data, and compare that ratio across different vendors. When considering an in-memory system, use the compressed data size and not your total size.
It’s true that memory is scarce on servers and more expensive than regular or flash disks. But the DRAM hardware itself is just a small fraction of the total cost of a server, taking into account the rest of the hardware, maintenance, IT management, etc. Taking a holistic look at your enterprise systems, if you were to run them in-memory, how would it affect your total cost of ownership?
A Forrester study showed that running large enterprise systems in-memory vs. on disk can actually reduce hardware costs by 15% (disclaimer – the study focused on the SAP HANA database running an SAP ERP application). Even with the need for more DRAM and high-end hardware, because in-memory technology is far more efficient, in the end it can require fewer server resources to run the same workloads. In the same study, software costs were reduced by 70% and administration/development costs by 20%.
This will not be the case for all in-memory implementations. The level of saving will depend on an organization’s ability to make maximum use of in-memory technology, by pushing data processing from the application layer to the database layer (as we discuss in detail above).
It could be that a lot of your data is of “low value” – for example, data archived only for regulatory or legal purposes. Low-value data is unlikely to be queried or used very often, so there is really no need to store it in-memory. If 80% of the data is low value and only 20% is likely to be seriously used, your storage requirements look very different, and in-memory could be much less expensive than you think.
Modern in-memory technologies are well aware of the tension around memory resources, and enable a smooth transition between disk-based and in-memory storage. The leading disk-based database solutions, including Microsoft SQL Server and Oracle, provide In-Memory Tables that let you selectively move data into memory; SAP HANA provides Multi-Store Tables which span both memory and disk.
So it’s not really necessary to pull all data into memory – only those parts that matter for day-to-day operations and high performance. This can also reduce the amount of RAM actually needed in an in-memory system.
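A hypothetical sketch of that hot/cold split (table names, sizes and the access threshold are all invented for illustration): classify tables by how often they are queried, pin only the hot ones in memory, and size RAM accordingly, as in-memory tables or multi-store tables let you do:

```python
# Hypothetical inventory of tables: (name, size_gb, queries_per_day).
tables = [
    ("orders_2016", 400, 9000),
    ("sessions",    150, 4000),
    ("audit_log",  1800,    2),   # low-value: kept for compliance only
    ("archive",    2500,    0),   # never queried in day-to-day operations
]

# Pin in memory only tables queried at least 100 times a day
# (the threshold is an assumption; tune it to your workload).
hot = [t for t in tables if t[2] >= 100]
ram_gb = sum(size for _, size, _ in hot)

print([name for name, _, _ in hot], ram_gb)   # ['orders_2016', 'sessions'] 550
```

Here 4.85 TB of raw data needs only 550 GB of RAM before compression is even considered; the cold 80% stays on cheap disk.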
“My system is not mission critical. As long as performance is “okay” I can live with it, users do not complain and there is no reason to invest in high-end technology like in-memory.”
Some non-mission-critical systems can consume significant server resources. Or there might be numerous non-mission-critical systems that, when combined, represent a significant cost. A common example is development servers such as the open source Jenkins – a common phenomenon in large organizations is “Jenkins sprawl”, with hundreds or thousands of continuous integration servers run independently by different dev teams. Collectively, all those servers take up massive system resources. What would happen if all those Jenkins instances were consolidated into one platform with very high performance?
In-memory systems can improve performance by an order of magnitude compared to disk-based systems. Organizations that run data-intensive applications – even if they are not mission critical and don’t have a need for speed – will find that migrating them to in-memory can reduce server resources and total cost of ownership.
A Forrester study (cited above, focusing on the SAP HANA database), showed that running large enterprise systems in-memory, compared to on disk, can result in up to 37% lower TCO, and 15% lower hardware costs. Even with the high-end hardware needed to run in-memory, the overall cost saving can be substantial.
| Technical Scenario | Believed Not Suited for In-Memory Because... | When Could In-Memory Make a Difference? | Benefits of an In-Memory Database |
| --- | --- | --- | --- |
| 1. Small scale (throughput) | Can run on low-cost server with acceptable performance (or consumed as a service) | ► Complex data ► Analytics / reporting ► Machine learning and special data types | ► Performs well when data complexity increases ► Analytics and reporting do not hurt performance ► Easier implementation of analytics/reporting ► Real-time analytics |
| 2. Very large scale (data volume) | Memory resources too expensive | ► High cost of ownership in compute, maintenance, IT | ► Compression shrinks the memory footprint ► Hot/cold data tiering ► Lower total cost of ownership |
| 3. Non-mission-critical systems | “Okay” performance is acceptable | ► Many systems that together consume significant resources | ► Consolidation onto fewer servers ► Lower total cost of ownership |
Traditional thinking around in-memory technology says that in-memory is for high end, high performance applications only. In this post we tried to shatter that myth and show that in fact, in-memory is for everyone.
SAP practices this philosophy day to day – since 2013 all our enterprise applications have run in-memory, and SAP cloud infrastructure followed shortly after, fully based on our in-memory database, SAP HANA. Our SAP HANA customers, from the largest enterprises in the world to one-person businesses, run in-memory for both transactional and analytical processing, powering their business processes and complex analytical workloads.
We recently launched SAP HANA, Express Edition – a lightweight version of our HANA database which is free to use up to 32 GB of data.
HANA was built around the three axes we defined above – helping make applications future proof, providing an “all in one” solution for data processing and reducing total cost. Trying HANA Express with your data set will help you quickly evaluate the impact that in-memory technology can have on your business.