In-memory databases are becoming mainstream. All the major database vendors and many startups today offer in-memory database solutions that provide an order-of-magnitude performance increase over disk-based RDBMS.
Volatile memory is available for $6-7 per GB (as of November 2016), and the latest 8-socket servers can support up to 24TB of RAM, meaning that in-memory technology is within reach for most organizations and applications.
So why not use an in-memory database?
If you ask many industry experts, even in late 2016 in-memory databases are not for everyone. Here are three common use cases which are widely thought not to be suited for in-memory:
| Technical Scenario | Example | Believed Not Suited for In-Memory Because... |
| --- | --- | --- |
| 1. Small scale | Small business CRM system | Can run on a low-cost server with acceptable performance |
| 2. Very large scale (data volume) | Log analysis on large enterprise systems | Memory resources too expensive |
| 3. Non-mission-critical systems | Community forum on a corporate website | “Okay” performance is acceptable |
In all these cases, the simple answer is yes: a disk-based database will do the job. But if you think strategically about your data, and take into account a few little-known benefits of in-memory technologies, you might find that even in these cases, in-memory technology is in fact applicable. It can even change the game.
In the remainder of this article, we’ll take you on an intellectual journey to re-imagine some of the obvious reasons to “just run it on disk”.
Below, we examine each of these use cases and ask whether it really makes sense to run it on disk:
“For the vast majority of back-office applications at small-to-medium businesses (CRM, email, ecommerce, invoicing), a single modern server with a standard disk-based RDBMS will provide adequate performance. Moving to in-memory would require upgrading to a higher-end server with more memory, and introduces complexities in data management, without significant benefits.”
Complex data models can cause very high computational load as the system evolves – even in small scale systems. In-memory systems can help because they are strongly optimized for complex processing of data.
If you have a complex data model – a large number of tables, complex relations between tables, stored procedures, etc. – your system might be doing a lot of crunching even for simple queries. Data size and complexity are not static: Imagine a company that sells 100 products on an ecommerce site, and then, due to a business decision, starts selling 1000 products. Or instead of storing 10 features per product, they start storing 1000 features. You can’t really predict what will happen to your data or how fast that data will grow.
If you’re facing data complexity, at some point, even a low throughput system will outgrow its single machine with disk-based RDBMS. Performance will degrade, and you’ll have to partition or shard your database, or consider moving to NoSQL or distributed architectures. None of these options are easy, and you’ll have to figure them out knee-deep into a performance problem. If you’re building a green field application, you should consider building it from day one to withstand complexity in the future.
In any database system (whether disk-based or memory-based), after the data is loaded from the database, any calculations performed on the data occur in-memory. Pure in-memory systems store the data in-memory to begin with, so it is faster to retrieve the data for processing. But this alone doesn’t help us with complexity. Let’s focus on a different point: how fast can you process the data while it is in-memory?
Some databases provide performance optimizations for processing data in-memory, which can make complex data crunching much faster. For example, SAP HANA, our in-memory database, optimizes complex query performance using a columnar data model with “late materialization”: it keeps data in compressed form and performs as many operations as possible (joins, selections, filters and aggregates) directly on the compressed data, materializing results as late as possible and reducing the amount of processing that needs to be done. In addition, data is stored in cache-aware structures that make highly efficient use of modern CPUs, including fully multi-threaded processing that takes advantage of multiple cores. Due to these optimizations, SAP commonly sees performance increases of up to 10,000x for complex queries.
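To make the columnar and late-materialization ideas concrete, here is a toy Python sketch (not HANA's actual engine; all names and values are invented). Each column is stored as a dictionary of distinct values plus an array of small integer codes, the filter runs on the codes rather than the strings, and only the rows that survive the filter are ever touched:

```python
# Toy columnar store: each column is a dictionary plus an array of codes.
country_dict = ["DE", "FR", "US"]                # distinct values (the "dictionary")
country_codes = [0, 2, 2, 1, 0, 2]               # one small integer per row
revenue = [100, 250, 90, 300, 120, 80]           # numeric column, same row order

# Filter "country = 'US'" by comparing compact codes, not strings.
us_code = country_dict.index("US")
matching_rows = [i for i, c in enumerate(country_codes) if c == us_code]

# Late materialization: aggregate only the rows that survived the filter.
us_revenue = sum(revenue[i] for i in matching_rows)
print(matching_rows, us_revenue)   # [1, 2, 5] 420
```

A real engine does this with vectorized, cache-friendly scans over millions of rows, but the principle is the same: the expensive work happens on compressed codes, and full values are decoded as late as possible.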
To take another example, Microsoft SQL Server provides a feature called In-Memory OLTP, which improves the performance of transaction processing, data ingestion and data load, and is optimized for stored procedures and transient data scenarios. Microsoft says this improves performance substantially compared to regular Transact-SQL processing in SQL Server.
Our main point – in-memory databases not only allow you to retrieve data faster, they also help you process data faster because they use memory-aware performance optimizations and a multi-threaded architecture. If you are facing data complexity, there is very high value in optimizing the calculations you perform on the data.
Building a new small scale system on top of a database with strong optimization for complex data and queries, will pre-empt the problem of data complexity. Because optimized in-memory processing is several orders of magnitude faster, even as data complexity increases, performance will hardly be affected at all. Meaning your system will be much more able to handle complex data and analytical requirements.
Traditional analytics and reporting will slow your OLTP system down and eventually require a separate OLAP system. With an in-memory database you can add analytics, reporting and even a full OLAP system on the same database server with no reduction in performance.
In OLTP / transactional systems, analytics or reporting of some kind is almost always needed. But it’s usually implemented as an afterthought: First we build the transactional system, and later figure out how to generate the reports or metrics users might need.
If you rely on traditional approaches, analytics will quickly become a computationally-intensive task that requires you to add more hardware resources, and add more code to your application that extracts the insights. It will be harder to develop and harder to maintain.
At some point you might create a separate OLAP system to handle analytics, and then you need to maintain two systems, OLAP and OLTP, manage the transfer of data between them, and settle for “non-fresh” data on the OLAP system because it won’t be possible to load data from production and perform analysis in real time. Real time analytics is becoming a requirement for many common use cases.
Is there a way to get around this complexity and build systems from the ground up for analytics and reporting?
Surely the database can’t handle that, because analytics doesn’t happen in the database, right?
Well, it can, if you want it to. The common architectural decision is to implement analytics and reporting in the application tier – the application pulls data from the database, processes it (sometimes with the help of external libraries or systems), and generates the outputs users need. But there is another option.
You can run analytics and reporting tasks within the database as stored procedures – moving the processing load from the application tier to the database tier. In a traditional disk-based database, you wouldn’t do this, because the database was slow. But what if the database was 1000x faster?
The decision to do the analytics as part of the application was never really a conscious one; it stemmed from the limitations of database technology. If you use an in-memory database, it makes sense to move those computations to the database tier, because an in-memory database can take the load.
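A minimal sketch of the two approaches, using Python's built-in sqlite3 with an in-memory database as a stand-in for any in-memory engine (table and data are hypothetical). The first version pulls every row into the application tier and aggregates there; the second pushes the aggregation into the database, so only the finished result crosses the boundary:

```python
import sqlite3

# In-memory SQLite database as a stand-in for an in-memory engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 10.0), ("west", 20.0), ("east", 5.0)])

# Application-tier style: pull every row, aggregate in application code.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM orders"):
    totals[region] = totals.get(region, 0.0) + amount

# Database-tier style: push the aggregation into the engine;
# only the finished result leaves the database.
pushed = dict(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"))

print(totals == pushed, pushed)   # True {'east': 15.0, 'west': 20.0}
```

The results are identical; what changes is where the work happens and how much data moves between tiers. With a fast database tier, the second form scales far better as row counts grow.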
Advanced data manipulations such as supervised machine learning will tax your system resources and increase complexity in the application layer. Most in-memory databases can perform advanced analytics in-database, faster and with less application code.
Today there is mainstream use of machine learning, predictive analytics, and analysis of special types of data such as geospatial, time-series data, text analysis, graph data, etc. These types of advanced analytics are driven by complex algorithms and require heavyweight processing. They also typically require integrating third-party systems (like R) which help you crunch the numbers.
Advanced analytics, whether you have it now or plan to add it in the future, will be a big computational burden on your application tier. You’ll need to add hardware resources, write more code that “drives” the different systems to extract the insights, and suffer the high latency of having to pull the data from the database and “feed” it to those systems.
Many modern databases allow you to perform advanced analytics inside the database, using built-in stored procedure libraries or integrations between the database and external systems (instead of having to integrate your application with those systems). See how this is provided by Microsoft SQL Server, Oracle and SAP HANA.
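As a small illustration of "analytics inside the database", here is a sketch (schema and data invented) that fits a least-squares line entirely with SQL aggregates, so the database returns five numbers instead of every row. In-database analytics libraries do essentially this at much larger scale:

```python
import sqlite3

# Hypothetical table of (x, y) observations in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?)",
                 [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.0)])

# The database computes the sufficient statistics; the rows never
# leave the engine.
n, sx, sy, sxx, sxy = conn.execute(
    "SELECT COUNT(*), SUM(x), SUM(y), SUM(x*x), SUM(x*y) FROM points"
).fetchone()

# Closed-form least-squares slope and intercept from the aggregates.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
print(round(slope, 2), round(intercept, 2))   # 1.98 0.05
```

The same pattern, with vendor-specific libraries instead of hand-rolled aggregates, is what "in-database machine learning" offerings build on.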
“Memory is expensive and there’s only so much memory a server can hold. My system is __ Terabytes in size so I can’t afford to go in-memory. And even if I can, what happens when my data grows?”
Let’s say you have 4 TB of data. That doesn’t necessarily mean that if you go in-memory, you’ll need 4 TB of memory. It’s possible to compress the data and take up much less physical memory. So the compression ratio of the data is a significant factor in understanding how “big” your data really is.
Most in-memory databases compress your data before storing it in-memory. SAP HANA and Oracle 12c support in-memory compression (while it is not supported in Microsoft SQL Server 2016). According to SAP’s HANA Sizing Guide, the average compression ratio for HANA is 1:7 (excluding indexes), probably the highest in the industry. This means that 7 TB of data could require only 1 TB of RAM.
Consult with your database vendor what is likely to be the compression ratio for your specific application and data, and compare that ratio across different vendors. When considering an in-memory system, use the compressed data size and not your total size.
It’s true that memory is scarce on servers and more expensive than regular or flash disks. But the DRAM hardware itself is just a small fraction of the total cost of a server, taking into account the rest of the hardware, maintenance, IT management, etc. Taking a holistic look at your enterprise systems, if you were to run them in-memory, how would it affect your total cost of ownership?
A Forrester study showed that running large enterprise systems in-memory vs. on disk can actually reduce hardware costs by 15% (disclaimer – the study focused on the SAP HANA database running an SAP ERP application). Even with the need for more DRAM and high-end hardware, because in-memory technology is far more efficient, in the end it can require fewer server resources to run the same workloads. In the same study, software costs were reduced by 70% and administration/development costs by 20%.
This will not be the case for all in-memory implementations. The level of saving will depend on an organization’s ability to make maximum use of in-memory technology, by pushing data processing from the application layer to the database layer (as we discuss in detail above).
It could be that a lot of your data is of “low value” – for example, data archived only for regulatory or legal purposes. Low-value data is unlikely to be queried or used very often, so there is really no need to store it in-memory. If 80% of the data is low value and only 20% is likely to be seriously used, your storage requirements look very different, and in-memory could be much less expensive than you think.
Modern in-memory technologies are well aware of the tension around memory resources, and enable a smooth transition between disk-based and in-memory storage. The leading disk-based database solutions, including Microsoft SQL Server and Oracle, provide In-Memory Tables that let you selectively move data into memory; SAP HANA provides Multi-Store Tables which span both memory and disk.
So it’s not really necessary to pull all data into memory – only those parts that matter for day-to-day operations and high performance. This can also reduce the amount of RAM actually needed in an in-memory system.
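A hypothetical sketch of that hot/cold split (table names, sizes and the access threshold are all invented for illustration): classify tables by how often they are queried, pin only the hot ones in memory, and size RAM accordingly, as in-memory tables or multi-store tables let you do:

```python
# Hypothetical inventory of tables: (name, size_gb, queries_per_day).
tables = [
    ("orders_2016", 400, 9000),
    ("sessions",    150, 4000),
    ("audit_log",  1800,    2),   # low-value: kept for compliance only
    ("archive",    2500,    0),   # never queried in day-to-day operations
]

# Pin in memory only tables queried at least 100 times a day
# (the threshold is an assumption; tune it to your workload).
hot = [t for t in tables if t[2] >= 100]
ram_gb = sum(size for _, size, _ in hot)

print([name for name, _, _ in hot], ram_gb)   # ['orders_2016', 'sessions'] 550
```

Here 4.85 TB of raw data needs only 550 GB of RAM before compression is even considered; the cold 80% stays on cheap disk.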
“My system is not mission critical. As long as performance is “okay” I can live with it, users do not complain and there is no reason to invest in high-end technology like in-memory.”
Some non-mission-critical systems can consume significant server resources. Or there might be numerous non-mission-critical systems that, when combined, represent a significant cost. A common example is development servers such as the open source Jenkins – a common phenomenon in large organizations is “Jenkins sprawl”, with hundreds or thousands of continuous integration servers run independently by different dev teams. Collectively, all those servers take up massive system resources. What would happen if all those Jenkins instances were consolidated into one platform with very high performance?
In-memory systems can improve performance by an order of magnitude compared to disk-based systems. Organizations that run data-intensive applications – even if they are not mission critical and don’t have a need for speed – will find that migrating them to in-memory can reduce server resources and total cost of ownership.
A Forrester study (cited above, focusing on the SAP HANA database), showed that running large enterprise systems in-memory, compared to on disk, can result in up to 37% lower TCO, and 15% lower hardware costs. Even with the high-end hardware needed to run in-memory, the overall cost saving can be substantial.
| Technical Scenario | Believed Not Suited for In-Memory Because... | When Could In-Memory Make a Difference? | Benefits of an In-Memory Database |
| --- | --- | --- | --- |
| 1. Small scale (throughput) | Can run on low-cost server with acceptable performance (or consumed as a service) | ► Complex data ► Analytics / reporting ► Machine learning and special data types | ► Performs well when data complexity increases ► Analytics and reporting do not hurt performance ► Easier implementation of analytics/reporting ► Real-time analytics |
| 2. Very large scale (data volume) | Memory resources too expensive | ► High cost of ownership in compute, maintenance, IT | ► Compression shrinks the memory footprint ► Hot/cold data tiering ► Lower total cost of ownership |
| 3. Non-mission-critical systems | “Okay” performance is acceptable | ► Many systems that together consume significant resources | ► Consolidation onto fewer servers ► Lower total cost of ownership |
Traditional thinking around in-memory technology says that in-memory is for high end, high performance applications only. In this post we tried to shatter that myth and show that in fact, in-memory is for everyone.
SAP practices this philosophy day to day – since 2013 all our enterprise applications have run in-memory, and SAP cloud infrastructure followed shortly after, fully based on our in-memory database, SAP HANA. Our SAP HANA customers, from the largest enterprises in the world to one-person businesses, run in-memory for both transactional and analytical processing, powering their business processes and complex analytical workloads.
We recently launched SAP HANA, Express Edition – a lightweight version of our HANA database which is free to use up to 32 GB of data.
HANA was built around the three axes we defined above – helping make applications future proof, providing an “all in one” solution for data processing and reducing total cost. Trying HANA Express with your data set will help you quickly evaluate the impact that in-memory technology can have on your business.