A Brief Comparison of SAP IQ and HANA Column Store Databases
SAP have two column store databases in the market; SAP HANA and SAP Sybase IQ. At first glance these two products appear to overlap in technology and functionality. SAP is committed to both products. So what are the similarities and differences, and what are the strengths of weaknesses of each product?
Both products provide OLAP solutions which do not require the generation of stored aggregates (‘cubes’) as the IQ and HANA column store processing facilitates generating aggregated data as required.
Some key differentiators: –
· High performance replication feeds via Sybase Replication Server (multi-vendor)
· Designed for concurrent OLTP and OLAP
· Potentially removes requirement for separate OLAP solution – therefore no required for OLAP ETL/Replication feeds
· Potentially easier to implement high performance OLAP (less indexing design decisions)
HANA offers the tantalizing opportunity of running OLTP and OLAP on a single database instance – removing the requirement for separate OLAP systems and associated ETL/Replication feeds. Not only would this deliver major cost savings it would bring zero latency between the OLTP and OLAP processing. However it would seem that focus of HANA implementations will initially be ‘side by side’ solutions whereby HANA replaces the current Business Intelligence solution and OLTP processing is left unchanged. As the HANA product matures then merging OLTP and OLAP processing on a single HANA platform will become a more attractive offering. However the migration of OLTP systems to a new database platform is a non-trivial activity.
IQ continues to offer a proven high performance OLAP solution with flexible implementation options on multiple platforms. It’s supported by high performance ‘real time’ replication solution with SAP Sybase Replication Server.
Both products provide column stores – i.e. data is stored column wise – not tuple (row) wise. This provides some major advantages – which are significant for Business Intelligence / OLAP queries. These advantages are covered well elsewhere -en.wikipedia.org/wiki/Column-oriented_DBMS#Benefits. Both IQ and HANA deliver high performance OLAP solutions.
Tables can be created as column store or row store in HANA. Certain tables may benefit from being a row formatted (i.e. tables where all columns of a one row are always required). In HANA row/column format is part of the DDL used to define an object.
You can create tables stored in row format in IQ – by creating tables in the catalogue store.
Both products are ‘In Memory Databases’.
They are both in memory databases?
Both products (as I think pretty much all database products) :-
· Utilize disks to permanently store data
· Utilize memory to speed up data access
· Complete all processing in memory (where else can a computer program process data?)
So how can we call one an ‘In Memory’ product? Actually I think the use of ‘In Memory’ for any database product needs careful consideration. I think it’s fair to day the HANA is optimised for running in memory. I.e. it’s dictionary encoded, direct access compressed column store facilitates completing processing in memory. However it still logs transactions (so as to be ACID compliant) and data is written out to disk – ‘savepoints’ in HANA.
It’s worth comparing this with SAP Sybase ASE ‘In Memory’ offering where changes are not written out to disk. This offers the potential for very fast OLTP processing – without guaranteed recovery of transactions in the event of a system failure (which is suitable in some environments). This is clearly different to the ‘In Memory’ solution offered by HANA – ‘In Memory’ is a fairly flexible phrase.
Of course HANA is designed to be implemented with all current data stored in memory – i.e. the system should be sized so this is delivered. HANA loads data into ‘cache’ automatically and has commands to move objects in and out of memory. It’s also designed to utilize the fast data processing capacity of in memory data across multiple cores (2 Gb/sec per core).
Optimised for Read and Write Operations
HANA is designed to deliver high performance for read and write (data modification) statements. This would seem to be a key differentiator from IQ; IQ is firmly targeted as an OLAP solution, which will be fed by batch ETL/Replication solutions. Modification statements in column stores can require more processing then similar actions in a row store – consider how many pages must be accessed to delete a row with 50 columns in a column store. HANA has implemented a number of strategies to maximise performance of data modification statements; a Delta Store for storing data changes for later merge with the primary, main store ; updates are completed as inserts and deletes are actioned as marking rows as deleted – i.e. HANA is ‘insert only’. Timestamping rows allow statements to identify which rows are currently ‘active’ – and allows ‘time travel’ queries to be completed. (This also facilitates implementing Multi Version Concurrency Control).
Both products offer compression – column storage with ‘bit mapped indices’ (IQ) or ‘Dictionary Encoding’ (HANA). Other compression techniques are utilized on top of these. HANA make a big play of ‘Direct Access’ to data – i.e. data can be accessed without the requirement to decompress it. IQ utilizes page level LZ compression.
Optimizing Performance – Indexes
Significant differences exist HERE between the two products.
IQ offers a number of different indexing options. Defining the correct indexes on columns is central to query performance in IQ. Gaining maximum performance from IQ requires analysis to identify the correct index types required on each column.
SAP consider that HANA has little need for additional indexes on column store tables as storing data in columns is ‘functionally similar’ to having a index on each column. Scanning speed of compressed in memory column stored data allows very fast read operations.
HANA has two index types, B Tree and CPD Tree. HANA will chose the correct index type if it’s not specified within the create index statement. Dictionaries are indexed.
In IQ only a single session can write to a particular table at any given time – though multiple concurrent reads can be completed whilst the update statement is in progress (‘Table Level Lock”). This is implemented by using ‘Snapshot Versioning”. Blocked processes to not wait to complete – a blocked process is aborted. Note that IQ utilizes ANSI isolation level 3 and this cannot be changed.
HANA utilizes multi-version concurrency control (MVCC) to implement consistent read operations. Concurrent read operations see a consistent view of the database data without blocking write operations. Updates are implemented by inserting new versions of data.
Parallelism / Scaling Out
Both IQ and HANA support multiple core hosts and multiple hosts for a single instance. They also support single queries utilizing multiple cores on one host and single queries engaging multiple hosts. However how each product escalates to multiple hosts within a query is very different.
HANA is designed to use multiple cores accessing memory resident data to complete queries. Administrators can partition tables – and this should be completed to spread large tables across multiple hosts. However HANA can dynamically assign chunks of tables to different threads – for example when undertaking aggregation calculations so that queries can utilize the maximum number of cores.
Cached objects are not shared between hosts – tables/partitions are resident on a specific host. The majority of processing of data takes host on the host it’s resident on. HANA provides for partitioning so that tables can be distributed across the HANA landscape – facilitating all hosts being engaged in a query.
IQ also supports engaging multiple threads for a single query and supports a single query spanning multiple nodes via the IQ ‘PlexQ’ architecture. A key difference between IQ and HANA is that IQ utilizes a ‘shared everything’ model via a shared SAN. When Sybase IQ determines that a query might require more CPU resources than are available on one node, it will attempt to break the query into parallel “fragments” that can be executed on other servers within IQ ‘grid’. HANA queries are executed on the host where the data resides. The IQ approach is potentially easier to administer as no consideration is required for data locality and the design and administration of this (although data locality is transparent to HANA users). However a ‘shared everything’ approach does introduce the overhead of data synchronisation between nodes.
Data warehouse systems required feeds from OLTP systems to deliver transactional data from the OLTP databases. Data replication is commonly used technique to complete this.
IQ Supports replication from Sybase Replication Server. This allows feeding data in from Sybase ASE, Oracle, MS SQL Server & DB2.
Recent versions of Replication server have implemented high speed replication into IQ with ‘Real Time Loading’ for IQ.
SAP plan to deliver a version of Replication Server that works with HANA in 2013. Currently SAP provides a replication solution for HANA – SAP Landscape Transformation – a trigger based replication solution.
IQ is a mature product, which continues to be actively developed and enhanced by SAP.
HANA on the other hand is new to market and is undergoing fairly rapid change. In fact significant portions of the documented functionality have yet to be implemented.
My first experience using HANA indicated how new a product HANA is. A simple aggregation was slower on a column store table than an identical row store table. This was resolved when the optimiser hint ‘OLAP_PARALLEL_AGGREGATION’ was submitted with the query. A known bug – and simple to work round (supply the hint to all queries) – but an indication of how new this product is.
So which one is faster? I not been able to run any valid performance tests (given that I only have access to a limited cloud solution for HANA) and I’ve not seen any comparative performance figures produced. Of course it’s probable that each product has its own ‘sweet spot’ where it delivers comparatively higher performance than the other product.