Why Size matters, and why it really matters for SoH (Part I)
So you have decided to migrate your AnyDB based Business Suite to SAP HANA!
Not an easy choice to make by any means. This is after all a complex heterogeneous database migration.
It will require a full regression test of all of your functionality.
You will be moving to new hosts, with new host names and new IP addresses, maybe even taking the opportunity
to sort out your SID naming conventions, which over the years have “evolved” into a bit of a mess.
So this will not just be a functional regression test, as you would perform after transporting a major piece of new
functionality or maybe an SP upgrade, but a full integration test covering all interfaces, firewalls, non-SAP applications, and so on.
As to the reasons why you are moving to HANA, I will not go into that now; there are plenty of other blogs out there addressing
this subject. But if it is purely for performance reasons and you want a magic bullet, then the reason you have a performance
problem in the first place may well make your HANA implementation far more costly than you had thought.
There are many reasons why your system may be performing badly, but they essentially boil down to:
- Database performance
- Application server performance
- Network latency issues
- Front end performance
These are often symptoms of one or more of the following:
- Application design
- Customization choices
- Crap code
- Some sort of infrastructure issue/design/scale
For the purpose of this blog I’m only going to talk about size.
AnyDB: Size doesn’t matter! (well yes it does)
How often have you heard the phrase “Disk is cheap”? I wish I had a dollar for every time I’ve heard it; I wouldn’t be here
writing a blog.
Getting a PO signed for another bunch of disks is fairly easy; solving the issue of why your DB is growing so rapidly is not!
Often it’s not even a technical issue; most of the time it is political, or a lack of understanding on the business side.
But why doesn’t size matter?
As I’ve already said, it’s easy to add another bunch of disks, and if you do it right the available IOPS go up as well.
But often there is very little impact on the rest of the infrastructure. For your production environment you may have to increase
the memory/CPU footprint a little but often not an order of magnitude bigger.
Now if you need to copy your large database (DR/training/QA), you will not need a significant amount of compute to go with it.
A 20-user training environment will not need a database server with the same amount of memory/CPU as the production server.
Yes, things like backup/restore/recovery/system copy times are an issue with bigger databases, but they are solvable with some
hardware investment and, most importantly, with little or no impact on the user community.
HANA: Size really does matter! (a lot)
Now let’s consider your HANA database. It is an in-memory database: the bigger your database, the bigger your memory and disk footprint.
One thing memory is not is cheap!
So consider the scenario above where you have DR/QA/Training environments all of which are a copy of production.
This means each of these environments has to be the same size as production, not just from a storage point of view but from a CPU/memory footprint
point of view.
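To put that landscape multiplication into concrete terms, here is a minimal back-of-the-envelope sketch. The 3 TB production footprint and the four-environment landscape are hypothetical illustration values, not figures from any real sizing:

```python
# Back-of-the-envelope landscape sizing for an in-memory DB.
# HYPOTHETICAL numbers: every full copy of production needs
# production-sized memory, unlike a disk-based AnyDB copy.

PROD_MEMORY_TB = 3  # assumed production HANA memory footprint
environments = ["production", "DR", "QA", "training"]  # full-size copies

total_tb = PROD_MEMORY_TB * len(environments)
print(f"{len(environments)} production-sized systems -> {total_tb} TB of RAM in total")
```

With a disk-based AnyDB, only the storage line of this calculation scales with the copies; with HANA, the RAM line does too, and RAM is the expensive part.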
Now what if you leave your DB growth unchecked on HANA? You can’t just go and add some more cheap disk, you have to add a whole new
server (TDI model) or Appliance. Not just one, but one for DR one for QA and one for Training.
These things are not cheap. The jump from a 2-socket 1.5 TB server to a 4-socket 3 TB one may be manageable, but from 4 sockets to 8 sockets and 6 TB
it gets very, very pricey. Beyond 8 sockets and 6 TB? I’ll let you look at the price lists.
Now let’s look at your NFRs (non-functional requirements). How quickly do you want your database to come back online after a failure?
Let’s compare a traditional database with HANA.
If I shut down a traditional database and restart it, it is pretty much available for use in a matter of a few minutes. Yes, it will be a little slow,
as the initial read of any record has to be performed from disk and loaded into memory, but my database is available.
If I shut down HANA and restart it, it has to copy the contents of the persistence layer into memory. I have seen figures of 15 minutes
per TB, maybe even longer; so 45 minutes for a 3 TB database. Yes, the column store load is a “lazy load”, i.e. it reads only the columns that are actually used, but your users have become used to 1–2 ms DB read times with HANA; if a read suddenly takes 300+ ms you may get complaints. (You can, however, force tables to load in their entirety on startup.)
On a traditional DB, where 300 ms is the norm, a longer load time after a restart will not be as noticeable.
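The restart arithmetic can be sketched as a quick estimate, assuming the roughly 15 min/TB reload figure quoted above (an observed ballpark, not a guaranteed rate):

```python
# Rough HANA restart-time estimate based on the ~15 min/TB
# reload figure mentioned above (a ballpark, not a guarantee).

def hana_restart_minutes(db_size_tb, minutes_per_tb=15):
    """Estimated time to reload the persistence layer into memory."""
    return db_size_tb * minutes_per_tb

for size_tb in (1, 3, 6):
    print(f"{size_tb} TB database -> ~{hana_restart_minutes(size_tb)} min to reload")
```

The estimate scales linearly with database size, which is exactly why unchecked DB growth feeds straight into your recovery-time NFRs.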
So now you probably want to start investing in a hot-standby system or super high performance flash disk.
“Scale-out” I hear you cry.
I’m talking about Suite on HANA here.
Your options for scale-out are really limited. It is in controlled availability only, and that is not likely to change in the near future.
There are very few people around who know how to scale out SoH properly. It is a complex task that takes months to perform and test.
It requires detailed knowledge of the SAP functionality you are using and the table relationships within that functionality, and it has to be done manually. It is not a matter of just bolting on a few more servers.
Again I’m talking about migrating your existing Suite to HANA, not S/4 HANA functionality, or SAP BW.
You have very few options for data tiering, if any, without some considerable application rework.
The chances of SAP back-porting any of the data temperature functionality they are developing to SoH
are slim (it is reserved for S/4 and BW only).
Note there are objects where the concept of data aging has been addressed, namely
- Application Log (SPS08)
- IDOCS (SPS08)
- Change Docs (SPS12)
- FI documents in sFIN 1.0 (note: this is for Simple Finance, not Suite on HANA)
So the above are just some of the reasons why you want to keep your database size down and in check.
In-memory databases require considerably more compute/memory resources than traditional DBs.
You can’t just throw more disk at the problem, and if you stick with your classic landscape approach, going
to SAP HANA may be cost prohibitive.
Next up: what does not belong in your SoH database, regardless of whether it runs on HANA or AnyDB, and time to tell the business
they can’t have it their way anymore when it comes to data retention.
Part II can be found here Why Size Matters and why it really Matters for HANA (Part II)
- how do you know for how long scale-out for Suite on HANA will be kept in controlled availability? In my experience CA is typically a precursor to general availability (GA)
- the system startup time is typically dominated by the row store data, and that's usually not the type of data where HANA systems grow. So in fact a HANA system with 512 GB might take the same 15 min as a 2 TB HANA system.
Besides that, recent revisions do allow for a fast restart of the row store under certain conditions.
- also, HANA doesn't load all data into memory upon startup (only the row store and specifically marked column store data).
- saying a legacy DB would be online quasi-immediately, and indicating that this means business can commence as usual, is misleading. The "warm-up" time for caches can be considerable, especially for large installations. It's not uncommon to see 15 min go by until the target transaction rate is available again.
What's missing in this blog is that data compression in SAP HANA compensates in a lot of cases where a classic DBMS would require more storage.
thanks for the comments. This is meant to be a very high-level blog, aimed at getting people to think before going to a hardware vendor asking for 6–24 TB servers to fix a performance problem which could potentially be fixed beforehand using various ILM strategies that will help them on their journey to HANA. Migration time is something I did forget, and it is obviously highly dependent on source database size.
Having spoken to a few people in the know, it is likely to be a while before scale-out goes mainstream, due to the complexities of table splitting, which is going to be very customer specific. The question is whether SAP will invest in the technology to automate table splitting in SoH. For S/4 I could see them investing, as that's their future.
System restart time is obviously mainly dependent on your storage speed. The requirement for IOPS is actually very low for HANA, with most white papers specifying 1200 IOPS as the requirement.
So depending on block size (between 4 KB and 64 MB) you will get between roughly 4 MB/s and 75 GB/s throughput. If restart time is of real concern, I'd stick the persistence layer on SSD.
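That throughput arithmetic is simply IOPS multiplied by block size; a minimal sketch, assuming the 1200 IOPS figure quoted in the white papers:

```python
# Sustained throughput = IOPS x block size.
# 1200 IOPS is the minimum figure most HANA storage white papers quote.

def throughput_mb_per_s(iops, block_size_bytes):
    """Throughput in MB/s for a given IOPS rate and I/O block size."""
    return iops * block_size_bytes / (1024 ** 2)

IOPS = 1200
small = throughput_mb_per_s(IOPS, 4 * 1024)        # 4 KB blocks
large = throughput_mb_per_s(IOPS, 64 * 1024 ** 2)  # 64 MB blocks

print(f"4 KB blocks:  ~{small:.1f} MB/s")          # ~4.7 MB/s
print(f"64 MB blocks: ~{large / 1024:.0f} GB/s")   # ~75 GB/s
```

The same IOPS budget spans four orders of magnitude of throughput depending on I/O size, which is why the restart (large sequential reads) behaves very differently from transactional I/O.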
As somebody who used to have to ensure SLAs were met, time to availability was key rather than performance, but that is another argument altogether. We were measured on downtime!
Data compression is part of the next blog.
This is an awesome article that describes real-world operations and the impact of HANA sizing on those activities.
I would suggest the author make a reference to the correct sizing tools and approaches in order to make everybody aware.
Also, I would edit some parts according to what Lars pointed out.
regarding QA/test/sandbox environments with no intensive utilisation, I think sizing rules can be more relaxed.
Since HANA memory (column stores) is filled progressively and not all the data is loaded at startup, the occurrence of the entire "production-sized" memory being full in a non-productive environment is, in my opinion, rare. If you also consider that /SDF/HDB_SIZING calculates a work space of the same size as the data, the probability is even lower.
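As a rough sketch of that point about /SDF/HDB_SIZING: if the report adds a work space roughly equal to the data footprint, the RAM estimate comes out at about twice the data size. The factor of 1.0 here is an assumption drawn from this comment, not an official SAP formula:

```python
# Rule-of-thumb sketch (ASSUMPTION from this comment, not an official
# formula): RAM ~= static data footprint + an equally sized work space.

def hana_ram_estimate_gb(data_footprint_gb, workspace_factor=1.0):
    """Estimate total RAM as data plus a proportional work space."""
    return data_footprint_gb * (1 + workspace_factor)

print(hana_ram_estimate_gb(512))  # 1024.0 -> ~1 TB RAM for 512 GB of data
```

In a non-productive system where the work space is rarely fully used, this is the part of the estimate one might reasonably relax.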
I would like to receive your comments
At least there is some relaxation regarding the storage and the use of E5 processors when it comes to non-prod.
Hopefully SAP will support PCIe-based flash storage in the near future, i.e. flash addressed as memory. Once that happens the operating system can manage which memory blocks are in use and which are not, kind of like a swap file but without using disk-based swap.