Why Size Matters and why it really Matters for HANA (Part II)
In the first part of this blog I described some reasons why you want to try and keep your database small.
Apart from cost, there are also compelling technical reasons: you will eventually come to a hard stop
in terms of database growth, i.e. the limit of today's technology.
The biggest certified x86 systems on the market today are 16-socket 12 TB nodes, with one vendor offering
a potential 32-socket 24 TB node.
With future x86 advancements (Broadwell and beyond) SAP may release higher memory-per-socket ratios (e.g. through the use of 64 GB DIMMs),
but for the time being we are limited to:
- 2 sockets: 1.5 TB
- 4 sockets: 3 TB
- 8 sockets: 6 TB
- 16 sockets: 12 TB
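The list above works out to 0.75 TB of memory per socket at every tier. As a rough sketch of what that means for sizing, the snippet below picks the smallest tier that still holds a given database. The function names and the `headroom` parameter are hypothetical and only there for illustration; real sizing should follow SAP's own sizing guidance.

```python
# Node sizes quoted in the list above: (sockets, memory in TB).
CERTIFIED_NODES = [(2, 1.5), (4, 3.0), (8, 6.0), (16, 12.0)]


def memory_per_socket_tb():
    """Return {sockets: TB of memory per socket} for each tier."""
    return {sockets: memory_tb / sockets for sockets, memory_tb in CERTIFIED_NODES}


def smallest_node_for(db_size_tb, headroom=0.0):
    """Return the smallest (sockets, memory_tb) tier that fits the database.

    headroom is a hypothetical safety margin expressed as a fraction of the
    database size (0.5 means "reserve 50% on top"). Returns None when even
    the largest tier is too small, i.e. the hard stop described above.
    """
    required_tb = db_size_tb * (1.0 + headroom)
    for sockets, memory_tb in CERTIFIED_NODES:
        if memory_tb >= required_tb:
            return sockets, memory_tb
    return None


if __name__ == "__main__":
    print(memory_per_socket_tb())
    print(smallest_node_for(2.0))
    print(smallest_node_for(13.0))
```

A database of 2 TB lands on the 4-socket tier, while anything above 12 TB returns `None`: there is simply no bigger single node in the list to move to.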
Take a look at what you store in your Database
When you look at your existing Business Suite database, what are your high growth areas?
- Do you store attachments in your DB, e.g. PDFs, JPGs, Word docs or engineering drawings?
- Do you use workflow heavily?
- Do you rely on application logs?
- Do you keep your processed IDOCs in the DB?
- Do you generate temporary data and store it?
- Do you keep all the technical logs/data that SAP produces on a constant basis?
The above does not even look at business relevant data or retention policies.
You will be surprised at how much data is stored in your DB that your users never use, or use only very infrequently.
Does this data really belong in your business-critical application that should be running at peak performance all of the time?
Probably not, but there are valid (and invalid) reasons why it is stored in there.
Let's take attachments as an example.
Think back to when your SAP system was first implemented. There were probably budget and time constraints, made all the worse
by project overruns.
That great design your solution/technical architect came up with, using an external content/document server that required a separate
disk array, server and document server license, was likely torn up, as SAP provides a local table just for the purpose of storing attachments.
The architect lost the argument and the data was stored locally (yes, I have been there).
This scenario actually has two consequences:
a) A large database (I have seen the relevant tables grow to above 2 TB)
b) Slow performance for the end user, as the system has to access the database, load the relevant object into DB memory and then into application server memory
before it is shown to the user.
With a remote document store, the user is passed a URL pointing directly to the relevant object in the document store. This bypasses the application servers and the DB server, and at the same time reduces the load on the server network.
Once a workflow is completed, does it really need to sit in the user's inbox? Yes, in some industries, e.g. aerospace, I can imagine you need to keep a record of all workflows, but do they need to be stored online? Would it not be more secure to store them in a write once/read many archive where the records cannot be edited?
I have also seen workflow tables so large (and over-engineered) that the SAP archiving process can’t even keep up with the rate of creation.
The same goes for application logs: how long do you need to keep them? Is there a compliance reason, and what is the maximum amount of time these logs are actually relevant?
IDOCS and other transient objects.
Once an IDOC is successfully processed, the data will already have been loaded into the relevant tables. After that, the data held in the IDOC tables is in most cases pretty much irrelevant. If you have to keep the IDOC data, store the incoming/outgoing IDOC/XML file instead. Large IDOC tables can cause significant performance issues for interfaces.
Is there any other temporary data you create that is truly transient in nature? Consider if you really need to keep it.
SAP can create various logs in the database at an alarming rate. I’ve seen DBTABLOG (log of all customizing changes) at 1 TB and SE16N_CD_DATA (a log of data deleted via SE16N) at 100 GB (what are you doing deleting data via SE16N anyway?!).
Business Data Retention Periods
This is the hardest nut to crack. As stated in Part I, disk is cheap. Getting the business to agree on retention periods was nigh on impossible and a battle the poor suffering OPS guys/gals would retreat from.
With in-memory databases this is a battle line that will need to be redrawn. As stated in the introduction, there are technical limits to how far your database can grow; approach them and you either suffer severe performance degradation or see costs increase by an order of magnitude more than they did with disk-based technologies.
Hard questions have to be asked.
- Why do you have to keep the data online?
- At what point does your data become inactive?
- Once inactive will you need to change it?
- Is the reason legal/compliance, or just that somebody said they want all data online?
- If this inactive data is only going to be used for analysis, would it not be better to store it elsewhere in a summarized form? (This is one of the reasons why BW will not die for a while.)
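Once the business has answered the questions above, the answers boil down to two thresholds per data type. A minimal sketch of that rule follows, assuming hypothetical parameters `active_days` (how long a record is still worked on) and `retention_days` (how long it must be kept at all). The names and the example values are illustrative only and do not come from any SAP tool.

```python
from datetime import date


def classify(last_changed, today, active_days, retention_days):
    """Decide where a record belongs based on its age in days.

    "online"  : still within the active window, keep it in the database
    "archive" : inactive but still within the retention period
    "delete"  : past the retention period
    """
    if retention_days < active_days:
        raise ValueError("retention_days must not be shorter than active_days")
    age_days = (today - last_changed).days
    if age_days <= active_days:
        return "online"
    if age_days <= retention_days:
        return "archive"
    return "delete"


if __name__ == "__main__":
    today = date(2016, 1, 1)
    # Hypothetical policy: active for 2 years, retained for 10 years.
    for changed in (date(2015, 6, 1), date(2010, 6, 1), date(2000, 6, 1)):
        print(changed, classify(changed, today, 2 * 365, 10 * 365))
```

The code is trivial on purpose: the hard part is getting the business to commit to the two numbers, not applying them.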
One complaint users have about archiving is that they have to use a different transaction to get at archived data. You may have a counter-argument now.
With the journey to SAP HANA you may well be considering Fiori. That is a complete change in user interface, so the users have to re-train anyway and the complaint becomes a moot point.
I realize I have not talked much about HANA in this part. Old hands like me will have heard the above again and again with regard to traditional databases. We have often lost the argument, or maybe even just thrown disk at the problem rather than getting into the argument in the first place.
With in-memory databases, a jump from one particular CPU/memory configuration to the next can mean a doubling in price, rather than the roughly linear increase you get with disk-based databases.
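To make the step effect concrete, here is a small sketch in relative units only. It assumes the 1.5/3/6/12 TB tiers quoted earlier, assumes that each step up doubles the price (the text only says it can), and models disk as strictly linear per TB. No real prices are implied and the function names are hypothetical.

```python
# Memory tiers in TB, as quoted earlier in this post.
TIERS_TB = [1.5, 3.0, 6.0, 12.0]


def in_memory_relative_cost(db_size_tb):
    """Cost in units of the smallest node, assuming each tier step doubles the price.

    Returns None if the database does not fit the largest tier.
    """
    for step, capacity_tb in enumerate(TIERS_TB):
        if db_size_tb <= capacity_tb:
            return 2 ** step
    return None


def disk_relative_cost(db_size_tb):
    """Cost in the same units, assuming disk cost grows linearly with size."""
    return db_size_tb / TIERS_TB[0]


if __name__ == "__main__":
    for size_tb in (3.0, 3.1, 6.0, 6.1):
        print(size_tb, in_memory_relative_cost(size_tb), round(disk_relative_cost(size_tb), 2))
```

Under these assumptions, growing from 3.0 TB to 3.1 TB doubles the in-memory cost, while the disk-based cost rises by only about 3%.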
If your in-memory database is so big that it reaches the limits of current technologies, you may be in big trouble. An emergency archiving project is always nasty. It will be political. Your system can crawl as you frantically use all available resources to offload data, and the end users will complain about the new transactions they have to use, as the change will be forced upon them.