SAP HANA Daily Operation best practices

Ozan Uzun

I am a former Infrastructure Consultant and I worked in operations, outsourcing and even in support.

I would like to share my best practices on SAP HANA Operation and Maintenance. I will focus on infrastructure setup and high availability topics.

 

Keep in mind that these ideas are my best practices and may not fit your landscape.

All comments are for production systems, but they might help with non-production systems, too.

 

Infrastructure:

 

I will focus on Tailored Data Center Integration (TDI) in virtual environments, because appliances are easier: they are already certified by SAP. On the other hand, they lack the flexibility of TDI.

Another important point is that a TDI installation should be done by vendor consultants or certified experts.

Some information for TDI users:

On TDI setups, the performance responsibility is on the customer/partner. There is an SAP tool called the hardware check tool, which should be run after each HANA installation.

 

Please keep your hardware check tool output, which contains the performance and landscape control KPIs.

Remember, you can and should run the hardware check tool after each configuration change.

 

Running HANA DB in virtual environments:

 

Customers are moving to virtual deployments because they are easier to manage and cost-effective.

It is good practice to run HANA DBs in virtual environments, but there are strict rules for doing so; otherwise there can be a negative performance impact.

Admin teams in each landscape have their own habits and best practices, which sometimes do not fit HANA best practices.

I will give insights about VMware and PowerVM setups.

 

Let’s focus on possible issues with VMware installations:

 

 – High CPU overcommit ratios

 – Huge datastores

 – Not caring about NUMA affinity

 

Production HANA installation rules are strict (as of 2020). Please check SAP Note 2393917.

 

Four is the maximum number of production HANA VMs that can run on a 2-socket system. You can only create 0.5-, 1-, or 2-socket VMs; no odd sizes like 1.5 sockets (or 2.5 on a 4-socket system).

– Three production HANA DBs plus any non-production or non-HANA workloads is an option. You cannot share half of a socket with a non-HANA workload.

Half a CPU socket, including its attached memory, is the smallest portion you can assign to a HANA DB. You cannot make incremental changes to resources.

  • If you pass the boundary of 1/4 of total RAM (half of one CPU socket's memory), the next step is another 1/4; you should not add partial memory to the virtual machine.

Reason: each CPU socket has 12 DRAM channels (x86_64), each with dedicated bandwidth. If you create a VM with half the cores of a socket, that memory bandwidth is guaranteed for the VM.

If you keep the core/vCPU count the same and add more memory to a virtual machine, that memory will be attached via other CPU cores/sockets. That is called “far memory” and has higher latency, which has a performance impact. I call such VMs sad VMs.

 

It sounds confusing, so let me give an example.

A two-socket server has 56 cores and 3 TB of memory. It has 24 DRAM modules of 128 GB each, and each socket addresses (is directly connected to) 12 of those modules. You should create a HANA DB with 768 GB/14 cores, 1.5 TB/28 cores, or 3 TB/56 cores. You should not create a VM with 8 cores and 2 TB, because those 8 cores cannot address all that memory locally.

 

This is my first diagram ever; I hope it helps you understand.

VM1 (yellow) has 2 cores and 768GB of local memory. That is the target.

VM2 (green) has 2 cores but 1024 GB of memory, which is not all available locally. Some portion of the memory will be accessed remotely, which has higher latency.

Core 5, Core 6, and the remaining memory resources are allocated to other virtual machines.

 

 

CPU overcommit is another issue that can have a huge impact on performance. ESX is a very efficient hypervisor, but it also has technical limits. CPU overcommitting means assigning more vCPUs to the virtual machines than are physically available. I see 2x values and that is usually fine: the hypervisor will just switch vCPUs from one real core to another, effectively sharing the CPU time. This operation is called context switching and it is visible from the Linux terminal. It has a small, negligible performance impact, caused by CPU cache misses; cores have caches, which are far faster than conventional memory.
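A minimal sketch of observing this from inside a Linux guest: the "cs" column of vmstat and the "ctxt" counter in /proc/stat both count context switches.

vmstat 1 5            # the "cs" column shows context switches per second
grep ctxt /proc/stat  # cumulative context switches since boot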

 

Allocating half-socket virtual machines helps the hypervisor pin down the CPU-memory relationship, and the system will get a better cache-hit rate.

 

A real-life example from a benchmark I ran personally: a CPU-pinned 16-vCPU VMware ESX guest gets a 15-20% better SPECjvm result than a regular guest, and the pinned result is almost identical to bare metal. Pinned CPU assignment means those cores are usable only by that specific guest, running 1:1.

 

Current CPUs are very powerful, administrators want to use them effectively, and they like to place HANA DB virtual machines like any other workload.

 

 

It gets interesting when there is a resource battle between virtual machines. Again, let me give an example.

We again have a 56-core, 112-vCPU server, and we have assigned 250 vCPUs to virtual machines. On an easy day there will be no performance issue.

At the end of the month there are heavy calculations on every virtual server on that host. Each VM asks the hypervisor for CPU time, but there is not enough available. The admin overcommitted each VM with 8-16 vCPUs, because this is easier to manage.

In each CPU cycle only 112 vCPUs can run (because there are only 112 hardware threads) and 250 - 112 = 138 vCPUs must wait. If some vCPUs were idle, that would be no problem, but if they also ask for resources, they must wait.

 

You can also monitor this behavior via the %steal value in top/mpstat output. That is the percentage of CPU cycles that is “stolen” from the guest/virtual machine by the hypervisor.
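A minimal sketch of checking steal time from inside the guest; a sustained non-zero %steal means the hypervisor is giving this VM's vCPUs less physical CPU time than they request.

mpstat -P ALL 5 3        # per-CPU statistics, 5-second interval, 3 samples
top -b -n 1 | head -n 5  # "st" in the %Cpu(s) line is the steal percentage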

There are many ways to overcome this issue, but sticking to the SAP notes would be my choice.

 

PowerVM and PowerPC architecture:

 

IBM Power servers have higher memory bandwidth and tools for NUMA/affinity monitoring and tuning. With dedicated CPU assignment, you do not need to worry about overcommit or noisy-neighbor issues.

Just keep in mind that the enterprise Power9 E980 server has buffered memory with 210 GB/s per socket, so HANA DB will perform extremely well. The Power9 E950 and below are on par with x86_64 in memory bandwidth (~130 GB/s).

The Power8 series uses only buffered memory (~200 GB/s).

 

My best practices for PowerVM are:

 

  • Configure dual Virtual I/O Servers.
  • An RMC connection is a must.
  • Use NPIV instead of vSCSI.
    • If you insist on vSCSI, please check your queue_depth values.
  • Use shared processors with a desired value of at least 4 cores for the VIOS (1 entitled, 4 vCPU/32 LCPU max).
    • On some landscapes I see 1 dedicated core for the Virtual I/O Servers, even though default values and documentation indicate one. That is bad practice: a heavy network load will consume 2-3+ cores, and your whole system will lag very badly. I am not even talking about the higher priority of network I/O over disk I/O.
  • 4 cores/32 LCPUs for production is the minimum according to the SAP notes, but increase it in parallel with the memory size.
  • Check memory affinity with the lsmemopt command on the HMC; if it is fragmented, optimize it (chmemopt). See the sketch after this list.
    • If it is still not in an optimized state, do the following:
    • Check the CPU/memory ratio and make it sensible (not 4 cores / 2 TB of memory; 4 cores cannot directly access/address that much memory locally).
    • Start the LPAR with the largest memory first so the hypervisor can place it correctly.
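A minimal sketch of the affinity check, run on the HMC command line; the managed-system name is a placeholder and the currscore option reflects my understanding of the usual Dynamic Platform Optimizer syntax.

lsmemopt -m MY_MANAGED_SYSTEM -o currscore   # 100 = fully local placement, lower scores mean fragmented affinity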

 

My hypervisor recommendations are finished; let's continue with input/output.

 

I/O General:

DISK Infrastructure:

 

HANA DB is an in-memory database, and it is not heavily dependent on disk I/O other than at startup.

One exception is the log area: database logs need fast disk access. Backups also do heavy sequential writes.

 

As a former Linux expert, I have some recommendations.

 

  • Use a separate /hana/log logical volume; each filesystem has its own journal.
  • Use LVM with multiple LUNs for the HANA data area and stripe the LVM (see the sketch below); each LUN will have its own queue_depth.
  • Please configure multiple datastores on the hypervisor/host and match them to the FC paths.

A 25 TB LUN with a queue_depth of 64 is a joke.
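To illustrate the striping recommendation above, here is a minimal sketch with hypothetical LUN aliases and volume names; each of the four LUNs keeps its own queue_depth while LVM stripes the I/O across them.

pvcreate /dev/mapper/data0 /dev/mapper/data1 /dev/mapper/data2 /dev/mapper/data3
vgcreate hanadata_vg /dev/mapper/data0 /dev/mapper/data1 /dev/mapper/data2 /dev/mapper/data3
lvcreate -n datalv -i 4 -I 256 -l 100%FREE hanadata_vg   # -i 4 = stripe over 4 LUNs, -I 256 = 256 KiB stripe size
mkfs.xfs /dev/hanadata_vg/datalv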

 

  • Some storage controllers cannot work active/active, and if you just create one datastore (LUN), it can only use the resources of one storage controller!

The second controller's CPU power and caches are not being touched; you are effectively running active/passive storage.

 

  • Read-intensive SSDs are not cost-effective for backup storage; they might wear out quickly under heavy backups. You can use NL-SAS instead. Backups are large-block sequential write operations.

 

 

While creating file systems, leave 2-3 GB free in the volume group where the log LV resides. I do not know why, but I have been called more than 10 times about /hana/log filling up. If that area fills up, the database will stop; if this is an appliance, you are in trouble. On TDI you might add another LUN and resize.

 

When I was a vendor consultant I always used this extra-free-space trick, and it won the hearts of my customers. When the time comes, I just tell them to grow the log LV by 500 MB (not the full free space, since they might fill it again), resize the filesystem, and then check and fix their log backup retention.
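A minimal sketch of that fix, assuming a hypothetical hanalog_vg/loglv layout:

lvextend -L +500M /dev/hanalog_vg/loglv   # grow the log LV by 500 MB from the spare space in the VG
xfs_growfs /hana/log                      # XFS is grown online via the mount point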

One last thing: you can grow but cannot shrink XFS filesystems, don't ask me why. That is the downside of XFS. (Did I tell you that I really do not like Btrfs?)

 

Switch to the noop I/O scheduler, as recommended by SAP, if you are using enterprise storage. Deadline also performs very well if you are on local SAS disks. If you are on SUSE 11, the default I/O scheduler is cfq, which does not fit DB workloads. For flash-based disks, noop should be the preferred choice; SSDs are so fast that they do not need a scheduler. Noop means no scheduling.
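A minimal sketch, assuming a hypothetical device sdb: check the active scheduler and switch it to noop at runtime (the echo is not persistent across reboots; on newer multi-queue kernels the equivalent value is "none").

cat /sys/block/sdb/queue/scheduler           # the active scheduler is shown in [brackets]
echo noop > /sys/block/sdb/queue/scheduler
# For a persistent setting, use a udev rule or the elevator= kernel parameter,
# or let saptune/sapconf apply the SAP-recommended settings on SLES.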

 

Network Infrastructure:

I will cover this in the High Availability topic.

 

Operating System Guidelines:

HANA DB can run on SUSE or Red Hat distributions. The latest editions of these Linux distributions use the systemd journal (journalctl) logging system, which does not “persist” some OS logs by default, meaning that if the system reboots, the logs are gone!

I would really like to question this decision, but for now I suggest enabling persistent logging:

 

mkdir /var/log/journal

systemd-tmpfiles --create --prefix /var/log/journal

systemctl restart systemd-journald

If this is a multipath disk setup, check your multipath configuration accordingly. A wrong multipath configuration will have a big impact on disk subsystem performance. The multipath service is not enabled by the initial OS installation when no multipath device is present.

This means that if you attach multipath devices later, you should enable the multipath/device-mapper service manually.
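A minimal sketch: enable and start the device-mapper multipath service, then list the multipath devices and their path states.

systemctl enable --now multipathd
multipath -ll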

Test your performance with native tools (fio, sysbench, iometer) or use the SAP hardware check tool.
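A minimal sketch of a quick sequential-write test with fio; the parameters are illustrative, and the target should be a scratch file, never a live data or log file.

fio --name=seqwrite --filename=/hana/data/fio_scratch --rw=write --bs=1M --size=4G --direct=1 --numjobs=1 --group_reporting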

Monitoring I/O performance on multipath devices can be tricky; use my uber script mtools.sh:

 

https://github.com/tayore/ozantools

 

Never, ever install any application on the root file system, including HANA and /usr/sap (yes, that too). Separate the OS and the applications, and if this is a bare-metal server, take an image of the root file system. There are open-source tools such as Clonezilla and Relax-and-Recover; I have worked with them and they do the job.

 

 

This is the first part of my best practices series.

 

      9 Comments
      Sumit Jaiswal

      Thanks for the important info, Ozan!

      I have a few queries; may I seek your attention, please?

      For a 2-socket, 128-vCPU server with 2 TB of RAM, we should create VMs of 32 vCPUs (0.5 socket) or 64 vCPUs (1 socket) with 500 GB or 1 TB of memory respectively. Am I correct? We should not create a VM of 32 vCPUs with 800 GB, to avoid the "far memory" case. What is the memory-to-vCPU relation?

      When you say "56 cores, 112 vCPU server", are you counting the threads as well? Isn't a vCPU considered a core?

      How do CPU threads impact performance in a VM?

      Ozan Uzun
      Blog Post Author

      Hello,

      Please check SAP Notes 2315348 and 2393917 for detailed information.

      You are correct, but no odd sizes; I mean you can create 0.5-, 1-, 2-, or 4-socket production VMs.

      Each socket has a direct connection to a certain amount of memory, and that memory will be attached to the VM assigned to that socket.

      I cannot give a vCPU-memory calculation because it differs by CPU type. For a 28-core CPU you can assign 14-28 cores, for a 20-core CPU 10-20, and so on. A server can have 1, 2, or 3 TB of memory.

      x86 systems have hyperthreading capability; I choose to call the physical cores "cores" and the threads "vCPUs". A thread is a logical term.

      On ppc64le systems, SMT is 8-way: 1 core is 8 threads (8 vCPUs).
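      A minimal sketch of how Linux reports this counting:

      lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core|^CPU\(s\)'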

      Personally, I recommend keeping hyperthreading on for HANA DB systems.

      Sumit Jaiswal

      Thanks Ozan for answers.

      I think SAP (for applications and databases) also counts threads in "the number of processors", i.e. SCU.

      Example (just checked): ST06 and the output of nproc (Linux) give the count of (sockets x cores x threads per core).

      I'd like to understand whether both of these cases will be equally good in processing power for a calculated SAPS value NN:

      Case 1 - 2 socket x 16 cores x dual threading

      Case 2 -  2 socket x 32 cores x single threading

       

      Ozan Uzun
      Blog Post Author

      That is a highly technical topic, but let me try to explain with examples.

      A hyperthreaded x86 core can sustain up to about 10%+ more performance compared to a single thread.

      I mean a single core might get roughly 10% more performance with hyperthreading on, depending on the workload.

       

      56 vCPUs do not deliver double the performance of 28 cores; hyperthreading helps with density. It allows the hypervisor to assign more vCPUs to real cores. Hyperthreading is an optimal way of using idle CPU time. It should not be confused with SMT.

      If a core's 2 vCPUs are shared between two virtual machines and both VMs request CPU power, they will share the CPU time, at almost half the performance each. If one VM is idle, there is no issue; the CPU power is available for the other.

      Typically most VMs do not use more than 10% CPU and share the CPUs all the time. If a core is full, the hypervisor will switch the core's owner from one VM to another.

      The computational power of a vCPU is the same as a core when there is no overcommitment.

       

      A 32-core single-threaded (hyperthreading disabled in the BIOS) virtual machine will get almost double the performance of a 16-core/32-thread virtual machine.

       

       

      BUT there is the hypervisor effect. If the host is idle, KVM, Xen, or VMware will assign those overcommitted virtual CPUs to idle cores. So you can never know whether your 32-vCPU VM uses full, half, or partial CPU capacity.

      Another example, not the whole truth but it will help you understand: if you have a 40-core/80-vCPU host and you assign 32 vCPUs to a single VM, that VM will use the power of 32 real cores.

      If you create more virtual machines, the hypervisor will assign more VMs to the CPU cores. If only 20 cores are idle, your VM might use the power of only 20 cores.

       

      That is the whole reason for the SAP HANA notes and guides. There is a very important sentence there:

       

      “CPU and memory overcommitment must not be used.”

       

       

      Sumit Jaiswal

      Got it well now.

      Thanks for the detailed explanation!

      Poyraz Sagtekin

      Thank you very much for this great information!

      vishwanath vedula

      good one...

      Jens Gleichmann

      Thanks, Ozan, for this great summary. A lot of customers ask questions again and again about exactly these topics.

      It seems to be your first blog here. Just some hints to make it more readable:

      • use tables and horizontal lines to separate sections
      • the big gaps between the sections make it hard to read on mobile devices
      • link to the documentation and SAP notes you used as related sources.

      A good extension to my old blog about VM performance in the context of HANA.

      Thumbs up, keep on writing!

      Regards,

      Jens

      David Bank

      As a current Linux system engineer for HANA 2.0, I'd like to offer some comments/observations in response to Ozan's informative blog entry above.

      The concerns regarding multipathed storage are legitimate, but I find that giving each LUN an alias is very helpful. If you try to identify a LUN by 60000247df6a7b9351, you're more likely to make a mistake than if you refer to it as data0. For example, on HANA systems I organize the storage according to mount point, so I create a matching configuration file under /etc/multipath, and it has entries like this:

      multipaths {
         multipath {
           wwid 36xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0
           alias data0
         }
         multipath {
           wwid 36xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1
           alias data1
         }
         multipath {
           wwid 36xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx2
           alias data2
         }
         multipath {
           wwid 36xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx3
           alias data3
         }
      }

      I'll make a similar file for log, backup, etc. I can then better understand the storage layout.

      All of the LUNs underlying /hana/data are listed in one file, and have a consistent naming convention. Whether or not I stripe the storage does not have to affect the alias names (if you want the alias names to track that as well, adjust the naming convention to suit).

      Regarding striping - it definitely doesn't hurt performance, but it makes management a bit more complex. For example, let's say you use 1TB LUN sizes, and you need to create an 8TB filesystem. OK, so you stripe across 8 LUNs, with aliases data0 through data7. All is well.

      Suppose that, later on, the filesystem needs to be expanded to 12TB. The rub here is that in order to maintain the striping, you must add the new storage in the form of the same number of LUNs. If you do not maintain the striping, then LVM will view the added storage as an effectively unstriped segment.

      I'm not saying to avoid striping - all I'm saying is that there can be trade-offs.
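      A minimal sketch of that expansion, with hypothetical volume names and new aliases data8 through data15, keeping the stripe width of eight:

      vgextend hanadata_vg /dev/mapper/data{8..15}
      lvextend -i 8 -I 256 -L +4T /dev/hanadata_vg/datalv   # extend with the same stripe geometry
      xfs_growfs /hana/data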

      XFS vs. Btrfs

      I confess to being an XFS skeptic back in the days of RHEL v5-6. I'm much more comfortable with it now. One issue with Btrfs is that it has a relatively high meta-data overhead. Also, since it is Copy-on-Write, it can dramatically increase the practical storage needs. Accurately determining free space is a non-trivial exercise.

      During the HANA migration, I went 100% XFS for the SLES v15 hosts (not just for /hana/* and /usr/sap, but also for /boot, /var, /home, /tmp, etc.) 18 months later I have no regrets. As a rule, the filesystems I manage are only growing - I don't recall a serious need to shrink one.

      For networking, we built out on Power9 and used SR-IOV - blazing fast, it functions at near-media speed. With no tuning, I got 8.9 Gb/s across a 10 Gb/s infrastructure. The main downside is that (as of right now) you cannot use LPM. IBM says they're going to fix that.

      Logging

      I run rsyslog on the SLES hosts and send all the log entries to a central host running syslog-ng, where I have one directory per host and sort the entries into files based on the Facility. That makes my logging persistent. But yes, systemd isn't making it any easier. It's one of my concerns looking at SLES v15 SP2.
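      A minimal sketch of the client-side forwarding, with a hypothetical central host name; rsyslog sends every entry over TCP (@@ means TCP, a single @ would be UDP):

      # /etc/rsyslog.d/forward.conf
      *.* @@central-loghost.example.com:514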

      I hope readers find this a helpful addendum.