Jens Gleichmann

[HANA] Unleash the performance of your VM

Most of the performance issues I have worked on turned out to be basic problems with HANA / Linux parameters and hypervisor configuration. Virtualization is a frequently chosen architecture in HANA environments, regardless of system size. If you want to ensure good performance and learn how to check it in your environment, keep on reading 😉

Most systems run on VMware, but more and more systems are planned for or already running on IBM Power. I am only talking about on-premise installations here, because you have little real influence on installations at a hyperscaler such as Azure (Hyper-V), AWS (its own KVM-based hypervisor) or GCP (its own KVM-based hypervisor). For the biggest instances there are bare-metal offerings, which make the NUMA configuration pretty easy. HANA itself is NUMA-aware.


NUMA is a good keyword to start with, because it is one of the most ignored and least transparent performance issues. What is NUMA, and why should you pay attention to it when you install HANA on a hypervisor?

 

NUMA – Non-uniform Memory Access

“NUMA is a method of configuring a cluster of microprocessors in a multiprocessing system so that they can share memory locally, improving performance and the ability of the system to be expanded.”

=> OK, this does not sound really self-explanatory, does it?

Let’s take an example (the original post shows a NUMA topology diagram here):

The performance impact depends on the CPU type, vendor (topology) and number of sockets.

A local access is typically 2-3 times faster than a remote one. But how can you influence the placement of a VM (virtual machine)?
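You can inspect the distance matrix yourself with `numactl --hardware` on the Linux host. As a minimal sketch, the heredoc below stands in for real command output (the node count and distance values are made-up sample data); the awk filter flags every distance larger than the local one:

```shell
# Sketch: spot remote-access penalties in a NUMA distance matrix.
# The heredoc is sample data standing in for `numactl --hardware` output;
# on a live host, pipe the real command output into the awk filter instead.
cat <<'EOF' > /tmp/numa_sample.txt
node distances:
node   0   1
  0:  10  21
  1:  21  10
EOF
# The diagonal entry is the local distance; larger entries are remote penalties.
awk '/^ *[0-9]+:/ { n = $1; sub(":", "", n); local = $(n + 2)
       for (i = 2; i <= NF; i++)
         if ($i > local) print "node " n ": remote distance " $i " vs local " local
     }' /tmp/numa_sample.txt
```

A distance of 21 versus 10 means roughly double the access cost for remote memory, which matches the 2-3x rule of thumb above.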

The hypervisor should normally take care of this. But in special cases, such as big HANA VMs or wrong default settings of the VM, you have to adjust it manually. This should be done for all productive HANA servers. Normally the person who installed the HANA system should be aware of this, but experience shows that in 90% of installations nobody cares.


IBM Power (PPC)

On IBM Power, optimization is pretty easy with the latest HMC versions:

# on ssh shell of the HMC
# Listing of all servers
$ lssyscfg -r sys -F name

 

# dynamic platform optimizer (DPO) => NUMA optimization

$ lsmemopt -m <pServer name> -r lpar -o currscore

$ lsmemopt -m pserv1 -r lpar -o currscore
lpar_name=hana1,lpar_id=1,curr_lpar_score=100
lpar_name=hana2,lpar_id=2,curr_lpar_score=100
lpar_name=hana3,lpar_id=3,curr_lpar_score=none
lpar_name=hana4,lpar_id=4,curr_lpar_score=100
lpar_name=hana5,lpar_id=5,curr_lpar_score=none
lpar_name=hana6,lpar_id=6,curr_lpar_score=100
lpar_name=hana8,lpar_id=8,curr_lpar_score=32 << improvable LPAR

 

# on ssh shell of the HMC
# use DPO for optimization
$ optmem -m <Power Server Name> -o start -t affinity -p <name(s) of improvable LPAR(s)>

$ optmem -m pserv1 -o start -t affinity -p hana8

# check running background activities
$ lsmemopt -m <Power Server Name>
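Since a score of 100 means optimal affinity, you can filter the lsmemopt output for LPARs worth optimizing. A minimal sketch, with a heredoc standing in for real HMC output (the LPAR names are the sample values from above):

```shell
# Sketch: flag LPARs whose affinity score is below 100 (candidates for DPO).
# The heredoc stands in for real `lsmemopt -m <server> -r lpar -o currscore` output.
cat <<'EOF' | awk -F, '{ split($3, a, "=")
       if (a[2] != "none" && a[2] + 0 < 100) print $1 " needs DPO (score " a[2] ")" }'
lpar_name=hana1,lpar_id=1,curr_lpar_score=100
lpar_name=hana3,lpar_id=3,curr_lpar_score=none
lpar_name=hana8,lpar_id=8,curr_lpar_score=32
EOF
```

On the real HMC you would pipe the live lsmemopt output into the same filter and pass the flagged LPAR names to optmem.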

 


VMware

On VMware this is trickier than on IBM Power, because the sizing rules also differ.

With VMware you can use half-socket sharing, or, if your VM is bigger than one NUMA node / socket, you have to round up and allocate the full socket. This leads to some resource waste.

Here is a picture from ©VMware:

 

Every VM that is bigger than one socket is called a ‘wide VM’.

Here is one example, which you can also reproduce in your own environment using the shell on your ESXi host.

If you are not familiar with such commands and interpreting the results, I can recommend the book ‘VMware vSphere 6.5 Host Resources Deep Dive’ by Frank Denneman and Niels Hagoort. You can also use their blogs: https://frankdenneman.nl/

Alternatively I’m sure you find a way how to contact me 😉


Example – remote memory access / overprovisioning

####################
ESX
E5 – 2695 v4
18 cores per socket
2 sockets
72 vCPUs
1 TB RAM
####################

 

HANA Sizing:
600GB RAM
36vCPU

 

Current Setup:

768 GB RAM
36vCPU

 

Sizing rules:

1 TB RAM (=> 2 sockets, because one NUMA node has 512GB and we need more than this)

72vCPU

 

This is currently one of the most common mistakes, which I see in about 60% of all environments: the VM admin is not aware of the SAP HANA sizing rules, and most admins are not aware of the influence their VM settings can have on the topology and the resulting performance. So pay attention to placement and overprovisioning.
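The round-up rule for wide VMs can be expressed in one line of shell arithmetic; the values below are the sample sizing from this example (1 TB requirement, 512 GB per NUMA node):

```shell
# Sketch: wide-VM rule - a VM bigger than one NUMA node must allocate
# whole sockets, so round up (values are the sample sizing from above).
vm_ram_gb=1024      # sized RAM requirement
node_ram_gb=512     # RAM per NUMA node / socket
sockets=$(( (vm_ram_gb + node_ram_gb - 1) / node_ram_gb ))
echo "sockets to allocate: $sockets"
```

With 1024 GB against 512 GB nodes this yields 2 full sockets, which is why the sizing rules above demand 72 vCPUs even though the HANA sizing itself only asked for 36.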


ESX view

groupName           groupID    clientID    homeNode    affinity     nWorlds   vmmWorlds    localMem   remoteMem  currLocal%  cummLocal%
 vm.78924              58029           0           0         0x3          16          16    73177088           0         100          99
 vm.78924              58029           1           1         0x3          16          16    72204288           0         100         100
 vm.1237962         76880487           0           0         0x3          16          16    18254012   250242884           6          53
 vm.1237962         76880487           1           0         0x3          16          16   267603968      831488          99          66
 vm.1237962         76880487           2           0         0x3           4           4   145781060   121605820          54          56

 

Here we see an ESX host with 2 VMs. VM 1237962 is our hdb01 HANA DB, which has 16+16+4 vCPUs (3 sockets), and we can see it consumes remote memory. Wait a moment – 3 sockets? Our physical server has only 2. Yes, this is possible with VMware, but it adds overhead and costs performance. You could even create an 8-socket server within a 2-socket ESX host, but that doesn’t make sense in the context of HANA. There are other applications where this feature is useful.

But all of these “virtual sockets” are located on physical socket 0. This leads to overprovisioning of this node, because the other VM additionally uses some of its resources.

 

nodeID        used        idle    entitled        owed  loadAvgPct       nVcpu     freeMem    totalMem
     0        5408       30591        5356           0          14          52    26703288   536736256
     1        1574       34426         926           0           3          16    85939588   536870912

Socket 0 is running 52 vCPUs and socket 1 only 16? It seems this ESX host is a little unbalanced and overprovisioned.

vmdumper -l | cut -d \/ -f 2-5 | while read path; do egrep -oi "DICT.*(displayname.*|numa.*|cores.*|vcpu.*|memsize.*|affinity.*)= .*|numa:.*|numaHost:.*" "/$path/vmware.log"; echo -e; done

DICT                  numvcpus = "36"
DICT                   memSize = "786432"
DICT               displayName = "hdb01"
DICT        sched.cpu.affinity = "all"
DICT        sched.mem.affinity = "all"
DICT      cpuid.coresPerSocket = "4"
DICT      numa.autosize.cookie = "360001"
DICT numa.autosize.vcpu.maxPerVirtualNode = "16"
DICT        numa.vcpu.preferHT = "TRUE"
numaHost: NUMA config: consolidation= 1 preferHT= 1
numaHost: 36 VCPUs 3 VPDs 3 PPDs	
numaHost: VCPU 0 VPD 0 PPD 0
numaHost: VCPU 1 VPD 0 PPD 0
numaHost: VCPU 2 VPD 0 PPD 0
numaHost: VCPU 3 VPD 0 PPD 0
numaHost: VCPU 4 VPD 0 PPD 0
numaHost: VCPU 5 VPD 0 PPD 0
numaHost: VCPU 6 VPD 0 PPD 0
numaHost: VCPU 7 VPD 0 PPD 0
numaHost: VCPU 8 VPD 0 PPD 0
numaHost: VCPU 9 VPD 0 PPD 0
numaHost: VCPU 10 VPD 0 PPD 0
numaHost: VCPU 11 VPD 0 PPD 0
numaHost: VCPU 12 VPD 0 PPD 0
numaHost: VCPU 13 VPD 0 PPD 0
numaHost: VCPU 14 VPD 0 PPD 0
numaHost: VCPU 15 VPD 0 PPD 0
numaHost: VCPU 16 VPD 1 PPD 1
numaHost: VCPU 17 VPD 1 PPD 1
numaHost: VCPU 18 VPD 1 PPD 1
numaHost: VCPU 19 VPD 1 PPD 1
numaHost: VCPU 20 VPD 1 PPD 1
numaHost: VCPU 21 VPD 1 PPD 1
numaHost: VCPU 22 VPD 1 PPD 1
numaHost: VCPU 23 VPD 1 PPD 1
numaHost: VCPU 24 VPD 1 PPD 1
numaHost: VCPU 25 VPD 1 PPD 1
numaHost: VCPU 26 VPD 1 PPD 1
numaHost: VCPU 27 VPD 1 PPD 1
numaHost: VCPU 28 VPD 1 PPD 1
numaHost: VCPU 29 VPD 1 PPD 1
numaHost: VCPU 30 VPD 1 PPD 1
numaHost: VCPU 31 VPD 1 PPD 1
numaHost: VCPU 32 VPD 2 PPD 2
numaHost: VCPU 33 VPD 2 PPD 2
numaHost: VCPU 34 VPD 2 PPD 2
numaHost: VCPU 35 VPD 2 PPD 2

 

Here we can see that the mapping of VPD to PPD is 1:1, but there is no physical third socket in an E5 server 😉

First of all, we have a wide VM. This means preferHT should be disabled. Another point is the limit of 16 vCPUs per virtual node, which leads to this 3-socket setup: 36/16 = 2.25 => rounded up to 3.
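The same round-up arithmetic explains the three virtual nodes: with numa.autosize.vcpu.maxPerVirtualNode = "16" from the vmware.log above, the 36 vCPUs are split into ceil(36/16) virtual NUMA nodes. As a quick check:

```shell
# Sketch: number of virtual NUMA nodes created by the autosize limit
# (values taken from the vmware.log excerpt above).
vcpus=36
max_per_vnode=16    # numa.autosize.vcpu.maxPerVirtualNode
vnodes=$(( (vcpus + max_per_vnode - 1) / max_per_vnode ))
echo "virtual NUMA nodes: $vnodes"
```

This yields 3 virtual nodes, matching the 3 VPDs / 3 PPDs in the log.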

OK, such numbers are fine, but here are some pictures to illustrate what exactly this means:

 

In the last picture you can see that 768 GB doesn’t fit into 512 GB, so remote access is used to satisfy the demand. The other VM should also not be spread over two NUMA nodes. This has a bad effect on HANA performance.

So, in the end you have two options:

  • Reduce the size of your HANA database and resize the VM so that it fits into one NUMA node
  • Move the second VM away, so that the whole ESX can be used by the HANA VM

A socket of a productive HANA VM must not be shared with another VM (regardless of whether it runs an SAP application or not). This also means that overprovisioning is not allowed.

The example shown is unsupported in several ways. SAP can discontinue support for such a setup. I haven’t heard from customers or colleagues that this has ever happened, but what does often happen is that VMware support is contacted, and you can be pretty sure they will find such a configuration; your issue will only be processed if you have a supported setup.


Summary

  1. Check your hypervisor setup
  2. Check your sizing
  3. Let the systems be installed by people with the correct certification and experience; this will save you a lot of trouble and money
  4. Get an approval for each productive HANA system from the person who installed it
  5. Execute a yearly health check, with all the details covered above

It is a hidden topic, but if you build a house, or in this case a mission-critical database, on a weak foundation, you will sooner or later get into trouble, because this does not scale.

 

Source:

Decoupling of Cores per Socket from Virtual NUMA Topology in vSphere 6.5

Introduction 2016 NUMA Deep Dive Series

SAP HANA on VMware vSphere

SAP HANA on IBM Power Systems and IBM System Storage – Guides

      3 Comments

      Bjoern-Olaf Seif

      Hi Jens,

      good summary. It would really make life easier if all support staff members of the different vendors involved had this knowledge.

      Could you give an additional summary about NUMA affinity from the OS point of view?

      This would give a more complete picture, and SAP and SUSE support wouldn’t need to reinvent the wheel from scratch every time.

      Thumbs up and greetings

      Bjoern

      Dmitriy Krivov

      Hello, Jens

      Thank you for very good explanation.

      I checked my POWER8 LPAR and see the following "numactl -H" output:

      available: 2 nodes (0,5)
      node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
      node 0 size: 0 MB
      node 0 free: 0 MB
      node 5 cpus:
      node 5 size: 262144 MB
      node 5 free: 119918 MB
      node distances:
      node   0   5
        0:  10  40
        5:  40  10

      And it looks like all memory is remote (it belongs to node 5, while all CPUs belong to node 0)

      But on HMC side for this LPAR curr_lpar_score=100

      Is it normal or have some NUMA-problem on this LPAR ?

      Best regards,

      Dmitry

       

      Jens Gleichmann
      Blog Post Author

      Hi Dmitriy,

       

      but all the memory your LPAR is working with is local to it. For balancing reasons it would of course be better if both nodes had half of the memory each, but with shared L3 and L4 CPU caches there should be no dramatic performance impact. You would have a NUMA issue if nodes 0 and 5 could not satisfy your requirements and memory had to be allocated from another node, e.g. node 4, from which no cores are allocated.

       

      Regards,

      Jens