VMware ESXi 5.5 p08 or ESXi 5.5 p09: Consumption of VMware Extended Guest Statistics may lead to virtual machine hanging issues
If you run SAP Host Agent 7.21 PL5 or later, which uses the latest ESXi Extended Guest Statistics, on an ESXi 5.5 p08 or p09 host, the virtual machine may hang at irregular intervals. Both Linux and Windows guests are affected.
You can check in advance whether you are affected by using the esxtop performance monitoring utility. Observe whether the memory usage statistics of the virtual machine world (OVHD) increase steadily in one-minute increments.
If the virtual machine has already crashed, vmware.log may contain lines similar to the following (these are examples; date, time, and environment-specific values will vary):
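The check described above can be sketched in a few lines of Python. This is only an illustration: it assumes you have already noted the OVHD value once per minute (how you collect the values is up to you), and the function name, the minimum sample count, and the sample values are all hypothetical.

```python
from typing import Sequence


def is_steadily_increasing(samples: Sequence[float], min_samples: int = 5) -> bool:
    """Return True if each sample is strictly greater than the one before it.

    `samples` is assumed to hold OVHD readings taken one minute apart.
    With fewer than `min_samples` readings the function returns False,
    because a short series is not enough to call the growth "steady".
    """
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Made-up readings for illustration only (not measured values).
ovhd_readings = [182.4, 183.1, 183.9, 184.6, 185.2, 186.0]
print(is_steadily_increasing(ovhd_readings))
```

A series that plateaus or drops between readings returns False, so a True result is only a hint to investigate further, not proof of the memory leak.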
YYYY-MM-DDTZ| vcpu-1| I120: VERIFY bora/lib/misc/strutil.c:1079
YYYY-MM-DDTZ| vcpu-1| W110: A core file is available in "/vmfs/volumes///vmx-zdump.001"
YYYY-MM-DDTZ| vcpu-1| W110: Writing monitor corefile "/vmfs/volumes///vmmcores.gz"
YYYY-MM-DDTZ| vcpu-1| W110: CoreDump error line 2160, error Cannot allocate memory
The /var/log/vmkernel.log file contains entries similar to:
YYYY-MM-DDTZ cpu21:101694)UserDump: 1907: Dumping cartel 100618 (from world 101694) to file /vmfs/volumes///vmx-zdump.001 …
YYYY-MM-DDTZ cpu20:101694)UserDump: 2027: Userworld coredump complete.
YYYY-MM-DDTZ cpu48:33427)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
YYYY-MM-DDTZ cpu28:100618)WARNING: World: vm 100618: 3974: VMMWorld group leader = 100619, members = 2
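To search collected log files for these entries, a small script along the following lines may help. It is a sketch under assumptions: the signature strings are copied from the example lines in this article and may differ in your environment, and the dictionary keys and function name are made up for illustration.

```python
from typing import Iterable, Set

# Substrings taken from the example log lines in this article.
# The keys are hypothetical labels, not VMware terminology.
SIGNATURES = {
    "strutil_verify": "VERIFY bora/lib/misc/strutil.c",
    "coredump_no_memory": "error Cannot allocate memory",
    "thread_clone_failed": "Error cloning thread: -28",
}


def find_signatures(lines: Iterable[str]) -> Set[str]:
    """Return the labels of all signatures that occur in the given log lines."""
    found = set()
    for line in lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                found.add(label)
    return found


# Example input built from the sample lines shown in this article.
sample_lines = [
    "YYYY-MM-DDTZ| vcpu-1| I120: VERIFY bora/lib/misc/strutil.c:1079",
    "YYYY-MM-DDTZ| vcpu-1| W110: CoreDump error line 2160, error Cannot allocate memory",
    "YYYY-MM-DDTZ cpu48:33427)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)",
]
print(sorted(find_signatures(sample_lines)))
```

To scan a real file, pass an open file object, for example `find_signatures(open("vmware.log", errors="replace"))`. A match only indicates that the log resembles the examples here; it does not by itself confirm this specific issue.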
The cause is a defect in the ESXi statistics collection that leaks memory. As a consequence, the virtual machine eventually fails due to memory exhaustion.
This is a known issue affecting VMware vSphere (ESXi 5.5 p08, ESXi 5.5 p09).
VMware vSphere 6.0 is not affected.
For more information, see SAP Note 2381942.
ESXi 5.5 p10 was released in December and fixes the memory leak.
Virtual machines that run SAP applications might fail at random and produce a vmx.zdump core dump. When too many VMware Tools stats commands are run inside the VM, an error message similar to the following is displayed:
CoreDump error line 2160, error Cannot allocate memory.