IndexServer process yellow in sapcontrol

Former Member · ‎09-18-2014

Today I solved an interesting HANA startup issue and wanted to share it with you.

Collecting Symptoms

yellow status

HANA did not start up on the worker node in my scale-out landscape. On Linux, I logged in as sidadm and called

sapcontrol -nr <instance number> -function GetProcessList

This showed me "YELLOW" for the process hdbindexserver.
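To spot such a process quickly, the output can be filtered for anything that is not GREEN. This is only a minimal sketch: it assumes the usual comma-separated GetProcessList layout (name, description, dispstatus, textstatus, starttime, elapsedtime, pid). The helper name non_green and the sample lines are hypothetical and not taken from the affected system.

```shell
# Print every process that sapcontrol does not report as GREEN.
# Reads the output of "sapcontrol ... -function GetProcessList" on stdin.
# Assumption: fields are separated by ", " and dispstatus is the third field.
non_green() {
    awk -F', ' 'NF >= 7 && $1 != "name" && $3 != "GREEN" { print $1 ": " $3 " (" $4 ")" }'
}

# Hypothetical sample output, shortened for illustration.
sample='name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GREEN, Running, 2014 09 18 11:04:20, 0:10:00, 19300
hdbnameserver, HDB Nameserver, GREEN, Running, 2014 09 18 11:04:21, 0:09:59, 19320
hdbindexserver, HDB Indexserver, YELLOW, Initializing, 2014 09 18 11:04:33, 0:09:47, 19416'

printf '%s\n' "$sample" | non_green
```

On a real system the call would be sapcontrol -nr <instance number> -function GetProcessList | non_green.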

long startup

The nameserver process had taken an unusually long time to start.

clean shutdown impossible

HDB stop did not work; I had to run killall -9 hdbindexserver to bring down HANA.

strace

Next I wanted to know what hdbindexserver was doing, so I looked up its PID and attached strace to it:

ps -A | grep hdbindexserver
19416 ? 00:03:13 hdbindexserver
strace -p 19416
Process 19416 attached - interrupt to quit
epoll_wait(17,

According to man 2 epoll_wait, the process is waiting for an event on a file descriptor. That does not help us much: you will see the same output on a sane HANA installation. Let's move on and find out what its subprocesses are doing:

strace -ffs 9999 -p 19416

[pid 19438] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 19436] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 19438] futex(0x7ffe86826870, FUTEX_WAIT_PRIVATE, 0, {0, 1000000} <unfinished ...>

Same here: man 2 futex tells me the thread is waiting for a value to change, and running this in a sane HANA environment gives me the same output.

Strace cannot help us.

trace files

Proceeding as described here, I took a look at the log files (or "trace files"). I even set saptracelevel to 5 in /usr/sap/<SID>/HDB<NR>/exe/config/global.ini. But the indexserver's trace file only contained the following lines after startup:

hostname:/usr/sap/<SID>/HDB<NR>/hostname/trace> cat indexserver_hostname.30003.000.trc

[...]

[25327]{-1}[-1/-1] 2014-09-18 11:04:33.548012 i Service_Startup translog.cc(01634) : Activating private log buffering mode
[25327]{-1}[-1/-1] 2014-09-18 11:04:33.548051 i assign TREXIndexServer.cpp(00730) : persistence started with volume 6
[25419]{-1}[-1/-1] 2014-09-18 11:09:34.658969 w Logger SavepointImpl.cpp(02447) : NOTE: BACKUP DATA needed to ensure recoverability of the database

That was all, even with saptracelevel set to 5. However, there was one hint in the nameserver traces:

hostname:/usr/sap/<SID>/HDB<NR>/hostname/trace> cat nameserver_alert_hostname.trc
[...]
[10155]{-1}[-1/-1] 2014-09-18 12:02:03.918610 e TNS TNSClient.cpp(00800) : sendRequest setstarting to master:30001 failed with NetException. data=(S)databaseid=2|host=hostname|port=30001|(I)type=3|(B)watchdog=0|(N)node=host|hostname|nameserver|...|...|...|
[10155]{-1}[-1/-1] 2014-09-18 12:02:03.918647 e NameServer TREXNameServer.cpp(09839) : master nameserver@hostname:30001 not responding. retry in 5 sec
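
With several trace files per host, it can help to filter for errors and warnings only instead of reading every file with cat. The sketch below assumes the layout seen in the excerpts above, where the fourth whitespace-separated column is the severity letter (i, w or e). The function name trace_problems is made up for this example.

```shell
# Keep only trace lines whose severity column is "e" (error) or "w" (warning).
# Assumed layout: [thread]{conn}[trans] date time severity component file(line) : message
# Reads the files given as arguments, or stdin if none are given.
trace_problems() {
    awk '$4 == "e" || $4 == "w"' "$@"
}

# Three lines taken from the traces shown above.
line_i='[25327]{-1}[-1/-1] 2014-09-18 11:04:33.548012 i Service_Startup translog.cc(01634) : Activating private log buffering mode'
line_w='[25419]{-1}[-1/-1] 2014-09-18 11:09:34.658969 w Logger SavepointImpl.cpp(02447) : NOTE: BACKUP DATA needed to ensure recoverability of the database'
line_e='[10155]{-1}[-1/-1] 2014-09-18 12:02:03.918647 e NameServer TREXNameServer.cpp(09839) : master nameserver@hostname:30001 not responding. retry in 5 sec'

# Prints the warning and the error line, but not the info line.
printf '%s\n' "$line_i" "$line_w" "$line_e" | trace_problems
```

In the trace directory, trace_problems *.trc would then show the nameserver error without having to open each file.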

network

Since the previous symptom pointed to a network problem, I drilled down on that. Indeed, the command lsof -P -p 18994 (where 18994 is the PID of the name server) showed many more established connections on a sane node than on this node. Connecting from the server with the error to any port on the master name server worked, but connecting in the other direction, to the server with the error, did not. The easiest way to find that out is telnet <host> <portnumber>.
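
If telnet is not installed, the same check can be scripted. This is a sketch, not what I used at the time: it relies on bash's /dev/tcp pseudo-device and the coreutils timeout command, and the function names port_open and probe are invented for this example. The important part is to run it in both directions, from the worker to the master and from the master to the worker.

```shell
# Return success if a TCP connection to host ($1) and port ($2) can be
# opened within 3 seconds. Assumes bash and coreutils "timeout" are installed.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Print a one-line verdict for host ($1) and port ($2).
probe() {
    if port_open "$1" "$2"; then
        echo "$1:$2 reachable"
    else
        echo "$1:$2 NOT reachable"
    fi
}

# Example call (run once on each node, against the other node):
#   probe <host> <portnumber>
```

A port that is blocked by a firewall typically shows up as NOT reachable after the three-second timeout, while a port with no listener is refused immediately.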

Reason

On the server with the error, the firewall was up, which prevented HANA from starting. To me at least, this was counter-intuitive: I had regarded the HANA worker node as the initiator of the communication, and these (outbound) requests were not blocked by the firewall. Apparently the other nodes also have to be able to open connections to the worker.

Solution

The solution was to stop the firewall, in this case on SUSE Linux, with the commands

/etc/init.d/SuSEfirewall2_init stop

/etc/init.d/SuSEfirewall2_setup stop

and then to disable the firewall in the configuration tool, which is started with the command

yast2 firewall

Then HDB start worked fine.
