Best Practice: Repairing a Failed SAP Instance (Part 2 – Restoring a Failed SAP Instance)
In the first part of this blog post, I presented common issues that lead to a failed SAP service. In this part, I want to talk about the scenario where the SAP service is running but the SAP instance can’t be started. This scenario has various root causes, and I’ll focus on the most common ones.
You’ll see that the start process of an SAP instance can break at several stages, and the way the failed instance appears in the SAP MMC differs from scenario to scenario. The following screenshots show the SAP MMC and the Failover Cluster Manager with an ASCS instance that couldn’t be started.
As in the first part of this blog post, I use a failed ASCS instance as an example, but our recommendations can also be applied to SAP dialog instances or SAP enqueue replication server instances.
Action Plan: Where Do I Begin?
The ASCS instance has been stopped and can’t be started. Well, that’s bad. The first question you probably ask yourself is: Why is this happening and where can I find information about this issue?
In most cases, you’ll find the answer in the work directory of the failed instance. For an ASCS instance, this directory is located at <DriveLetter>:\usr\sap\<SID>\ASCS<InstanceNumber>\work.
In my example, the path to the work directory of the ASCS instance is E:\usr\sap\MG4\ASCS24\work.
Usually, there are lots of different trace and log files. To find the relevant ones, sort the files by Date modified in descending order. The most recently changed files then appear at the top, and you can search for an error from top to bottom.
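If you prefer a script to clicking through Explorer, the following Python sketch builds the work directory path from the pattern above and lists the most recently modified files first. It’s a minimal sketch, assuming Python 3 on the Windows host; work_dir and newest_first are hypothetical helper names, not part of any SAP tool.

```python
from __future__ import annotations

from pathlib import Path, PureWindowsPath


def work_dir(drive: str, sid: str, instance: str) -> PureWindowsPath:
    r"""Build <DriveLetter>:\usr\sap\<SID>\<Instance>\work."""
    return PureWindowsPath(f"{drive}:\\", "usr", "sap", sid, instance, "work")


def newest_first(directory: Path, limit: int = 10) -> list[Path]:
    """Return up to `limit` files, most recently modified first (Date modified, descending)."""
    files = [p for p in directory.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    wd = Path(work_dir("E", "MG4", "ASCS24"))  # E:\usr\sap\MG4\ASCS24\work in this example
    if wd.is_dir():
        for f in newest_first(wd):
            print(f.name)
```

Start reading with the first file in the list, which is the one that changed last.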
If an SAP instance fails to start, the root cause is often in the instance profile. In that case, the profile parameter reference can be helpful. To open it, right-click an SAP instance in the SAP MMC and choose All Tasks -> View Instance Parameter.
Scroll down to the profile parameter you’re interested in, select it, and choose the Help button. The profile parameter reference then opens in your browser with a description of the parameter.
Another really helpful resource for finding misconfigured profile parameters is the report RSPARAM, which you can start in transaction SE38. In particular, the Display also unsubstituted option is great for tracing how profile parameters are assembled from one another and for finding out which default values SAP ships.
With that we have everything we need to dive right into common scenarios where the SAP instance has failed.
The Services File Is Misconfigured or Can’t Be Accessed
The services file maps network service names to the port numbers and protocols they use. Applications can read this configuration file to find out which port a given service can be reached on.
Now to the interesting part: SAP instances don’t just use the services file to look up the ports of other services. The ports that the SAP instances use themselves can also be configured there.
What happens if an SAP instance can’t access the services file? Well, you probably guessed it: The SAP instance is stuck in the failed state. In our example, the message server can’t start.
Diving into the dev_ms trace file of the ASCS instance, we see the following error:
I must admit the error message isn’t pretty and doesn’t say much. However, if the NiBufListen() function reports an error with return code rc=NIESERV_UNKNOWN, chances are good that an entry in the services file couldn’t be read.
The services file is in the directory C:\Windows\System32\drivers\etc and you need administrator privileges to edit it.
In my case, opening the services file reveals that the entry for the message server has been commented out (line 292).
Note that the default message server port depends on the instance number of your ASCS instance: The default value is 36<InstanceNumber>, in my case 3624.
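To check this from a script, here’s a minimal Python sketch that scans the text of a services file for one entry and reports whether it’s active, commented out, or missing. It assumes the usual SAP naming convention sapms<SID> for the message server entry (sapmsMG4 in this example), which isn’t spelled out above; check_services_entry and default_ms_port are hypothetical helper names.

```python
from __future__ import annotations

import os
import re
from pathlib import Path

SERVICES_FILE = Path(r"C:\Windows\System32\drivers\etc\services")


def default_ms_port(instance_number: str) -> int:
    """Default message server port: 36<InstanceNumber>, e.g. 3624 for instance 24."""
    if not re.fullmatch(r"\d{2}", instance_number):
        raise ValueError("instance number must have two digits")
    return int("36" + instance_number)


def check_services_entry(text: str, service: str) -> tuple[str, int | None]:
    """Return ('active' | 'commented out' | 'missing', port) for a service name."""
    found_commented = False
    commented_port: int | None = None
    for raw in text.splitlines():
        line = raw.strip()
        is_comment = line.startswith("#")
        fields = line.lstrip("#").split()  # e.g. ['sapmsMG4', '3624/tcp', ...]
        if len(fields) < 2 or fields[0] != service:
            continue
        port_text = fields[1].split("/")[0]
        port = int(port_text) if port_text.isdigit() else None
        if not is_comment:
            return "active", port  # an active entry wins over commented-out copies
        found_commented, commented_port = True, port
    return ("commented out", commented_port) if found_commented else ("missing", None)


if __name__ == "__main__" and os.name == "nt":
    text = SERVICES_FILE.read_text(encoding="latin-1")
    print(check_services_entry(text, "sapmsMG4"), "default port:", default_ms_port("24"))
```

On the Windows host itself, Python’s socket.getservbyname("sapmsMG4", "tcp") performs the lookup through the operating system and raises OSError if no active entry exists, which is a quick way to test the same lookup outside of SAP.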
Another common error is shown on the following screenshot:
Here, the services file has the file extension .txt. That’s wrong! With this name, services and applications can’t access the file because they expect a file named services without any extension. You can easily fix this by right-clicking the file, choosing Rename, and deleting the extension.
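A small Python sketch can spot this mistake, assuming Python 3 on the Windows host; services_file_problem and remove_txt_extension are hypothetical helpers. The sketch only looks for the exact name services and for variants with an extension, such as services.txt.

```python
from __future__ import annotations

from pathlib import Path

ETC_DIR = Path(r"C:\Windows\System32\drivers\etc")


def services_file_problem(etc_dir: Path) -> str | None:
    """Describe what's wrong with the services file in etc_dir, or return None if it exists."""
    if (etc_dir / "services").is_file():
        return None
    misnamed = sorted(p.name for p in etc_dir.glob("services.*"))
    if misnamed:
        return f"'services' is missing, but {', '.join(misnamed)} exists: remove the file extension"
    return "'services' is missing"


def remove_txt_extension(etc_dir: Path) -> bool:
    """Rename services.txt to services; in the real etc directory this needs admin rights."""
    wrong, right = etc_dir / "services.txt", etc_dir / "services"
    if wrong.is_file() and not right.exists():
        wrong.rename(right)
        return True
    return False


if __name__ == "__main__" and ETC_DIR.is_dir():
    print(services_file_problem(ETC_DIR) or "services file found")
```

A script also sidesteps Explorer’s default setting that hides extensions for known file types, which can make services.txt look like services.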
Important Profile Parameters Are Configured Wrong
Every time an SAP instance starts, it first reads the default profile. The default profile is the same for all SAP instances and is located in the SYS directory on the central sapmnt share; in my case, it’s \\mg4ascs\sapmnt\MG4\SYS\profile\DEFAULT.PFL. In a second step, the instance reads its instance profile, which is located in the same directory. Both profiles contain parameters needed to start and operate the SAP instance. If a profile parameter is defined in both the default profile and the instance profile, the value in the instance profile is used.
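The read order and the precedence rule can be sketched in a few lines of Python. This is a simplified model, assuming one `key = value` entry per line and `#` comments; it ignores other features of real SAP profiles, and the profile contents are made up for illustration (the wronghost override in particular is hypothetical).

```python
from __future__ import annotations


def parse_profile(text: str) -> dict[str, str]:
    """Parse simple 'key = value' lines; a later definition of a key overwrites an earlier one."""
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once: values such as pf=... contain '='
        params[key.strip()] = value.strip()
    return params


def effective_profile(default_text: str, instance_text: str) -> dict[str, str]:
    """Read DEFAULT.PFL first, then the instance profile; instance values win."""
    merged = parse_profile(default_text)
    merged.update(parse_profile(instance_text))
    return merged


DEFAULT_PFL = """\
SAPSYSTEMNAME = MG4
SAPGLOBALHOST = mg4ascs
"""

INSTANCE_PFL = """\
SAPSYSTEM = 24
INSTANCE_NAME = ASCS24
SAPGLOBALHOST = wronghost
"""

if __name__ == "__main__":
    print(effective_profile(DEFAULT_PFL, INSTANCE_PFL)["SAPGLOBALHOST"])  # wronghost
```

This also shows why a wrong value in the instance profile can silently override a correct one in the default profile.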
There are tons of profile parameters, and configuring an SAP system through them is an art in itself. Because profiles can get very complicated, I’ll only cover the most common misconfigurations that lead to a failed SAP instance.
It’s important to know that the SAP instance ignores invalid profile entries, for example, profile parameters with a typo in the keyword. Also keep in mind that profile parameters are case-sensitive. You can detect this kind of error, where the SAP instance doesn’t recognize a profile entry, with the Check Parameter feature of the SAP MMC.
If a profile parameter isn’t recognized, an Unexpected parameter warning appears. In this example, there’s a typo in the Start_Progarm_02 parameter, and I used a lowercase s in the start_Program_00 statement. Additionally, I get the warning that ms/standalone is obsolete.
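The idea behind these warnings can be illustrated with a short Python sketch: compare each key case-sensitively against known names and flag everything else. It’s only a model of the behavior described above, not how the SAP MMC implements the check, and the allow-list is a hypothetical, deliberately tiny stand-in for the complete list of valid parameters.

```python
from __future__ import annotations

import re

# Hypothetical allow-list for illustration; a real check needs every valid parameter name.
KNOWN_PARAMETERS = {"SAPSYSTEMNAME", "SAPSYSTEM", "INSTANCE_NAME", "SAPGLOBALHOST", "DIR_CT_RUN"}
KNOWN_PATTERNS = [re.compile(r"(?:Start|Restart)_Program_\d{2}")]
OBSOLETE_PARAMETERS = {"ms/standalone"}


def check_parameter_names(keys: list[str]) -> list[str]:
    """Return a warning for every key that isn't an exact, case-sensitive match."""
    warnings = []
    for key in keys:
        if key in OBSOLETE_PARAMETERS:
            warnings.append(f"{key}: obsolete")
        elif key not in KNOWN_PARAMETERS and not any(p.fullmatch(key) for p in KNOWN_PATTERNS):
            warnings.append(f"{key}: Unexpected parameter")
    return warnings


if __name__ == "__main__":
    for warning in check_parameter_names(
        ["Start_Progarm_02", "start_Program_00", "ms/standalone", "Start_Program_01"]
    ):
        print(warning)
```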
Keep in mind that most profile parameter changes only take effect after the corresponding service has been restarted. After modifying a profile parameter, you can restart the SAP service with the services.msc tool, as shown in the previous blog post, or with the SAP MMC: Right-click the instance and choose All Tasks -> Restart Service.
SAPGLOBALHOST Is Configured Wrong
The SAPGLOBALHOST parameter specifies the host name of the machine where the sapmnt share is located. If SAPGLOBALHOST is set incorrectly, the SAP instance can’t access the sapmnt share and can’t be started, because the system can’t find important executables. As you can see on the following screenshot, the sapcpe.EXE process couldn’t be started, and the instance is red in the SAP MMC.
Go to the work directory of the affected instance and take a look at the sapstart.log. There, you’ll find a message that the system can’t find sapcpe.EXE. This error indicates that the network path to sapcpe.EXE has been assembled incorrectly.
In the report RSPARAM, you can trace how the network path to sapcpe.EXE is constructed. In this example, the analysis in RSPARAM shows that the profile parameter SAPGLOBALHOST is used to construct GLOBALHOSTPATH, which in turn is used to construct DIR_INSTALL. DIR_INSTALL is used to construct DIR_EXE_ROOT, which is used to construct DIR_CT_RUN. Finally, DIR_CT_RUN is used in Start_Program_00 and Start_Program_01, which start sapcpe.EXE. Pretty complicated, right?
This leads us to the conclusion that the root cause of the failed sapcpe.EXE call is a misconfigured SAPGLOBALHOST parameter. After I entered the correct value (in my example, mg4ascs), the SAP instance could start sapcpe.EXE as intended.
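To make the substitution chain easier to follow, here’s a Python sketch that resolves $(NAME) references recursively, similar to what you trace by hand in RSPARAM. Only the order of the chain, from SAPGLOBALHOST through GLOBALHOSTPATH, DIR_INSTALL, DIR_EXE_ROOT, and DIR_CT_RUN to Start_Program_00, comes from the analysis above; the parameter values are hypothetical placeholders, so compare them with the real values in RSPARAM on your system. resolve is a hypothetical helper name.

```python
from __future__ import annotations

import re

REFERENCE = re.compile(r"\$\((\w+)\)")  # matches $(NAME)

# Hypothetical values; only the order of the chain is taken from the RSPARAM analysis.
PROFILE = {
    "SAPSYSTEMNAME": "MG4",
    "SAPGLOBALHOST": "mg4ascs",
    "GLOBALHOSTPATH": r"\\$(SAPGLOBALHOST)",
    "DIR_INSTALL": r"$(GLOBALHOSTPATH)\sapmnt\$(SAPSYSTEMNAME)\SYS",
    "DIR_EXE_ROOT": r"$(DIR_INSTALL)\exe",
    "DIR_CT_RUN": r"$(DIR_EXE_ROOT)\uc\NTAMD64",
    "Start_Program_00": r"immediate $(DIR_CT_RUN)\sapcpe.EXE",
}


def resolve(name: str, params: dict[str, str], _chain: tuple[str, ...] = ()) -> str:
    """Expand every $(NAME) reference in params[name]; unknown references are kept as-is."""
    if name in _chain:
        raise ValueError("cyclic reference: " + " -> ".join(_chain + (name,)))

    def expand(match: re.Match) -> str:
        ref = match.group(1)
        return resolve(ref, params, _chain + (name,)) if ref in params else match.group(0)

    return REFERENCE.sub(expand, params[name])


if __name__ == "__main__":
    print(resolve("Start_Program_00", PROFILE))
    # immediate \\mg4ascs\sapmnt\MG4\SYS\exe\uc\NTAMD64\sapcpe.EXE
    broken = dict(PROFILE, SAPGLOBALHOST="wronghost")
    print(resolve("DIR_CT_RUN", broken))
```

With a wrong SAPGLOBALHOST, every path further down the chain points to the wrong share, which matches the sapcpe.EXE error in the sapstart.log.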
START_PROGRAM Is Configured Wrong
In this section, I want to explain the problems that can arise from incorrect use of the START_PROGRAM profile parameter. This profile parameter is used to start SAP executables. The profile parameter reference shows its syntax:
START_PROGRAM_<index> <immediate/local> <exe path> <profile path>
The index is a number from 00 to 20. Pass the immediate flag for programs that must be started in a fixed order; once these calls are complete, the programs with the local flag are started. Usually, all calls of sapcpe.EXE use the immediate flag, and the executables of the actual SAP instance (msg_server.EXE, enserver.EXE, disp+work.EXE, and so on) are started with the local flag. The remaining arguments are the correct path to the executable you want to call and the profile it should use.
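The syntax rules above can be turned into a small validator. This Python sketch checks the Start_Program_<index> spelling used in the profiles shown in this post, the 00 to 20 index range, and the immediate/local flag. It assumes that paths contain no spaces and that the profile argument has the form pf=<profile path>; the profile file name MG4_ASCS24_mg4ascs and parse_start_program are hypothetical.

```python
from __future__ import annotations

import re
from typing import NamedTuple

START_KEY = re.compile(r"Start_Program_(\d{2})")


class StartProgram(NamedTuple):
    index: int
    flag: str         # "immediate" or "local"
    executable: str
    arguments: tuple  # remaining arguments, e.g. the pf=<profile path> argument


def parse_start_program(key: str, value: str) -> StartProgram:
    """Validate one Start_Program_<index> = <immediate|local> <exe path> <profile> entry."""
    match = START_KEY.fullmatch(key)
    if not match:
        raise ValueError(f"{key}: not a Start_Program_<index> key")
    index = int(match.group(1))
    if index > 20:
        raise ValueError(f"{key}: index must be between 00 and 20")
    parts = value.split()  # assumes no spaces inside paths
    if len(parts) < 2 or parts[0] not in ("immediate", "local"):
        raise ValueError(f"{key}: expected 'immediate' or 'local' followed by an executable")
    return StartProgram(index, parts[0], parts[1], tuple(parts[2:]))


if __name__ == "__main__":
    entries = {
        "Start_Program_00": r"immediate $(DIR_CT_RUN)\sapcpe.EXE pf=$(DIR_PROFILE)\MG4_ASCS24_mg4ascs",
        "Start_Program_01": r"local $(DIR_CT_RUN)\msg_server.EXE pf=$(DIR_PROFILE)\MG4_ASCS24_mg4ascs",
    }
    calls = [parse_start_program(k, v) for k, v in entries.items()]
    # immediate calls first, then local ones (sorted by index here only for readable output)
    for call in sorted(calls, key=lambda c: (c.flag != "immediate", c.index)):
        print(call.index, call.flag, call.executable)
```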
In the previous section, you saw the effect of a START_PROGRAM call of sapcpe.EXE where the path to the executable was broken because the SAPGLOBALHOST profile parameter had been set incorrectly. Unfortunately, there are many more ways in which the START_PROGRAM parameter can cause issues, so I’ll focus on two specific scenarios that show what can go wrong in a START_PROGRAM statement.
On the following screenshot of the SAP MMC, you can see that the ASCS instance is started and green. The dialog instances are yellow, and the disp+work.EXE process reports that it can’t connect to the message server. What happened?
Clicking the process list of the ASCS instance reveals that the msg_server.EXE process is missing in the SAP MMC. At this point, we must assume that something went terribly wrong. The Task Manager confirms that no msg_server.EXE process is running on this host.
Let’s dive into the instance profile of our ASCS instance and check whether there’s a Start_Program statement for the message server and how this call is made.
In this example, line 24 contains a Start_Program_02 statement that passes the path to the message server executable. The other arguments passed to the Start_Program statement are correct as well: I made sure that both the path to the message server executable and the path to the ASCS instance profile are valid.
The problem is the index of the Start_Program profile parameter. Start_Program_02 was used to start msg_server.EXE in line 24 as well as enserver.EXE in line 29. As a result, only the enqueue server was started.
The Check Parameters functionality of the SAP MMC would have revealed this issue immediately.
The lesson from this scenario is that the index of each Start_Program profile parameter must be unique. Handle this parameter with care, because its syntax is quite restrictive.
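A duplicated index is easy to miss when reading a long profile, so a quick script can help. This Python sketch, assuming the simple `key = value` line format, reports every key that’s defined more than once together with its line numbers; the profile excerpt, the profile file name, and duplicate_keys are hypothetical. Since the enqueue server in line 29 won over the message server in line 24, the sketch treats the last definition as the effective one.

```python
from __future__ import annotations

from collections import defaultdict


def duplicate_keys(profile_text: str) -> dict[str, list[int]]:
    """Map each key that is defined more than once to the line numbers of its definitions."""
    lines_by_key: dict[str, list[int]] = defaultdict(list)
    for lineno, raw in enumerate(profile_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        lines_by_key[line.split("=", 1)[0].strip()].append(lineno)
    return {key: nums for key, nums in lines_by_key.items() if len(nums) > 1}


# Hypothetical excerpt: the same index is used for the message server and the enqueue server.
PROFILE = r"""
Start_Program_02 = local $(DIR_CT_RUN)\msg_server.EXE pf=$(DIR_PROFILE)\MG4_ASCS24_mg4ascs
# enqueue server
Start_Program_02 = local $(DIR_CT_RUN)\enserver.EXE pf=$(DIR_PROFILE)\MG4_ASCS24_mg4ascs
"""

if __name__ == "__main__":
    for key, numbers in duplicate_keys(PROFILE).items():
        print(f"{key} is defined on lines {numbers}; only line {numbers[-1]} takes effect")
```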
What do you think will happen if there’s no Start_Program statement at all in your profile? Here, I commented out every Start_Program statement in the ASCS profile:
On the screenshot above, you can see that the ASCS instance is grey as there are no processes running. The dialog instances are yellow because the dispatchers can’t find the message server.
RESTART_PROGRAM Is Configured Wrong
The Restart_Program profile parameter is very similar to the Start_Program parameter: You use it in the same way and pass the same arguments. The only difference is that a program started with Restart_Program is automatically restarted if it fails unexpectedly. Don’t use this parameter for processes that are managed by a cluster, because the cluster itself decides when a process is restarted.
In general, everything that can go wrong with the Start_Program parameter can also go wrong with the Restart_Program parameter. Additionally, make sure that the Start_Program and Restart_Program statements in your profile don’t share the same index. If a Start_Program_04 parameter is followed by a Restart_Program_04 parameter, only the last statement is effective.
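The same kind of check extends to index collisions between the two parameters. In this Python sketch, index_collisions is a hypothetical helper that takes the parameter names in the order they appear in the profile and groups Start_Program and Restart_Program entries by index; following the rule above, the last entry of each group is the effective one.

```python
from __future__ import annotations

import re
from collections import defaultdict

PROGRAM_KEY = re.compile(r"(?:Start|Restart)_Program_(\d{2})")


def index_collisions(keys_in_profile_order: list[str]) -> dict[str, list[str]]:
    """Group Start_Program/Restart_Program keys by index; return indexes used more than once."""
    keys_by_index: dict[str, list[str]] = defaultdict(list)
    for key in keys_in_profile_order:
        match = PROGRAM_KEY.fullmatch(key)
        if match:
            keys_by_index[match.group(1)].append(key)
    return {index: keys for index, keys in keys_by_index.items() if len(keys) > 1}


if __name__ == "__main__":
    keys = ["Start_Program_00", "Start_Program_04", "Restart_Program_04", "Restart_Program_05"]
    for index, colliding in index_collisions(keys).items():
        print(f"index {index}: {colliding} -> only {colliding[-1]} is effective")
```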
This concludes my blog post on repairing a failed SAP instance. Maybe you’ve learned something new and hopefully your SAP instances are running.