Summary
This document describes the steps required to install, configure and test an ERS instance for failover.
Applies to NetWeaver Web AS Java 2004s onwards (Unix system)
The standalone enqueue server (SAP Central Services - SCS) is used in NetWeaver Web AS Java to provide a locking service based on the enqueue function. The enqueue clients (SAP application servers) and the enqueue server communicate directly, that is, the work process has a TCP connection to the enqueue server. They no longer communicate via the dispatchers and the message server.
The enqueue server keeps critical data (that is, all locks currently in use by users in the system) in the lock table in the main memory. If the host fails, this data is lost and cannot be restored even when the enqueue server is restarted. All transactions that have held locks must therefore be reset.
For this reason, the enqueue replication server (ERS) is started on another machine which together with the standalone enqueue server (SCS) provides a high availability solution.
This document contains all the steps to be performed for preparing, configuring, testing and trouble-shooting an ERS instance.
Activate replication by setting the parameter enque/server/replication = true in the instance profile of the standalone enqueue server (<SID>_SCS<Instance_no>_<hostname>).
Set the parameter enque/deque_wait_answer = TRUE for the enqueue clients (application server instances) in the default profile.
The parameter enque/deque_wait_answer determines whether dequeue (removal of locks) is done synchronously or asynchronously. The parameter can have the following values:
TRUE: Waits for response from the enqueue server (synchronous)
FALSE: Does not wait for response (asynchronous)
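Both settings can be checked from the shell. The following is a minimal sketch, not an SAP tool: get_profile_param and param_is_true are hypothetical helper names, and the sketch assumes the profile uses plain "name = value" lines with "#" comment lines, as in the examples in this document.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a parameter from an SAP profile.
# Assumes plain "name = value" lines; lines starting with '#' are skipped.
get_profile_param() {
    profile=$1
    name=$2
    awk -v key="$name" '
        /^[ \t]*#/ { next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            # Print the first matching entry and stop.
            if (k == key) { print v; exit }
        }
    ' "$profile"
}

# Hypothetical helper: succeed if the parameter is set to true.
# The comparison ignores case only as a convenience for this check.
param_is_true() {
    value=$(get_profile_param "$1" "$2" | tr '[:lower:]' '[:upper:]')
    [ "$value" = "TRUE" ]
}
```

For example, param_is_true /usr/sap/<SID>/SYS/profile/<SID>_SCS<Instance_no>_<hostname> enque/server/replication returns 0 when replication is switched on in the SCS instance profile. The same helper can be pointed at the default profile to check enque/deque_wait_answer.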
As the administrator user <sid>adm, perform the following steps on both physical servers.
Create the following directory structure on the enqueue replication server (ERS):
/usr/sap/<SID>/ERS<inst.no>
+--- exe
| +--- servicehttp
| +----- sapmc
+--- log
+--- data
+--- work
Where:
The files can have different extensions on different UNIX platforms. Depending on the platform and whether Unicode is used, there may not be as many files.
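The directory structure shown above can be created in one step. This is a minimal sketch: create_ers_dirs is a hypothetical helper name, and it assumes that sapmc sits below servicehttp, which is how the tree above is read here.

```shell
#!/bin/sh
# Hypothetical helper: create the ERS instance directory tree.
# Usage: create_ers_dirs <base_dir> <SID> <instance_no>
create_ers_dirs() {
    base_dir=$1    # normally /usr/sap
    sid=$2
    inst_no=$3
    inst_dir="$base_dir/$sid/ERS$inst_no"
    # mkdir -p creates missing parent directories and does not fail
    # if a directory already exists.
    mkdir -p \
        "$inst_dir/exe/servicehttp/sapmc" \
        "$inst_dir/log" \
        "$inst_dir/data" \
        "$inst_dir/work"
}
```

Run as <sid>adm, a call such as create_ers_dirs /usr/sap <SID> 11 would create the tree for the ERS11 instance used in the examples of this document.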
The SCS instance could also have the following parameters:
SAPSYSTEMNAME = <SID>
SAPSYSTEM = <Instance_no>
INSTANCE_NAME = SCS<Instance_no>
SAPLOCALHOST = <SCS_Host_name>
The corresponding replication instance might look like this:
ERS SAPSID = <SID>
ERS INST.NO. = 11
ERS HOST = ERS_host_name
The corresponding start profile START_ERS11_<Host_name> will look like:
SAPSYSTEM = <Instance_no>
SAPSYSTEMNAME = <SID>
INSTANCE_NAME = ERS11
#--------------------------------------------------------------------
# Special settings for this manually set up instance
#--------------------------------------------------------------------
SCSID = <Instance_no>
DIR_EXECUTABLE = $(DIR_INSTANCE)/exe
DIR_CT_RUN = /usr/sap/<SID>/SYS/exe/run
SETENV_00 = PATH=$(DIR_INSTANCE)/exe:%(PATH)
SETENV_01 = LD_LIBRARY_PATH=$(DIR_EXECUTABLE)
_PF = $(DIR_PROFILE)/<SID>_ERS11_<Host_name>
#-----------------------------------------------------------------------
# Copy SAP Executables
#-----------------------------------------------------------------------
_CPARG0 = list:$(DIR_EXECUTABLE)/ers.lst
Execute_00 = immediate $(DIR_EXECUTABLE)/sapcpe$(FT_EXE) $(_CPARG0) pf=$(_PF)
#--------------------------------------------------------------------
# start enqueue replication server
#--------------------------------------------------------------------
_ER = er.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_01 = immediate rm -f $(_ER)
Execute_02 = local ln -s -f $(DIR_EXECUTABLE)/enrepserver $(_ER)
Restart_Program_00 = local $(_ER) pf=$(_PF) NR=$(SCSID)
If you are using a local profile directory insert the following profile parameter into the start profile of the ERS instance: DIR_PROFILE = $(DIR_INSTANCE)/profile
2. Create an instance profile (not a symbolic link to a common file – it does not work) in the profile directory.
Parameters for the SCS instance (example):
SCS SAPSID = <SID>
SCS INST.NO. = 04
SCS HOST = <SCS_Host_name>
Parameters for the replication instance (example):
ERS SAPSID = <SID>
ERS INST.NO. = 11
ERS HOST = <ershostname>
Associated instance profile <SID>_ERS11_<ershostname>:
SAPSYSTEM = 11
SAPSYSTEMNAME = <SID>
INSTANCE_NAME = ERS11
#--------------------------------------------------------------------
# Special settings for this manually set up instance
#--------------------------------------------------------------------
DIR_EXECUTABLE = $(DIR_INSTANCE)/exe
DIR_CT_RUN = /usr/sap/<SID>/SYS/exe/run
#--------------------------------------------------------------------
# Settings for enqueue monitoring tools (enqt, ensmon)
#--------------------------------------------------------------------
enque/process_location = REMOTESA
rdisp/enqname = $(rdisp/myname)
#--------------------------------------------------------------------
# standalone enqueue details from (A)SCS instance
#--------------------------------------------------------------------
SCSID = 04
SCSHOST = <scshostname>
enque/serverinst = $(SCSID)
enque/serverhost = $(SCSHOST)
If you are using a local profile directory insert the following profile parameter into the instance profile of the ERS instance: DIR_PROFILE = $(DIR_INSTANCE)/profile
The following options are available:
Here the replication server uses the HA software to periodically request information about the physical host on which the SCS instance is running. Depending on this information the ERS instance is activated or deactivated. To do this a tool (a script or a library from the HA hardware partner) is needed.
If you use an enqtest.sh script in directory DIR_EXECUTABLE you must insert the following lines in the instance profile.
#--------------------------------------------------------------------
# HA polling
#--------------------------------------------------------------------
enque/enrep/hafunc_implementation = script
enque/enrep/poll_interval = 10000
enque/enrep/hafunc_init =
enque/enrep/hafunc_check = $(DIR_EXECUTABLE)/enqtest.sh
With this solution the HA software starts the ERS instance whenever required. Monitoring of the ERS instance, and any subsequent restart if required, is handled by the HA software.
This has the advantage that the replication table can be distributed across several hosts by means of the cluster software (shared file system), and that following a failover the enqueue server does not necessarily have to be restarted on the same host on which the active replication was running beforehand.
To also save the replication table in a shadow file in the file system, insert the following lines in the instance profile:
#--------------------------------------------------------------------
# replica table file persistency
#--------------------------------------------------------------------
enque/repl/shadow_file_name = /usr/sap/<SID>/ERS11/data/SHM_FILESYS_BACKUP
Note: Storing the replication table in the file system can lead to a severe drop in enqueue server performance. Beforehand, you should always check whether performance will be sufficient with this option.
It was noticed that, on SCS failover, the SCS process would read the replication table and terminate the ERS process, as expected. However, within 15 seconds, the ERS process would restart on its own. Based on an analysis of the process ids, it was concluded that the ERS process was being restarted by the sapstart process.
If a program managed by sapstart is restarted within 10 minutes, an internal counter is incremented. By default, sapstart stops restarting the program as soon as this counter exceeds 5. This limit can be changed using the parameter 'Max_Restart_Program=xx', where xx represents the number of restarts.
To prevent this automatic restart of ERS, the parameter Max_Restart_Program=00 was added to the ERS start profile.
Start the ERS server with the following command:
<sid>adm$ startsap ERS11
Check the startup log with the following command:
<sid>adm$ cat /home/<SID>adm/startsap_ERS11.log
Check the ERS processes at the OS level:
<sid>adm$ ps -ef | grep -i ERS11
The output should list the running ERS processes.
To start the ERS instance automatically when the system is rebooted, insert the following line in file /usr/sap/sapservices based on the UNIX shell used by user root:
setenv LIBPATH /usr/sap/<SID>/ERS11/exe:$LIBPATH; (CSH)
LIBPATH=/usr/sap/<SID>/ERS11/exe:$LIBPATH; export LIBPATH; (BSH)
/usr/sap/<SID>/ERS11/exe/sapstartsrv pf=/usr/sap/<SID>/SYS/profile/START_ERS11_<ershostname> -D -u <SID>adm
This only works if:
NetWeaver AS version 7.00 has already been installed on this host
(Under UNIX) you have already performed the steps described in SAP Note 823941 (Configuring SAP Start Services as UNIX Daemons)
Repeat these steps for all the physical hosts in the HA failover cluster. If you want to make another SCS instance more fail-safe, you have to set up a separate set of ERS instances.
Once the replication server has been set up, check that it functions properly so that you can be sure it will work correctly if the enqueue server fails. The following tests are performed on the host where the replication server is running.
The SCS instance of the SAP system has been started. Start program ensmon on the host on which the replication server is installed. To determine the replication server enter the following command:
ensmon pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname>
If the connection is OK and replication is running, the output looks like this:
Try to connect to host <Virtual (A)SCS host> service sapdp01
get replinfo request executed successfully
Replication is enabled in server, repl. server is connected
Replication is active
...
If the connection is not OK, the output would look like:
Try to connect to host <Virtual (A)SCS host> service sapdp01
get replinfo request executed successfully
Replication is enabled in server, but no repl. server is connected
...
If the connection is not OK, first check whether the replication server has been started at all (using the operating system or the cluster software).
If the replication server has been started, check files dev_enqrepl on the enqueue server or dev_enrepsrv on the replication server (in the work directory of the SCS or ERS instance). Use the error messages and profile files here to narrow down the cause of the problem.
Use program enqt to check the fill level of the lock table and the failover ID. The SCS instance of the SAP system must have been started.
Start program enqt on the server on which the replication server is installed. Use only the enqt options described here; other options could damage the content of the lock table.
...
enqt pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname> 20 1 1 9999
This command continuously reads the content of the lock table and shows the number of lock entries.
...
Run the following command once before the failover and once after it:
enqt pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname> 97
The row containing EnqTabCreaTime/RandomNumber should be exactly the same in both outputs. A different value after the failover indicates that the lock table was not taken over from the replica.
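The before and after comparison can be scripted. The sketch below is not part of the SAP tooling: enq_id_row and same_enq_id are hypothetical helper names, the file paths are placeholders, and the only thing assumed about the enqt output is that the relevant row contains the text EnqTabCreaTime/RandomNumber. It treats identical rows as a successful takeover of the lock table.

```shell
#!/bin/sh
# Hypothetical helper: print the row that carries the enqueue table ID
# from a file holding saved "enqt ... 97" output.
enq_id_row() {
    grep -F 'EnqTabCreaTime/RandomNumber' "$1"
}

# Hypothetical helper: compare the ID row of two saved outputs.
# Returns 0 if the rows are identical, 1 if they differ,
# 2 if the row is missing from either file.
same_enq_id() {
    before=$(enq_id_row "$1") || return 2
    after=$(enq_id_row "$2") || return 2
    [ "$before" = "$after" ]
}
```

A possible usage: redirect the output of the enqt command with option 97 to /tmp/enqid_before.txt before the failover and to /tmp/enqid_after.txt afterwards, then run same_enq_id /tmp/enqid_before.txt /tmp/enqid_after.txt and check the return code.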
To monitor the status of ERS/SCS instances, the commands described below may be used (for example in HA scripts):
To determine the replication server enter the following command:
ensmon pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS11_<hostname> 2
If replication server is active and connected to SCS, the following message is displayed in the first few lines of the output:
Replication is enabled in server, repl. server is connected
Replication is active
The status may also be checked with the command "startsap check", however, this command seems to check only the process at the OS level, not for the functioning of the process.
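For use in an HA script, the ensmon output can be reduced to a single status word. This is a minimal sketch that matches only the message texts quoted in this document; replication_status is a hypothetical helper name, and the status words and return codes are choices made for this example.

```shell
#!/bin/sh
# Hypothetical helper: classify ensmon replication info read from stdin.
# Prints "active", "disconnected" or "unknown" and returns 0, 1 or 2.
replication_status() {
    output=$(cat)
    case $output in
        # Test the negative message first: it contains the positive one.
        *"but no repl. server is connected"*)
            echo "disconnected"; return 1 ;;
        *"repl. server is connected"*"Replication is active"*)
            echo "active"; return 0 ;;
        *)
            echo "unknown"; return 2 ;;
    esac
}
```

A possible usage is to pipe the ensmon command shown above (with option 2) into replication_status and let the HA script act on the return code.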
Send a dummy request to the server, to check if it is alive:
ensmon -H <scshostname> pf=/sapmnt/<SID>/profile/<SID>_SCS04_<scshostname> 1
The last line of the output contains the message "Dummy request executed successfully with rc=0".
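The dummy request check can be wrapped in the same way. This sketch relies only on the message text quoted above; enqueue_server_alive is a hypothetical helper name, and ignoring trailing blank lines is an assumption made for robustness.

```shell
#!/bin/sh
# Hypothetical helper: read ensmon dummy-request output from stdin and
# succeed only if the last non-blank line reports rc=0.
enqueue_server_alive() {
    last_line=$(grep -v '^[[:space:]]*$' | tail -n 1)
    case $last_line in
        *"Dummy request executed successfully with rc=0"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible usage is to pipe the output of the ensmon dummy request (option 1) into enqueue_server_alive and treat a non-zero return code as a failed check.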
You can find the following information in the enqueue server files:
There may be further dev* files that are not usually important. You should deliver these files to support anyway when you open a problem message.
You can find the following information in the replication server files: