Technical Articles
SAP ASCS High Availability using ERS explained
I have read and written many blogs on SAP High Availability, but people still seem to struggle to understand the inner workings of the mechanism that makes the SAP central services instance highly available.
SAP's standard approach for making the ASCS instance highly available is ERS.
What is ERS?
ERS stands for Enqueue Replication Server, and its job is to keep an up-to-date replica of the lock table, so if something tragic was to happen to the ASCS instance the state of the table locks is safeguarded.
That's it?... well yeah... it's not a magic box!... or is it?... On its own it does not guarantee the availability of the system; it just does what is stated above. To deliver the desired high availability, its capabilities need to be combined with the features of a cluster with an automatic failover mechanism. That way, when (or if) the ASCS instance crashes, it is brought back on a different host/node, where it will use the replication table to create a new lock table so the system can resume operation.
What is the basic architecture of a highly available central instance?
At its leanest expression you need at least 2 nodes and a shared file system. For the purpose of this blog I'm just going to focus on the ASCS/ERS instances, and the assumption is that the rest of the components are distributed on other nodes.
You also need a cluster provider with an automatic failover mechanism. Again, I'm not going to focus on a particular provider; I'll keep this as generic as possible so it applies to most scenarios.
ASCS / ERS installation
In order for the ASCS and ERS instances to be able to move from one node to the other, they need to be installed on a shared filesystem and using virtual hostnames. Why? Because together with the virtual IP they will be added to the cluster resource group, so they can all switch as one logical unit.
A few high-level tips for the installations:
The installation executable should point to the virtual hostname:
./sapinst SAPINST_USE_HOSTNAME=<virtual hostname>
For this exercise the ASCS instance will be installed on sapnode1 using virtual hostname sapascs, and the ERS instance will be installed on sapnode2 with virtual hostname sapers.
Also, post installation you need to make sure you have created mount points for both ASCS and ERS on their counterpart hosts (/usr/sap/<SID>/ASCSXX and /usr/sap/<SID>/ERSXX).
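As a rough sketch of what that looks like (the SID S4H, instance numbers 00/10 and the NFS server name are hypothetical placeholders, not values from a real system):

```shell
# Create the counterpart mount points so either instance can fail over:
mkdir -p /usr/sap/S4H/ASCS00   # run on sapnode2 (ASCS normally on sapnode1)
mkdir -p /usr/sap/S4H/ERS10    # run on sapnode1 (ERS normally on sapnode2)

# Example /etc/fstab-style entries for a shared NFS export; in many
# cluster setups these mounts are managed as cluster Filesystem
# resources instead of static fstab entries:
# nfs-server:/export/S4H/ASCS00  /usr/sap/S4H/ASCS00  nfs  defaults  0 0
# nfs-server:/export/S4H/ERS10   /usr/sap/S4H/ERS10   nfs  defaults  0 0
```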
There are a number of other installation-specific steps which are required, but for the sake of keeping this generic I'll leave those aside (I have included a few links at the bottom of this blog where you can check some of those).
Below is a representation of the basic requirements to get the basic cluster configuration going, including the inactive (grey) instance requirements.
Once your cluster config is completed and your ASCS/ERS instances (and the rest of your system components) are operational, your system will look as below:
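To make this a little more concrete, here is a minimal sketch of what the resource groups could look like in a Pacemaker-based cluster using pcs. All names, the SID S4H, instance numbers, IPs and paths are illustrative placeholders; the exact resource options come from your distribution's configuration guide:

```shell
# ASCS group: virtual IP + shared filesystem + SAP instance,
# moving together as one logical unit
pcs resource create rsc_ip_ascs IPaddr2 ip=10.0.0.10 --group grp_ascs
pcs resource create rsc_fs_ascs Filesystem \
    device="nfs-server:/export/S4H/ASCS00" \
    directory="/usr/sap/S4H/ASCS00" fstype=nfs --group grp_ascs
pcs resource create rsc_sap_ascs SAPInstance \
    InstanceName=S4H_ASCS00_sapascs \
    START_PROFILE=/sapmnt/S4H/profile/S4H_ASCS00_sapascs --group grp_ascs

# ERS group: same pattern with its own virtual IP and filesystem
pcs resource create rsc_ip_ers IPaddr2 ip=10.0.0.11 --group grp_ers
pcs resource create rsc_fs_ers Filesystem \
    device="nfs-server:/export/S4H/ERS10" \
    directory="/usr/sap/S4H/ERS10" fstype=nfs --group grp_ers
pcs resource create rsc_sap_ers SAPInstance \
    InstanceName=S4H_ERS10_sapers \
    START_PROFILE=/sapmnt/S4H/profile/S4H_ERS10_sapers --group grp_ers
```

The grouping is what makes the virtual hostname/IP, filesystem and instance switch nodes as one unit, as described above.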
So, what happens If sapnode2 crashes?
Well, the system will continue to operate as normal because ASCS availability was unaffected. ERS will be brought back once sapnode2 is back online
What happens when sapnode1 fails?
The heartbeat monitor will trigger a cluster resource failover and the ASCS instance will be started on sapnode2 together with ERS (this is part of the cluster colocation configuration). It will use the replication table to create a new lock table and resume operations. At the same time, ERS will be shut down (again, also part of the colocation rules) and will be shifted and brought back on sapnode1 once the host is back online.
Just to be clear: for a small period of time both ASCS and ERS will be running in parallel. This is necessary because the replication table is kept in memory on the node where ERS is running, and only once the ASCS has completed reading it and recreating the lock table will the ERS instance be stopped, waiting to be moved to the other node once it is back online.
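The colocation and ordering rules mentioned above can be sketched roughly in pcs syntax as follows (group names and scores are illustrative; the negative score keeps ASCS and ERS on different nodes in normal operation while still allowing them to share a node briefly during failover):

```shell
# Prefer to keep the ASCS and ERS groups apart, but allow them
# on the same node when no alternative exists (failover case):
pcs constraint colocation add grp_ascs with grp_ers -5000

# Only stop the ERS group after the ASCS group has started on the
# node, i.e. after ASCS has read the replicated lock table:
pcs constraint order start grp_ascs then stop grp_ers \
    symmetrical=false kind=Optional
```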
Ultimately, once sapnode1 is back online, the ERS instance will be started there and will create a new lock replication table, and the ASCS will once more be highly available.
I hope this paints a picture of how the ASCS/ERS instances work and how, together with the cluster, they guarantee the ASCS instance availability and hence the SAP system uptime and business continuity.
Last but not least, I would like to quote some documentation and white papers I found very helpful:
High Availability with the Standalone Enqueue Server
RedHat configuration guide for ASCS/ERS
Love to hear your comments,
Regards, JP
Hi JP
Good one
Regards
SS
The traditional ENSA setup is still used widely wherein the ASCS follows ERS in the event of a failover and takes over the lock table. There is a new concept called the ENSA2 where a third host is involved and in the event of the ASCS crash, it starts the ASCS on the third host by retrieving the lock table over the network from the host where ERS runs. The same rule applies if the ERS fails. I haven't worked or used the new setup yet.
Nice to see you around Juan. Been a long time.
Cheers
RB
Hello Reagan,
ENSA1 Setup - ASCS follows the ERS, as it can only sync the locks when both are running on the same server.
ENSA2 Setup - ASCS can sync the locks remotely, so it does not need to follow the ERS. You can have a 2-node or 3-node setup depending on what you want and how. In a 2-node setup, if the enqueue server fails and the ASCS node is still up and running, the cluster simply restarts the enqueue server and syncs the locks remotely from ERS. If you use 3 hosts (or more if you can spend), you can have more resilience against multiple failures, but the principle remains the same: in ENSA2 you can sync the locks remotely.
Thank you.
Best Regards,
Gaurav Pandey
Great and simple explanation! I teach SAP on AWS and had doubts about how this failover works, thanks for the simple but great explanation!
Hi Juan
I don't understand very well why ERS is necessary in a cluster. If ASCS and DB are running on node1, when node1 fails ASCS and DB resources will be moved to node2, and in a few minutes the system will be available. Why do you need the ERS, so as not to lose the lock table data?. Thanks a lot. Javier
Hi Javier,
Yes, ERS maintains a copy of the enqueue lock table, so when the failover happens the system continues operations without issues.
Regards, JP
Hello Juan, and others.
I have installed S/4HANA HA on SUSE Linux, with SAP HANA 2.0 HA, also on SUSE Linux. I have a PAS and an AAS; using SAP GUI and a logon group we get HA on our system. The problem is, we need users to access Fiori directly (using a URL). I have been reading and looking and I can't find a document or blog that will help me configure that in a way that users do not have to connect to a different URL in case of failure. We are also using SSO with SAML2.
Any ideas are more than welcome!
Regards
Hi Mauricio,
For Fiori load balancing (or any other Java- or HTML-based access) you need to use a web dispatcher. The web dispatcher can usually be installed either integrated with the SCS instance or standalone.
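As a rough sketch, a Web Dispatcher profile pointing at the message server could contain entries along these lines (the SID, hostname and ports are placeholders, and the exact parameter values depend on your release and setup):

```text
# Backend system: the Web Dispatcher fetches the app server list from
# the message server and balances requests across the app servers
wdisp/system_0 = SID=S4H, MSHOST=sapascs, MSPORT=8100, SRCSRV=*:443

# HTTPS port the users connect to (the single URL for Fiori)
icm/server_port_0 = PROT=HTTPS, PORT=443
```

Because users only ever see the Web Dispatcher's URL, an individual app server failing does not change the address they use.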
Regards, JP
Thank you Juan for your reply.
I actually have a web dispatcher as part of the ASCS instance. The thing is, I need users to use just one single URL to connect to Fiori using SSO. The way it is configured now, we have two URLs - one for each app server.
SSO actually returns an error that the server that sends the request to the ADFS, is different than the one that receives it, so we can't authenticate by SSO.
I hope I was able to explain myself.
Regards
I think you should open a question on the SAP Basis forum, as this is unrelated to ASCS/ERS HA.
Will do just that.
Thank you!
Hello Juan. I want to post what I did. It may help other people.
We configured SAP Web Dispatcher as part of the ASCS instance. Most config was already in place after installation. I just added/modified some parameters to meet our requirements.
Then we configured Fiori Launchpad on t-code SICF to use the logon group.
For SSO, we set up SAML2 using our SAP Web Dispatcher as the hostname (t-code SAML2). From there, we exported metadata to the IdP and configured SAML2 as needed. The way we did it before, directly connected to one of the app servers, created metadata using just THAT app server as hostname, which was not the correct way to do it.
Finally, we configured Fiori Launchpad (t-code SICF), so that it would use the "Alternative Logon Procedure" - SAML Logon, and it worked.
If one of our app servers is down, we can always reach the Launchpad.
Thank you for your comments.
Regards.
Hi Juan,
Really a wonderful blog !
Thanks a lot for explaining the ERS HA setup in such simple format .
Regards,
Premkishan Chourasia
Thanks!
Hello Juan,
It's a really great blog.
Thank you for your effort.
I have a question.
We have the same HA system configuration as the scenario you described above.
SAP Application : NW 7.5 Enterprise Portal with HANA DB
Node 1 : SCS01 - ASCS (ERS Standby)
Node 2 : ERS11 - ERS (ASCS Standby)
Node 3 : J00 - PAS
Node 4 : J00 - AAS
Of course, I know the ERS instance is not a SPOF in the SAP system configuration.
Therefore, even if the ERS server goes down, the entire SAP system should not be affected at all.
Unfortunately, our ERS server recently went down for a while,
and for about 6 minutes (probably the time the ERS server took to reboot) system connections could not be established normally.
We reproduced the same situation to determine the cause,
but it did not adversely affect the system.
Could it be that ERS being down unluckily caused other problems connecting to the system?
Or have you ever had such a similar experience?
Thanks and regards,
CY Choi
Hi CY,
Maybe the root cause of your issue was not the ERS server but one of the mounted filesystems; if the shared filesystems are affected, it may impact both ASCS and ERS. If the ERS was running on a dedicated server and it went down, it is unlikely to have impacted the system availability.
Regards, JP
Thanks Juan Reyes...
* Do they need a separate cluster set up?
Please clarify.
Hi Daniel,
This process is specific to ASCS Instance....
Read High Availability Explained
Thanks Juan..
I am just trying to understand why ERS is so important, though it does not provide a complete HA setup for the whole system.
The PAS can't rely on the additional app server every time as a standby (resources may differ).
DB needs a separate HA again.
Can HA be achieved in one go for all these instances?
Are these private clouds AWS and Azure providing the HA/DR feature by default during the EC2/VM build itself?
Please clarify
Thanks in advance
Again, to understand SAP HA you need to understand its components.... ASCS, AS and DB each have their own HA demands.
"Can HA be achieved in one go for all these instances?"
No, that is not the way it works
"Are these private clouds AWS and Azure providing the HA/DR feature by default during the EC2/VM build itself?"
No, they don't. Cloud providers have their own HA offerings, but they all comply with the above logic. With IaaS you are still in charge of designing your own HA setup; with PaaS, where HA is a feature, they still have to comply with the above logic even though the end user does not have hands on it.
Regards, JP
Thanks a lot for clarifying.
Hello,Juan Reyes
Thank you for sharing this blog. I have a similar architecture with four nodes: SCS, ERS, PAS and AAS, on four machines. After I completed the cluster configuration, the cluster can switch, but if PAS and AAS are running, the cluster switch is too slow, taking about 10 minutes. If PAS and AAS are stopped, the cluster switch is fast, about 1 minute. Can you help me analyse the root cause? Thanks.
Best Regards,
hayden.liu
Post your cluster configuration (hide the hostnames/IPs). Cluster switch time depends on cluster parameters (like op monitor, start, stop and fencing timeouts), so most probably you need to optimize those parameters for your config.
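As an illustration of the kind of parameters meant here (resource name and values are placeholders to be tuned for your environment, not recommendations; pcs syntax also varies slightly between versions):

```shell
# Inspect the currently configured operation timeouts on a resource
pcs resource config rsc_sap_ascs

# Adjust the monitor/start/stop operation timeouts on a SAPInstance
# resource; overly tight values can make failovers slow or flappy
pcs resource update rsc_sap_ascs \
    op monitor interval=20 timeout=60 \
    op start timeout=600 \
    op stop timeout=240
```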