High Availability Explained – v1.1
There are numerous blogs about running SAP with High Availability, but none of them seem to focus on explaining how the basics work.
What are the Basics?
To explain this in simple terms, we should start from a standard standalone SAP system. A standard system has 3 basic components, each of which is critical for the system to work:
1. ASCS -> stands for ABAP SAP Central Services and is made of two parts, the Message Server and the Enqueue Server. The Message Server acts as the communication channel between the application servers and handles the load distribution; the Enqueue Server controls the lock mechanism.
2. AS -> Application servers. In the old days you had a Central Instance which included the central services. Those services have since been split out and now stand on their own as the ASCS, so the first application server is called the PAS (Primary Application Server) and the ones after that are called AASs (Additional Application Servers), but in practice there is very little difference between them.
3. Database -> The database is simply the database, your primary persistence and where you store your data.
Your traditional standalone system looks something like this:
So, what do you have to do to make your system highly available? It's very simple: you have to give each of these components redundancy, covering all the single points of failure.
How do you achieve that?
Well, for a basic HA system you need at least two hosts (or nodes) to fit the required components; ideally they will be located in separate data centers.
ASCS -> For the central services the recommended procedure is to use an ERS (Enqueue Replication Server). ASCS and ERS are installed on a shared disk available to both hosts. The Enqueue Server keeps the lock table and the ERS keeps a replicated copy of it. Third-party cluster software provides an automatic failover mechanism for the ASCS instance. Now that the jibber-jabber is out of the way, what this means is that you have an ASCS and an ERS available on each host, so if at any moment an issue hits one of the nodes the ASCS automatically fails over to the other one, keeping the system alive.
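To make the lock table replication a bit more tangible, here is a minimal Python sketch. It is only a toy model under my own assumptions: the class names, the `fail_over` helper and the lock key are made up for illustration and have nothing to do with how the real Enqueue Server or ERS are implemented.

```python
class ReplicationServer:
    """Toy stand-in for the ERS: keeps a copy of the lock table."""

    def __init__(self):
        self.copy = {}

    def apply(self, key, owner):
        # owner=None means the lock was released.
        if owner is None:
            self.copy.pop(key, None)
        else:
            self.copy[key] = owner


class EnqueueServer:
    """Toy stand-in for the Enqueue Server: owns the lock table."""

    def __init__(self, replica):
        self.locks = {}         # lock key -> owner
        self.replica = replica  # every change is mirrored here

    def acquire(self, key, owner):
        # Refuse the lock if somebody else already holds it.
        if key in self.locks and self.locks[key] != owner:
            return False
        self.locks[key] = owner
        self.replica.apply(key, owner)
        return True

    def release(self, key, owner):
        if self.locks.get(key) == owner:
            del self.locks[key]
            self.replica.apply(key, None)


def fail_over(replica):
    """Start a new enqueue server seeded from the replicated lock table."""
    new_replica = ReplicationServer()
    new_replica.copy = dict(replica.copy)
    server = EnqueueServer(new_replica)
    server.locks = dict(replica.copy)
    return server


ers = ReplicationServer()
enq = EnqueueServer(ers)
enq.acquire("MARA/000123", "user1")          # hypothetical lock key
enq = fail_over(ers)                         # the node running the enqueue server is lost
print(enq.acquire("MARA/000123", "user2"))   # False
```

The point of the last line is that the lock taken before the failover is still held afterwards, because the new enqueue server started from the replicated copy rather than from an empty table.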
AS -> From the application server point of view the key is how many of them you have; at a basic level you need at least 2 application servers (PAS and AAS) using load balancing. If an issue hits one of the nodes, all users connected to that node are kicked out, but they will be able to log on again as the other AS will be up and running.
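As a rough illustration of that behaviour, the sketch below picks a server for each new logon. The "fewest sessions wins" rule and the session counts are assumptions I made up to keep it short; the real Message Server decides with logon groups and its own metrics.

```python
def pick_server(servers):
    """Return the name of the available server with the fewest sessions."""
    available = {name: s for name, s in servers.items() if s["up"]}
    if not available:
        raise RuntimeError("no application server available")
    return min(available, key=lambda name: available[name]["sessions"])


servers = {
    "PAS": {"up": True, "sessions": 12},
    "AAS": {"up": True, "sessions": 7},
}
print(pick_server(servers))   # AAS

# The AAS node fails: its sessions are gone, new logons land on the PAS.
servers["AAS"]["up"] = False
print(pick_server(servers))   # PAS
```

With only one application server in the dictionary, the second call would raise an error instead, which is exactly the single point of failure you are trying to avoid.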
Database -> DB-wise the norm is to have at least 2 databases: one is set to be the primary database, serving the system, and the second one is a standby database which is supplied with a constant feed of logs from the primary database (this is called log shipping). On top of that you need cluster software and an automated failover mechanism. This means that the cluster points to the primary database; if that node becomes unavailable the failover mechanism kicks in and the standby database becomes the primary database, effectively keeping the system running.
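Log shipping is easier to picture with a small example. The Python sketch below is a toy model under my own assumptions (the class names and the `ship_logs` method are invented for illustration); every database product implements this differently.

```python
class Database:
    """Toy database: a dictionary plus a count of applied log records."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0

    def apply(self, record):
        key, value = record
        self.data[key] = value
        self.applied += 1


class Primary(Database):
    """Toy primary: logs every write so it can be shipped to the standby."""

    def __init__(self, name, standby):
        super().__init__(name)
        self.log = []
        self.standby = standby

    def write(self, key, value):
        record = (key, value)
        self.log.append(record)
        self.apply(record)

    def ship_logs(self):
        # Send the standby every record it has not applied yet.
        for record in self.log[self.standby.applied:]:
            self.standby.apply(record)


standby = Database("standby")
primary = Primary("primary", standby)
primary.write("order:1", "created")
primary.write("order:1", "paid")
primary.ship_logs()
print(standby.data)   # {'order:1': 'paid'}
```

Once the standby has replayed the shipped logs it holds the same data as the primary, which is what allows the cluster to promote it. In this toy model anything written after the last `ship_logs()` call would be missing on the standby, which is why the feed needs to be constant.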
Once finished, your new HA system should look something like this:
Now, what is an HA-Cluster?
At the most basic level a standard HA-Cluster in an Active-Passive configuration has 2 nodes: one is the primary node and the other one is the standby node. That simply means that the primary node is actively serving the system while the standby node is waiting to jump in in case of a failure.
How does it work?
The cluster is set up with a virtual IP (and a hostname via DNS); these are the details used in the SAP profiles to reach that particular component. The cluster assigns the virtual IP to the active node and uses a heartbeat monitor to confirm the availability of the components. If the primary node stops responding, the cluster triggers the automatic failover mechanism, which calls the standby node to step up as the primary node and assigns the virtual IP to it, restoring the component's availability. Once the failed node is fixed it comes back online as a standby node.
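The heartbeat and virtual IP logic can be sketched in a few lines of Python. Treat this as a simplified model, not as how any particular cluster product works: the class names, the `max_missed` threshold, the host names and the IP address are all assumptions of mine.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True

    def heartbeat(self):
        return self.alive


class Cluster:
    def __init__(self, virtual_ip, primary, standby, max_missed=3):
        self.virtual_ip = virtual_ip
        self.active = primary      # node currently holding the virtual IP
        self.standby = standby
        self.max_missed = max_missed
        self.missed = 0

    def check(self):
        """Run one heartbeat check; fail over after too many misses."""
        if self.active.heartbeat():
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed and self.standby.heartbeat():
            # Move the virtual IP: the standby becomes the active node.
            self.active, self.standby = self.standby, self.active
            self.missed = 0

    def resolve(self):
        """Clients only know the virtual IP; the cluster maps it to a node."""
        return self.active.name


host_a, host_b = Node("hostA"), Node("hostB")
cluster = Cluster("10.0.0.50", host_a, host_b)   # hypothetical virtual IP
host_a.alive = False
for _ in range(3):
    cluster.check()
print(cluster.resolve())   # hostB
```

Note that the clients never change anything: they keep talking to the same virtual IP, and only the node behind it changes. When hostA is repaired it simply sits in the `standby` slot.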
A High Availability database will look something like this:
You can read more about this subject on High Availability – Frequently Asked Questions by Eyal Katz
So, is that it?
Yes, and no….
Yes, you have an HA system now! If there is a problem, your system will be able to withstand the loss of one of its components or even an entire host.
No, you need to make sure everything that talks to this system is using load balancing. I cannot stress this enough: if the rest of the landscape and third parties are pointing at a fixed host you will end up out of business. So it's very important to make sure that all the RFCs, JCo connections and SAP GUIs are set up to take full advantage of load balancing.
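One way to audit this is to go through your connection definitions and flag the ones that point at a single host. The sketch below is a minimal example of that idea. The parameter names (`ashost`/`sysnr` for a direct connection, `mshost`/`sysid`/`group` for a load-balanced one) are modelled on RFC-style connection parameters as I understand them, so check them against your own client library; the destination and host names are made up.

```python
def is_load_balanced(params):
    """True if the connection goes through the message server and a logon group."""
    keys = {k.lower() for k in params}
    return "mshost" in keys and "group" in keys and "ashost" not in keys


# Hypothetical connection definitions collected from the landscape.
connections = {
    "ERP_DIRECT": {"ashost": "hosta", "sysnr": "00"},
    "ERP_GROUP": {"mshost": "ascs-vip", "sysid": "PRD", "group": "PUBLIC"},
}

fixed = [name for name, p in connections.items() if not is_load_balanced(p)]
print(fixed)   # ['ERP_DIRECT']
```

Anything that ends up in the `fixed` list will stop working the moment that one host goes down, no matter how well the system itself is clustered.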
Hope this was a good overview of a basic HA system.
PS: This is a very basic scenario. If you use ICM or the SAP Gateway, make sure you use an SAP Web Dispatcher and a standalone SAP Gateway to load balance those requests too.
PPS: HA differs drastically depending on which OS and database you run your landscape on; there are also several tools available for virtualised environments.
Love to hear your comments,
Nice! Just one small correction.
The Message Server does not handle requests to a work process...
The Message Server acts as the load balancer for SAP GUI and RFC connections.
It also provides system information to the Web Dispatcher, which can then perform HTTP load balancing (OK, the Message Server itself can do HTTP load balancing, but it is very limited / simple compared to the Web Dispatcher).
In addition, it is also used by the Dispatcher process of each instance (PAS and AASs) so the instances can communicate with each other.
I stand corrected, blog updated, cheers..
Updated the blog to add a definition and explanation of a HA-cluster.
Very Nice Blog Juan as always
Excellent Blog Juan.
You always inspire us in the SAP BASIS world!
Thank you !
very good explanation on HA!!!!!!!!!!!!!!!!!!!!
Appreciable explanation on HA!!!!
I have a doubt:
Host A–> Enq server; Msg Server & PAS
Host B–> ERS & AAS
User1 is working on Host A and this host gets shut down. If User1 then logs in on Host B, would they be able to access the same screen?
Unfortunately, once an application server goes down the sessions are lost, and so are any uncommitted transactions.
The application locks are replicated to the ERS. In your case, if Host A goes down and the Enqueue Server and Message Server services are relocated to Host B, then you should be able to access the same screen where the lock was already placed on Host A.
That is not the case; sessions get terminated once the application server goes down. There is no automatic failover for application servers.
Very nice, Thanks!
Most posts on this subject are terrible. Thank you for writing this in human-readable words.
Nice blog. Great!
Nice one !
Really Helpful Blog on HA...Sooper.
Great Juan Reyes. Thanks for sharing Knowledge.
Nice.. Thanks for sharing knowledge Juan Reyes
Simple and clear explanation , thanks a lot
Dear Juan Reyes,
Your blog is really awesome. Thanks for sharing your knowledge.
Here I have one small doubt:
When we install the ASCS on a cluster package/resource using a VH (virtual host) on NODE A, what is the need to install the same instance (ASCS) on NODE B?
In case NODE A fails, the cluster package/resources will be moved to NODE B along with the VH, and all the ASCS services will be available to the users even though they have switched to NODE B (the only thing is that active user sessions will be terminated whenever a switch/failover occurs).
When all user requests are bound to the NODE A ASCS all the time, what is the use of the NODE B ASCS?
Could you please clarify my doubt?
Thanks in advance.
We have done this setup, but now we have an issue with the monitoring setup.
When running Managed System Setup in Solman we only see one node for the central instance, and in LMDB we have added the virtual node for SCS.
How do we tell Solman that the SCS instance is clustered?
I think so, it's the same scenario as when you have a clustered database.
Quick question: In the SAP ABAP installation guide for HA on Unix, it says to install the ERS with a virtual host name. I notice the same thing in the sapinst initial screen as well. If I understand correctly, the ERS will maintain a copy of the lock table held by the enqueue server, which is part of the ASCS. Since the ASCS is installed on a shared disk, the ERS should ideally be installed on both nodes of the cluster pair using the host name (not the virtual host name) and into a local file system rather than a shared file system.
This is how I did it on Windows. Let me know if there is any difference between the Windows and Unix installations.
I'm confused about the ERS.
In my understanding an ERS connects to the enqueue server and replicates the enqueue table. But it is not running on Host A and B at the same time, right?
So, what happens to the ERS in case of a failover?
If Host A fails and the ASCS is taken over by Host B, does the ERS stay active? Should it be shut down and taken over by Host A once that is available again?
I appreciate your response.
No, the Enqueue Server won't be running when there is a failover of the ASCS. Only the Message Server component will be started on the Enqueue Replication Server host.
I could not get a better and easier explanation of HA than this. Very helpful for beginners who are learning these concepts. Thanks a lot Juan Reyes.
I have experienced everything you have described above in practice. I have one query: in a distributed scenario where PAS + ASCS are on a single host and the database is on another single host, with no High Availability, do I need to install an ERS on the PAS + ASCS host?
No, you don't...
I really enjoyed your blog. We are trying to implement HA Cluster for application servers for our already running environment. Most documentation I found is related to installation from scratch. In an existing environment, do you have to uninstall the ASCS and ERS during a downtime before installing them on the cluster? Or is there some way to move the ASCS and ERS from dedicated host to the new HA cluster hosts?
Check my ASCS HA using ERS blog
Because of the way the filesystems are mounted from a shared FS, it is probably better to reinstall it rather than try to adapt the existing one. ASCS/ERS installation is quick and painless, and once you have got to grips with it (config-wise) you should only require a small downtime to do the swap.
Thank you for the quick response, Juan! I am checking your other blog now!