We are designing our new SAP infrastructure and are in awe of all the new technologies that are available. CPUs are significantly faster, storage is less of a bottleneck than ever, virtualization options are very appealing, and networks are seamless. Unfortunately, server clustering technology is still a kludge that requires a significant amount of scripting and hardcoding.
In my dream world, there would be a bottle of Caymus Cabernet Sauvignon on every table, and every data center would contain a pool of server resources, some active and some idle. In the event of a failure, the clustering application would automatically find a new home for the failed server and start it there.
At a high level, all of the clustering products claim to do what I am asking for. Unfortunately, when you dig into the details, you find that much of the clustering has to be predetermined. In other words, if you want Server A to be able to fail over to Server B, Server B must be primed ahead of time. This becomes a logistical nightmare and requires a much larger server footprint than necessary.
Here is an example. Assume that you have two servers named Server A and Server B. Server A contains 10 virtual hosts: one host runs the ERP database and central instance (CI), one host runs the Portal database and CI, and the other 8 hosts run dialog instances for the ERP and Portal systems. Now assume that Server A crashes. For Server B to take over, it would have to be preconfigured with 10 virtual hosts that closely mirror the 10 virtual hosts that were on Server A at the time of the failure. Multiply that across every server in the data center and you end up with a very complex web of primary-to-failover relationships.
Challenge to Vendors
Please, please make your clustering software smarter! For example, why not design the software to behave like this when a server crashes: "omg, Server A failed, let me find resources on Server B, dynamically create 10 virtual hosts, and rezone the shared storage so that I can run Server A's virtual hosts on Server B."
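To make the request concrete, here is a minimal sketch of just the "find resources" step: given the virtual hosts from a failed server and the spare capacity on surviving servers, decide where each host should be restarted. It is not any vendor's algorithm; it uses first-fit decreasing, a common bin-packing heuristic, and the class names, host names, and CPU/memory figures are all hypothetical. Creating the virtual hosts, rezoning storage, and starting the SAP instances are not modeled.

```python
from dataclasses import dataclass


@dataclass
class VirtualHost:
    # Hypothetical description of a virtual host that needs a new home.
    name: str
    cpus: int
    mem_gb: int


@dataclass
class Server:
    # Hypothetical description of a surviving server's spare capacity.
    name: str
    free_cpus: int
    free_mem_gb: int


def plan_failover(failed: list[VirtualHost], pool: list[Server]) -> dict[str, str]:
    """Map each failed virtual host to a server with enough spare capacity.

    Uses first-fit decreasing: the largest hosts (e.g. database + CI) are
    placed first, then the smaller dialog instances fill remaining gaps.
    """
    placement: dict[str, str] = {}
    # Track remaining capacity separately so the caller's Server objects are untouched.
    remaining = {s.name: [s.free_cpus, s.free_mem_gb] for s in pool}
    for vh in sorted(failed, key=lambda v: (v.cpus, v.mem_gb), reverse=True):
        for s in pool:
            cpus, mem = remaining[s.name]
            if cpus >= vh.cpus and mem >= vh.mem_gb:
                remaining[s.name] = [cpus - vh.cpus, mem - vh.mem_gb]
                placement[vh.name] = s.name
                # A real cluster manager would now create the virtual host,
                # rezone the shared storage, and start the instance (not modeled).
                break
        else:
            raise RuntimeError(f"no server has capacity for {vh.name}")
    return placement


if __name__ == "__main__":
    server_a_hosts = [
        VirtualHost("erp-db-ci", 8, 64),
        VirtualHost("portal-db-ci", 4, 32),
    ] + [VirtualHost(f"dia-{i:02d}", 2, 16) for i in range(1, 9)]
    survivors = [Server("server-b", 16, 128), Server("server-c", 16, 128)]
    for host, target in plan_failover(server_a_hosts, survivors).items():
        print(f"{host} -> {target}")
```

With these made-up numbers, the two database/CI hosts and the first two dialog instances land on server-b and the remaining six dialog instances land on server-c. The point of the sketch is that nothing about server-b or server-c had to be preconfigured to mirror Server A; the placement is computed at failure time from whatever capacity is free.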
Am I asking for too much here? How are other large companies dealing with this? Do you include your dialog instances in your clustering design? Do you pre-install your dialog instances on your failover servers and only activate them if needed?