We have received several reports of CAL instances that run into trouble after a suspend/resume operation has been executed in CAL. After the resume, it is no longer possible to connect to the SAP system, although the CAL UI shows a green traffic light. The issue currently appears only when you use Amazon Web Services as the cloud provider for your CAL solution, and it is not reliably reproducible: it occurs only sporadically, which is why most of your suspend/resume operations still work. Although only a fraction of you might be affected, I would like to let you know that we are working on a solution with high priority. Until then, you will need to use this workaround:
We identified that the AWS Meta Data Service has been somewhat unstable over the last few days. If this service is not available at the time of the reboot (which in rare cases seems to have been the case lately), mandatory information that the SAP system requires at runtime cannot be determined. To work around the issue, you need to ssh into the instance that does not work properly after the suspend/resume operation.
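If you want to check whether the metadata service is answering at all once you are logged in, a quick probe like the following may help. This is only a sketch: it assumes curl is installed on the instance and that the standard EC2 link-local metadata address is reachable without a session token (an instance that enforces IMDSv2 tokens would reject this plain request). The variable METADATA_URL and the function name metadata_available are illustrative, not part of the appliance.

```shell
#!/bin/sh
# Sketch: probe the EC2 instance metadata endpoint.
# METADATA_URL defaults to the standard link-local address; override it if needed.
METADATA_URL="${METADATA_URL:-http://169.254.169.254/latest/meta-data/}"

metadata_available() {
    # -s: silent, -f: treat HTTP errors as failure, --max-time: give up after 5 seconds
    curl -sf --max-time 5 -o /dev/null "$METADATA_URL"
}

if metadata_available; then
    echo "metadata service reachable"
else
    echo "metadata service NOT reachable"
fi
```

A successful probe now does not prove the service was available at boot time; it only tells you whether it is answering at the moment.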
Once you have established the connection, please add a sleep 120 at the beginning of the script /sbin/updatehosts-network.sh to give the AWS Meta Data Service more time to start up.
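You can of course make this change in any editor. If you prefer to script it, the following is one possible sketch. It assumes GNU sed (for the one-line append syntax and -i) and that the first line of the script is the interpreter line, so the sleep is inserted directly after it. The function name add_startup_delay and the .bak backup suffix are my own choices for illustration.

```shell
#!/bin/sh
# Sketch: insert "sleep 120" right after the first line of a script
# (the first line is assumed to be the shebang).
add_startup_delay() {
    script="$1"
    # Do nothing if the delay is already present, so re-running is harmless.
    if grep -qx 'sleep 120' "$script"; then
        return 0
    fi
    # Keep a backup so the change can be reverted later.
    cp -p "$script" "$script.bak" || return 1
    # GNU sed: append a new line after line 1, editing the file in place.
    sed -i '1a sleep 120' "$script"
}

# Usage on the affected instance (as root):
#   add_startup_delay /sbin/updatehosts-network.sh
```

The backup makes it easy to restore the original script once the delay is no longer needed.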
To reactivate your broken instance, execute as user root:
- /etc/init.d/updatehosts-network stop
- /etc/init.d/updatehosts-network start
Afterwards, as sidadm, execute “stopsap && startsap”. The SAP system should then be up and running again, and you will be able to execute further suspend/resume operations from the CAL UI.
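For those who have to repeat the procedure on several instances, the reactivation steps could be wrapped roughly like this. Treat it as a sketch rather than a supported tool: SID_USER, DRY_RUN, run and reactivate_instance are hypothetical names introduced here, and SID_USER must be set to the actual <sid>adm user of your system. It is meant to be run as root.

```shell
#!/bin/sh
# Sketch of the reactivation sequence; run as root on the affected instance.
# SID_USER and DRY_RUN are illustrative variables, not part of the appliance.
SID_USER="${SID_USER:-sidadm}"   # replace with the actual <sid>adm user
DRY_RUN="${DRY_RUN:-0}"          # set to 1 to print the commands only

run() {
    # In dry-run mode print the command, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reactivate_instance() {
    # Restart the hosts/network update service.
    run /etc/init.d/updatehosts-network stop
    run /etc/init.d/updatehosts-network start || return 1
    # Restart SAP as the administration user; startsap runs only if stopsap succeeds.
    run su - "$SID_USER" -c "stopsap && startsap"
}

# Usage (as root):
#   reactivate_instance
```

Running it once with DRY_RUN=1 lets you review the exact commands before anything is restarted.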
I will let you know once we have fixed the issue globally. I hope it does not boil down to adding a generic sleep 120 to all our appliances. That would take 2 minutes of your precious time, and we should avoid that, especially in the cloud 😉