
Fail-Safe Operation of SAP HANA®: SUSE Extends Its High-Availability Solution

“SAP customers invest in SAP HANA” is the conclusion reached by a recent market study carried out by Pierre Audoin Consultants (PAC). In Germany alone, half of companies expect SAP HANA to become the dominant database platform in the SAP environment. In many cases, the “SAP Business Suite® powered by SAP HANA” scenario is already being discussed in concrete terms.

Naturally, SUSE is also accommodating this development by providing SUSE Linux Enterprise Server for SAP Applications – the recommended and supported operating system for SAP HANA. In close collaboration with SAP and hardware partners, SUSE will therefore provide two resource agents that allow customers to ensure the high availability of SAP HANA system replication.

Two Replication Scenarios

The current initial phase of the project includes the architecture and development of scale-up scenarios, which will be tested together with SAP in the coming weeks. System replication will help to replicate the database data from one computer to another computer in order to compensate for database failures (single-box replication). This is to be followed by a second project phase involving an extension for scale-out scenarios (multibox replication). With this mode of operation, internal SAP HANA high-availability (HA) mechanisms and the resource agent must work together or be coordinated with each other.

SUSE implements these scenarios with the SAPHana resource agent (RA), which performs the actual check of the SAP HANA database instances and is configured as a master/slave resource. In a scale-up scenario, the master assumes responsibility for the SAP HANA databases running in primary mode, and the slave is responsible for instances that are operated in synchronous (secondary) status.

To make configuring the cluster as simple as possible, SUSE also developed its SAPHanaTopology resource agent. This runs on all nodes of an SLE 11 HAE cluster and gathers information about the statuses and configurations of SAP HANA system replications. It was designed as a normal (stateless) clone.
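In crm shell notation, this pairing is typically expressed as a master/slave (multi-state) resource plus a stateless clone. The following is only a minimal sketch with a placeholder SID (HA1) and instance number (10); the exact operations, timeouts and constraints should be taken from the official best practices documentation:

```shell
# Sketch only - SID, instance number and resource names are placeholders.
primitive rsc_SAPHanaTopology_HA1_HDB10 ocf:suse:SAPHanaTopology \
        op monitor interval=10 timeout=600 \
        params SID=HA1 InstanceNumber=10

primitive rsc_SAPHana_HA1_HDB10 ocf:suse:SAPHana \
        op monitor interval=60 role=Master timeout=700 \
        op monitor interval=61 role=Slave timeout=700 \
        params SID=HA1 InstanceNumber=10

# SAPHana runs as a master/slave resource, SAPHanaTopology as a
# normal (stateless) clone on all nodes:
ms msl_SAPHana_HA1_HDB10 rsc_SAPHana_HA1_HDB10 \
        meta clone-max=2 clone-node-max=1 notify=true interleave=true
clone cln_SAPHanaTopology_HA1_HDB10 rsc_SAPHanaTopology_HA1_HDB10 \
        meta interleave=true
```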

Customers Receive Complete Package

With both the SAPHana and SAPHanaTopology resource agents, customers will therefore be able to integrate SAP HANA system replications in their cluster. This has the advantage of enabling companies to run not only their business-critical SAP systems but also their SAP HANA databases without interruption while noticeably reducing costs. SUSE provides the extended solution together with best practices documentation.

SAP and hardware partners who do not have their own SAP HANA high-availability solution will also benefit from this new SUSE Linux development.

Please also read our setup guide and my SCN document.

  • Hi Fabian,

    as good as any post from you... I remember our HA workshop some time ago... this is the next level 😉

    Greetings Paul

    • Hi Paul,

      thanks for your kind feedback 🙂 . In one of my next blogs I will explain a bit about the parameters of my resource agents.

      Greetings, Fabian

    • Hi Tomas Krojzl, Fabian Herschel,

      I'm not an infrastructure/OS expert, so I'd like to understand from you guys: does this document that Tomas has linked above mean that SUSE HAE can automate the takeover of the secondary node in case the primary fails (similar to what IBM's HACMP does for SAP applications e.g. ERP)?

      So far, system replication requires a manual activity through SAP HANA Studio to activate the secondary node in case the primary fails. I'd like to understand whether this new white paper is about automating this takeover, or whether it's more about monitoring & STONITH (i.e. powering down the failed node).

      Thank you very much,


      • Hi Henrique,

        yes, exactly. SUSE provides two resource agents which organize the takeover when a primary HANA instance fails, so the synced secondary becomes the new primary. You can even activate "AUTOMATED_REGISTER". In this case the former (failed) primary can be automatically registered to the new primary. This re-enables system replication automatically after a successful takeover.

        Our solution automates the "manual" activity and so increases the high availability of your synchronized databases.
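        For illustration, the AUTOMATED_REGISTER parameter mentioned above is set on the SAPHana primitive. A hedged fragment (SID, instance number and resource name are placeholders; see the setup guide for the full parameter list):

```shell
# Sketch only: with AUTOMATED_REGISTER="true" the cluster re-registers
# the former (failed) primary as the new secondary after a takeover.
primitive rsc_SAPHana_HA1_HDB10 ocf:suse:SAPHana \
        params SID=HA1 InstanceNumber=10 AUTOMATED_REGISTER="true"
```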

        Hope that answers your question.



        • Hello,

          yes - I think that this is extremely important stuff - I believe a link to this document should be included in all related SAP materials - in particular the following:

          1999880 - FAQ: SAP HANA System Replication (I think there should be a dedicated question about this automation here)

          How to Perform System Replication for SAP HANA (there is a link to this blog here, but not to the document itself)

          1953429 - SAP HANA and SAP NetWeaver AS ABAP on one Server (this note contains the attachment "HowTo_Inst_HANA_ABAP_Dec2013.pdf", which is also related to this subject - a link should be included here as well)...

          ...and maybe others...

          SAP Note 1953429 covers ABAP on the HANA appliance - it might be interesting to develop a dedicated "cookbook" covering this scenario as well (I am aware that individual bits and pieces are already around... but not in one document)..


        • Thank you very much, Fabian & Thomas.

          It's indeed a watershed that needs to be released asap.

          I've seen that the first release of HAE for HANA only covers 1-to-1 scale-up replications, without the possibility to share the infrastructure with other environments (e.g. QAS).

          Any plans on when support for scale-out, and the possibility to have scripts that would allow stopping QAS & starting the PRD secondary, would be provided?



  • Hello, assuming that these are GA now, where can I obtain the resource agents? I would appreciate it if someone could share the download link. I found an older version of the code on GitHub; I am hoping to get the GA code.



    • Hello,

      you need to download the agents from the SUSE download pages - go to this page:

      at the very bottom you will see the resource agents - the latest version I can see today is "02 Feb 2015 - SAPHanaSR 10162", which contains the following files:

         license_agreement.txt 2.8 KB (2909)
         SAPHanaSR-0.149-0.8.1.noarch.rpm 34.5 KB (35340)
         SAPHanaSR-doc-0.149-0.8.1.noarch.rpm 481.9 KB (493564)
         readme_5200250.html 5.7 KB (5935)

      Since access is restricted, you might need to contact SUSE to allow you to download the resource.


      • Hello,

        ...and one more comment - when installing you should follow the procedure described here:

        Not sure if it is useful, but here are the technical prerequisites for the packages:

        server-xyz # rpm -qRp SAPHanaSR-0.149-0.8.1.noarch.rpm
        pacemaker > 1.1.1
        rpmlib(PayloadFilesHavePrefix) <= 4.0-1
        rpmlib(CompressedFileNames) <= 3.0.4-1
        rpmlib(PayloadIsLzma) <= 4.4.6-1

        server-xyz # rpm -qRp SAPHanaSR-doc-0.149-0.8.1.noarch.rpm
        rpmlib(PayloadFilesHavePrefix) <= 4.0-1
        rpmlib(CompressedFileNames) <= 3.0.4-1
        rpmlib(PayloadIsLzma) <= 4.4.6-1

        server-xyz # rpm -qRp resource-agents-3.9.5-0.28.7.x86_64.rpm
        rpmlib(PayloadFilesHavePrefix) <= 4.0-1
        rpmlib(CompressedFileNames) <= 3.0.4-1
        rpmlib(PayloadIsLzma) <= 4.4.6-1

        So you need the resource-agents package and pacemaker (+ other packages) to be installed before this...
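        The "pacemaker > 1.1.1" requirement above can be checked up front. A generic sketch using sort -V for the version comparison (not part of the SAPHanaSR package):

```shell
# Succeeds (exit 0) if version $1 is greater than or equal to $2.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: compare an installed pacemaker version against the
# requirement from the rpm metadata above.
installed="1.1.12"   # e.g. from: rpm -q --qf '%{VERSION}' pacemaker
if version_ge "$installed" "1.1.1"; then
    echo "pacemaker requirement met"
fi
```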


      • Hello,

        ...and one more update... 😆

        Resource agent versions located in the packages mentioned above (which I would consider the GA versions) are the following:

        server-xyz # grep "<version>" SAPHana*
        SAPHana:             <version>0.149.4</version>
        SAPHanaTopology:     <version>0.149.3</version>

        while there is a more up-to-date version (which I would consider the latest development build) available here:

        at the time of this comment the versions were the following:

        server-xyz # grep "<version>" SAPHana*
        SAPHana:             <version>0.149.4</version>
        SAPHanaTopology:     <version>0.149.4</version>

        So I would not say that GitHub has older versions... 😉
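        As a side note, the plain version number can be extracted from such '<version>' lines with a small sed expression (a generic text-processing sketch, not part of the packages discussed here):

```shell
# Pull just the number out of a '<version>x.y.z</version>' line.
extract_version() {
    sed -n 's:.*<version>\(.*\)</version>.*:\1:p'
}

printf '<version>0.149.4</version>\n' | extract_version   # prints 0.149.4
```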


        • Yes, the GitHub repository is the upstream project and should always have newer versions of the resource agents. However, these versions are not ready for productive systems, because they are not tested. It is just the repository used to exchange updates and merge requests between the contributors. So please, please do NOT use the GitHub versions - always use the officially released versions.

    • The resource agents must be fetched from the official SUSE update channels. If you have a valid SLES for SAP Applications registration, you can simply get the resource agents by using zypper. The resource agents and all calculated prerequisites are fetched by zypper from your update source. This is either the SUSE Customer Center (SCC) or a local mirror like SMT.

      • Hi @Stefan Schiele

        Thanks for the info! Much appreciated.

        Couldn't access the suse note with id....

        May I know how SAPHanaSR works exactly with MDC? Correct me if I'm wrong: if any tenant or its indexserver fails, will the entire HDB (together with all tenants) be taken over to the secondary by the SUSE cluster?

        Hope to hear from you soon.



        Nicholas Chang

        • Hi Nicholas,

          SAPHanaSR checks the system replication status with 'HANA native' methods and makes a takeover decision.

          The exact decision 'matrix' I think Fabian Herschel will know in perfection 😉

          I think for the SAPHanaSR process there is not a big difference between MDC and single-container systems.

          If just one indexserver of one tenant DB fails, the daemon and nameserver will try to restart it first - if this doesn't work, it will have an impact on the SR status, which will then be considered by SAPHanaSR.

          In general, a local restart is tried first. For single service failures, the HANA daemon and nameserver take care of that. The whole HANA instance is what SAPHanaSR takes care of (if I read the code correctly, this is the default behavior).

          And yes, of course only the whole instance with all tenant DBs can be replicated and switched over to the secondary site.



          • Hi Stefan,

            Thanks for your reply again! Much appreciate.

            I'm aware of how SAPHanaSR works, as we have it installed and running on our system 😉

            I just want to know the matrix for SAPHanaSR on MDC - whether there is any special way of dealing with it.

            As we know, with MDC there are multiple tenants running inside one HDB, and HANA system replication happens for the whole system (all tenants).

            I assume that if one of the tenants failed, SAPHanaSR would detect it and fail over the whole system (all tenants + the failed tenant) to the slave without any special crm coding?


            Nicholas Chang

          • Hi Nicholas,

            yes, SAPHanaSR always takes the "overall" status of the system replication, which consists of the statuses of all system replication services. And there are one or more such services per tenant database. This means that if one tenant service or the system database service were broken, SAPHanaSR would need to mark the secondary as not in sync (SFAIL) and exclude the secondary from takeover till it is in sync again.
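            The rule described above can be sketched as a tiny decision helper (a simplification for illustration only; the real agents track the sync state as a cluster attribute):

```shell
# SOK   = secondary fully in sync  -> takeover candidate
# SFAIL = secondary not in sync    -> excluded until SOK again
secondary_eligible() {
    case "$1" in
        SOK)   echo "takeover allowed"   ;;
        SFAIL) echo "takeover excluded"  ;;
        *)     echo "unknown sync state" ;;
    esac
}
```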

            Hope that helps


          • Hi Fabian Herschel

            Thanks for you reply. Very much appreciate!

            I did a test using saphanasr 0.151 and HDB rev112.2 with MDC.

            in my scenario, i have one systemdb, two tenants (eg: AB1 and AB2)

            However, the SUSE cluster manager does not automate a takeover to the secondary when I:

            i) stop tenant AB1

            ii) stop tenant AB2

            iii) stop both tenant AB1 and AB2

            The slave/secondary takeover is only automated by the SUSE cluster when the SYSTEMDB is down.

            I reckon a takeover should be initiated by the SUSE cluster if it detects that any of the tenants is down.

            Appreciate if you can shed some light on this.


            Nicholas Chang

          • The solution should never take over just because a tenant is down. The solution must ask landscapeHostConfiguration, and this is still OK if you stop tenants. Let's think about 20 tenants inside one system: one tenant gets stopped, and a takeover would also interrupt the 19 other tenants. This is not the intention of the MDC support.

            The MDC support always sees the RDBMS as one complete thing. If you stop a tenant, that is your selection. I have also heard that such stops are synchronized to the secondary site, so the tenant would also be stopped on the secondary.

            If you stop the SYSTEMDB, then landscapeHostConfiguration will change to stop, and that is the interface between SAP and the cluster to decide when a takeover may be processed.

          • Hi Fabian Herschel

            Thanks for the explanation on how suse cluster saphanasr works on MDC.

            Ok, what about this: there are 20 tenants in the database, and 10 out of the 20 tenants go down due to some internal error, not because they were stopped intentionally. A takeover will not be initiated as long as the SYSTEMDB is still up and the status from landscapeHostConfiguration still shows OK? Please correct me if I'm wrong. The takeover only happens when the SYSTEMDB is down?

            Just want to make it clear.


            Nicholas Chang

          • We can only trust the landscapeHostConfiguration interface. If the landscape says that the SAP HANA site is OK or in internal processing (WARNING), then we cannot take over.

            If you really have a case where tenants are broken and the landscape is still OK or INFO, or stays on WARNING for a long time, you would need to discuss this with SAP.

            Then they could change landscapeHostConfiguration to detect failed-but-not-stopped tenants and could decide whether the landscape is still OK, INFO or only WARNING.
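            For illustration: landscapeHostConfiguration.py reports the landscape status via its exit code (4=OK, 3=INFO, 2=WARNING, 1=ERROR/DOWN, 0=fatal). The decision described above could be sketched like this (not the actual SAPHanaSR scoring logic):

```shell
# Map landscapeHostConfiguration.py exit codes to a takeover decision:
#   4=OK, 3=INFO, 2=WARNING -> site still considered working, no takeover
#   1=ERROR, 0=fatal        -> takeover candidate
takeover_candidate() {
    case "$1" in
        4|3|2) echo "no"      ;;
        1|0)   echo "yes"     ;;
        *)     echo "unknown" ;;
    esac
}
```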

    • MDC is supported. You need to use SAP HANA >= rev. 90 and an RA >= 0.151.

      The guides are currently in a rewrite process.



  • Hi Fabian,

    Now I'm configuring fail-over HANA with SAPHanaSR 0.151-0.11.1.

    Somehow, the resource for HANA startup fails on one node. The slave node can start, but the master node fails. The resource monitor reports 'not running' (7).

    What should be checked?

    (sorry for too tech details...)


    • Typically the primary HANA will always be started first. It's not easy to answer what the root cause is or what you should check if we have nothing about your configuration.

      If this is a productive setup (or planned to become a productive setup), please open a ticket at either SUSE or Red Hat (as I don't even have information about the Linux distribution used).

      • This blog is about SUSE HAE, isn't it? My environment is on SUSE for test purposes.

        I'm using SLES for SAP 12 SP1 and HANA rev. 120. Now the statuses of the HANA resources are "Started" and "FAILED". It seems the failed node is retrying to start up (blinking in Hawk).

        Monitoring is not running on either node: rsc_SAPHana_SID_HDB00_monitor_61000 is 'not running' (7): call=816, status=complete, exitreason='none'.
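        (For context: the "(7)" in such monitor messages is a standard OCF resource agent return code. A small lookup sketch of the most relevant codes:)

```shell
# Names for common OCF return codes shown by the cluster; the "(7)"
# in a monitor failure message means OCF_NOT_RUNNING.
ocf_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "rc=$1" ;;
    esac
}
```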

        I configured it like below (in part):

        primitive rsc_SAPHanaTopology_SID_HDB00 ocf:suse:SAPHanaTopology \
                 operations $id=rsc_sap2_SID_HDB00-operations \
                 op monitor interval=10 timeout=600 \
                 op start interval=0 timeout=600 \
                 op stop interval=0 timeout=300 \
                 params SID=SID InstanceNumber=00

        primitive rsc_SAPHana_SID_HDB00 ocf:suse:SAPHana \
                 operations $id=rsc_sap_SID_HDB00-operations \
                 op start interval=0 timeout=3600 \
                 op stop interval=0 timeout=3600 \
                 op promote interval=0 timeout=3600 \
                 op monitor interval=60 role=Master timeout=700 \
                 op monitor interval=61 role=Slave timeout=700 \
                 params SID=SID InstanceNumber=00

        ms msl_SAPHana_SID_HDB00 rsc_SAPHana_SID_HDB00 \
                 meta is-managed=true notify=true clone-max=2 clone-node-max=1 target-role=Stopped interleave=true

        clone cln_SAPHanaTopology_SID_HDB00 rsc_SAPHanaTopology_SID_HDB00 \
                 meta interleave=true

        colocation col_IP_Primary 2000: res_IP:Started msl_SAPHana_SID_HDB00:Master

        order ord_SAPHana 2000: cln_SAPHanaTopology_SID_HDB00 msl_SAPHana_SID_HDB00

        • Sure, the blog is about SUSE HAE, or better, about SLES for SAP Applications. However, this does not mean that all questions are really related to SUSE Linux Enterprise Server. So my intention was just to make that clearer.

          Ok, if you are using SLES for SAP 12 SP1 you might need to wait for our next update. We fixed a Python incompatibility problem between SLES for SAP 12 SP1 and SAP HANA's Python.

          Second, with SPS12 (rev. 120) SAP changed an interface and added a new field to an output format, so we needed to change the parser. We have now switched to a more stable output format.

          If you need the new package quite early, please open either an SAP ticket in the SUSE queue or a service request directly at SUSE to receive a so-called PTF (just to avoid waiting for the release in the update channel).

          Beside that... the other thing (maybe a result of start/stop attempts via the command line and/or the Hawk web interface) is that you still have target-role=Stopped, which prevents the cluster from starting the resource once it would otherwise work correctly. So that's not the root cause here, but after the package update you should change it (just as a reminder).
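          The target-role reminder above can be sketched as crm shell commands (resource name taken from the configuration posted earlier in this thread; verify against your own cluster before running):

```shell
# Clear the stopped target role so the cluster may start the
# multi-state resource again after the package update:
crm resource meta msl_SAPHana_SID_HDB00 delete target-role
# or equivalently:
crm resource start msl_SAPHana_SID_HDB00
```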

  • Hi Fabian,

    I'm facing the same problem as Former Member.

    I have SUSE for SAP Applications 12 SP1 with SAP HANA SPS 11.

    I need to set up this HA at a customer site, and I'm getting a failure message when I try to start the cluster service.

    As you said, I need the package asap - do you know how I can get it asap?

    Many thanks, see you