Fail-Safe Operation of SAP HANA®: SUSE Extends Its High-Availability Solution
“SAP customers invest in SAP HANA” is the conclusion reached by a recent market study carried out by Pierre Audoin Consultants (PAC). In Germany alone, half of companies expect SAP HANA to become the dominant database platform in the SAP environment. In many cases, the “SAP Business Suite® powered by SAP HANA” scenario is already being discussed in concrete terms.
Naturally, SUSE is also accommodating this development by providing SUSE Linux Enterprise Server for SAP Applications – the recommended and supported operating system for SAP HANA. In close collaboration with SAP and hardware partners, SUSE will therefore provide two resource agents that let customers ensure the high availability of SAP HANA system replication.
Two Replication Scenarios
The current initial phase of the project includes the architecture and development of scale-up scenarios, which will be tested together with SAP in the coming weeks. System replication will help to replicate the database data from one computer to another computer in order to compensate for database failures (single-box replication). This is to be followed by a second project phase involving an extension for scale-out scenarios (multibox replication). With this mode of operation, internal SAP HANA high-availability (HA) mechanisms and the resource agent must work together or be coordinated with each other.
SUSE implements these scenarios with the SAPHana resource agent (RA), which performs the actual check of the SAP HANA database instances and is configured as a master/slave resource. In a scale-up scenario, the master assumes responsibility for the SAP HANA databases running in primary mode, and the slave is responsible for instances that are operated in synchronous (secondary) status.
To make configuring the cluster as simple as possible, SUSE also developed its SAPHanaTopology resource agent. This runs on all nodes of an SLE 11 HAE cluster and gathers information about the statuses and configurations of SAP HANA system replications. It was designed as a normal (stateless) clone.
Customers Receive Complete Package
With the SAPHana and SAPHanaTopology resource agents, customers will be able to integrate SAP HANA system replication into their cluster. This lets companies run not only their business-critical SAP systems but also their SAP HANA databases without interruption while noticeably reducing costs. SUSE provides the extended solution together with best practices documentation.
SAP and hardware partners who do not have their own SAP HANA high-availability solution will also benefit from this new SUSE Linux development.
Please also read our setup-guide https://www.suse.com/products/sles-for-sap/resource-library/sap-best-practices.html and my SCN document http://scn.sap.com/docs/DOC-56278.
Hi Fabian
Great post. Is there any indication of when this will be generally available?
Regards
Derek
Hi Derek,
currently the resource agents are still under development and testing. Let's expect them to be generally available in the next months. So far the tests are running really well 🙂 .
Regards
Fabian
... so we should "plan" this for SPS 9?
Kind regards, Rudi
The current tests are running with SAP HANA SPS7 and will also be repeated with SAP HANA SPS8. The release of the SUSE resource agents will not be triggered by a SAP HANA release date, but by a SUSE announcement which we plan in the next few months.
I think what they are looking for is a high level planned release date, as in Q3/14, Q4/14, 2015, etc.
Thanks Henrique. Then "next few months" means Q3/2014, where this is Q3 of the calendar year, not any shifted fiscal year 😉
Great. 😉
a GA date would be great! 🙂
This looks like great tech but this belongs in a press release not a SCN blog in my opinion!
Thanks 🙂
... maybe Fabian was a bit too fast ... 😉
No, it was intended to blog about that tech here and to get questions, feedback and so on. And of course we (SUSE) will also use an additional platform to make this even more public and also to announce a GA date.
As the question came up again: "in the next few months" means something like Q3/2014.
Hi Fabian,
as good as any post from you ... I remember our HA workshop some time ago .. this is the next level 😉
Greetings Paul
Hi Paul,
thanks for your kind feedback 🙂 . In one of my next blogs I will explain a bit about the parameters of my resource agents.
Greeting Fabian
Hello,
adding link to the related material:
Automate HANA System Replication with SUSE HA
https://www.suse.com/promo/saphana-replication.html
Tomas
This is great, thanks!
Hi Tomas Krojzl, Fabian Herschel,
I'm not an infrastructure/OS expert, so I'd like to understand from you guys: does this document that Tomas has linked above mean that SUSE HAE can automate the takeover of the secondary node in case the primary fails (similar to what IBM's HACMP does for SAP applications e.g. ERP)?
So far, System Replication requires a manual activity through HANA Studio for the secondary node to be activated in the case the Primary fails. I'd like to understand whether this new white paper is about automating this takeover, or if it's more about monitoring & STONITH (i.e. powering down the failed node).
Thank you very much,
Henrique.
Hi Henrique,
yes, exactly. SUSE provides two resource agents which organize the takeover when a primary HANA instance fails. So the synced secondary will become the new primary. You can even activate "AUTOMATED_REGISTER". In this case the former (failed) primary can be automatically registered to the new primary. This re-enables the system replication automatically after a successful takeover.
Our solution automates the "manual" activity so it increases the high availability of your synchronized databases.
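To illustrate the flow just described, here is a minimal Python sketch. It is not the resource agent's code: the `Site` class and the `handle_primary_failure` function are hypothetical names made up for this example, and only the `AUTOMATED_REGISTER` option itself comes from the agent's configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """Hypothetical model of one HANA site in a two-node scale-up cluster."""
    name: str
    role: str                      # "primary", "secondary" or "failed"
    in_sync: bool = False          # only meaningful for a secondary
    registered_to: Optional[str] = None


def handle_primary_failure(primary: Site, secondary: Site,
                           automated_register: bool) -> bool:
    """Return True if a takeover was performed."""
    # Only a synced secondary is a takeover candidate.
    if secondary.role != "secondary" or not secondary.in_sync:
        return False

    # Takeover: the synced secondary becomes the new primary.
    secondary.role = "primary"
    secondary.in_sync = False
    secondary.registered_to = None
    primary.role = "failed"

    if automated_register:
        # With AUTOMATED_REGISTER enabled, the former primary is registered
        # to the new primary and comes back as the new secondary.
        primary.role = "secondary"
        primary.registered_to = secondary.name
    # Otherwise the former primary stays out of the replication until an
    # administrator registers it manually.
    return True


node_a = Site("A", "primary")
node_b = Site("B", "secondary", in_sync=True, registered_to="A")
took_over = handle_primary_failure(node_a, node_b, automated_register=True)
```

With `automated_register=False` the sketch leaves the old primary in the `failed` role, which corresponds to the manual registration step an administrator would otherwise perform.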
Hope that answers your question.
Regards
Fabian
Hello,
yes - I think that this is extremely important stuff - I believe link to this document should be included in all related SAP materials - in particular following:
1999880 - FAQ: SAP HANA System Replication (I think here should be dedicated question to this automation)
How to Perform System Replication for SAP HANA (here is link to this blog but not to the document itself)
1953429 - SAP HANA and SAP NetWeaver AS ABAP on one Server (this note contains attachment "HowTo_Inst_HANA_ABAP_Dec2013.pdf" which is also related to this subject - link should be included here as well)...
...and maybe others...
SAP Note 1953429 is covering ABAP on HANA appliance - it might be interesting to develop dedicated "cookbook" covering this scenario as well (I am aware that individual bits and pieces are already around... but not in one document)..
Tomas
Cc Ralf Czekalla
Thank you very much, Fabian & Thomas.
It's indeed a watershed that needs to be released asap.
I've seen the first release of HAE for HANA is only for 1-to-1 scale-up replications, and without the possibility to share the infra with other environments (e.g. QAS).
Any plans on when the support for Scale Out and the possibility to have scripts that would allow stopping QAS & starting PRD-secondary would be provided?
Cheers,
Henrique.
The page is at: Best Practices - Resource Library | SAP Applications | SUSE
Hello, assuming that these are GA now, where can I obtain the resource agents? I would appreciate it if someone could share the download link. I found an older version of the code on github and am hoping to get the GA code.
Thanks
Hari
Hello,
you need to download the agents from SUSE download pages - you need to go to this page: https://download.suse.com/patch/finder/#bu=suse&familyId=7261&productId=42047
at the very bottom you will see resource agents - latest version I can see today is "02 Feb 2015 - SAPHanaSR 10162" which contains following files:
Since it is restricted you might need to deal with SUSE to allow you to download the resource.
Tomas
Hello,
Here is direct link to the patch itself: https://download.suse.com/Download?buildid=WXjmbMLddBo~
Packages could also be located here: https://nu.novell.com/repo/$RCE/SLE11-SP3-SAP-Updates/sle-11-x86_64/rpm/noarch
Tomas
Hello,
...and one more comment - when installing you should follow the procedure described here: https://www.suse.com/promo/saphana-replication.html?src=hana-ha
not sure if useful but here are technical prerequisites for the packages:
server-xyz # rpm -qRp SAPHanaSR-0.149-0.8.1.noarch.rpm
pacemaker > 1.1.1
resource-agents
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
/bin/bash
/usr/bin/perl
rpmlib(PayloadIsLzma) <= 4.4.6-1
server-xyz # rpm -qRp SAPHanaSR-doc-0.149-0.8.1.noarch.rpm
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadIsLzma) <= 4.4.6-1
server-xyz # rpm -qRp resource-agents-3.9.5-0.28.7.x86_64.rpm
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
/bin/bash
/bin/sh
libc.so.6()(64bit)
libc.so.6(GLIBC_2.2.5)(64bit)
libc.so.6(GLIBC_2.3)(64bit)
libc.so.6(GLIBC_2.3.4)(64bit)
libc.so.6(GLIBC_2.4)(64bit)
libglib-2.0.so.0()(64bit)
libnet.so.0()(64bit)
libplumb.so.2()(64bit)
libplumbgpl.so.2()(64bit)
rpmlib(PayloadIsLzma) <= 4.4.6-1
So you need the resource-agents package and pacemaker (+ other packages) to be installed before this...
Tomas
A set of newer setup guides for different scenarios and SLES for SAP Applications versions can be found at Best Practices - Resource Library | SAP Applications | SUSE.
Hello,
...and one more update... 😆
The resource agent versions in the packages mentioned above (which I would consider the GA version) are the following:
server-xyz # grep "<version>" SAPHana*
SAPHana: <version>0.149.4</version>
SAPHanaTopology: <version>0.149.3</version>
while there is a more up-to-date version (which I would consider the latest development build) available here: https://github.com/fmherschel/SAPHanaSR
at the time of this comment the versions were the following:
server-xyz # grep "<version>" SAPHana*
SAPHana: <version>0.149.4</version>
SAPHanaTopology: <version>0.149.4</version>
So I would not say that github has older versions... 😉
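For anyone who wants to compare two such listings without reading them by eye, here is a small Python sketch. The function names `parse_versions` and `newer_agents` are made up for this illustration; the input is simply the `grep` output shown above.

```python
import re


def parse_versions(grep_output):
    """Map agent name -> version tuple from 'name: <version>x.y.z</version>' lines."""
    versions = {}
    for line in grep_output.splitlines():
        match = re.match(r"\s*(\S+):\s*<version>([\d.]+)</version>", line)
        if match:
            name, version = match.groups()
            versions[name] = tuple(int(part) for part in version.split("."))
    return versions


def newer_agents(packaged, upstream):
    """Names of agents whose upstream version is higher than the packaged one."""
    return sorted(name for name, version in upstream.items()
                  if version > packaged.get(name, ()))


packaged = parse_versions(
    "SAPHana: <version>0.149.4</version>\n"
    "SAPHanaTopology: <version>0.149.3</version>\n"
)
upstream = parse_versions(
    "SAPHana: <version>0.149.4</version>\n"
    "SAPHanaTopology: <version>0.149.4</version>\n"
)
changed = newer_agents(packaged, upstream)
```

Comparing integer tuples instead of strings keeps the ordering numeric, so a hypothetical 0.149.10 would sort after 0.149.9.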
Tomas
Tomas, Thanks for sharing all the details and appreciate your quick response.
Thanks
Hari
Yes the github repository is the upstream project and should always have newer versions of the resource agents. However these versions are not ready for productive systems, because they are not tested. It's just the repository to exchange update/merge requests between the persons contributing. So please, please do NOT use the github versions - always use the official released versions.
The resource agents must be fetched from the SUSE official update channels. If you have a valid SLES for SAP Applications registration you can just get the resource agents by using zypper. The resource agents and all calculated prerequisites are fetched by zypper from your update source. This is either the SUSE customer center (SCC) or your local mirror like SMT.
In case we have a demo/test system that is fully offline from any update servers, what is the recommended way to download the packages manually?
Use the patch finder at www.suse.com and download the packages by following the dialogs.
Patch finder: https://download.suse.com/patch/finder/
Today's newest patch for SAPHanaSR: Downloads - SAPHanaSR 10162
You need a user account/subscription to be able to download the packages.
Hi Fabian Herschel and Tomas Krojzl,
Is MDC supported on latest SAPHANASR package? I read it somewhere that MDC is not supported yet, could you please confirm?
Also, will SUSE update its HANA SR on SUSE HAE guide? The last update from the doc download link was based on SP3.
Hope to hear from you soon.
**re-posted in Automate SAP HANA System Replication with SLES for SAP Applications, I believe that should be the right place.
Thanks,
Nicholas Chang
Hi Nicholas,
afaik, there was a bug with takeover for MDC in SAPHANASR package - but it's fixed by an update.
https://www.suse.com/support/update/announcement/2015/suse-ru-20151523-1.html
Support for MDC with system replication is given - so I don't see any constraints.
BR,
Stefan
Hi @Stefan Schiele
Thanks for the info! Much appreciated.
Couldn't access the suse note with id....
May I know how SAPHANASR works exactly with MDC? Correct me if I'm wrong: if any tenant or its indexserver fails, the entire HDB (together with all tenants) will be taken over to the secondary by the SUSE cluster?
Hope to hear from you soon.
Cheers,
Thanks,
Nicholas Chang
Hi Nicholas,
saphanasr checks the system-replication status with 'hana native' methods and makes a takeover decision.
I think Fabian Herschel will know the exact decision 'matrix' perfectly 😉
I think for the SAPHANASR process there is not a big difference whether it's MDC or single.
If just one indexserver of one tenant DB fails, daemon and nameserver will try to restart it first - if this doesn't work, it will have an impact on the SR status, which will then be considered by SAPHANASR.
In general, a local restart can be tried first. For single service failures, the HANA daemon and nameserver control that. And for the whole HANA instance, SAPHANASR can take care of it (if I read the code correctly, this is the default value).
And yes, of course only the whole instance with all tenant DBs can be replicated and switched to the secondary site.
BR,
Stefan
Hi Stefan,
Thanks for your reply again! Much appreciated.
I'm aware of how SAPHANASR works as we have it installed and running on our system 😉
I just want to know the matrix for SAPHANASR on MDC, and whether there is any special way of dealing with it.
As we know, for MDC there are multiple tenants running inside one HDB, and HANA system replication happens for the whole system (all tenants).
I assume that if one of the tenants failed, SAPHANASR would detect it and fail over the whole system (all tenants + failed tenant) to the slave without any special crm coding?
Thanks,
Nicholas Chang
Hi Nicholas,
yes, SAPHanaSR always takes the "over-all" status of the system replication, which consists of the status of all system replication services. And there are one or more such services per tenant database. This means that if one tenant service or the system-database service were broken, SAPHANASR would need to mark the secondary as not in sync (SFAIL) and exclude the secondary from takeover until it is in sync again.
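A minimal Python sketch of that "over-all" rule, for illustration only. It is not the agent's code: the function name `overall_sr_status` is made up, the per-service state string `ACTIVE` is an assumption about how a healthy service is reported, and `SOK` is assumed here as the in-sync counterpart of `SFAIL`.

```python
def overall_sr_status(service_states):
    """Aggregate per-service replication states into one overall sync state.

    service_states maps (database, service) -> state string.
    Assumption for this sketch: a healthy service reports "ACTIVE".
    """
    # An empty result gives no evidence of being in sync, so it counts as SFAIL.
    if service_states and all(state == "ACTIVE"
                              for state in service_states.values()):
        return "SOK"
    return "SFAIL"


def secondary_is_takeover_candidate(service_states):
    """The secondary is excluded from takeover until everything is in sync."""
    return overall_sr_status(service_states) == "SOK"


# Hypothetical MDC system: system database plus two tenants.
states = {
    ("SYSTEMDB", "nameserver"): "ACTIVE",
    ("AB1", "indexserver"): "ACTIVE",
    ("AB2", "indexserver"): "ERROR",   # one broken tenant service
}
status = overall_sr_status(states)
```

The point of the sketch is that a single broken replication service, whether in a tenant or in the system database, is enough to turn the overall state to SFAIL.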
Hope that helps
Fabian
Hi Fabian Herschel
Thanks for your reply. Very much appreciated!
I did a test using saphanasr 0.151 and HDB rev112.2 with MDC.
In my scenario, I have one systemdb and two tenants (e.g. AB1 and AB2).
However, the SUSE cluster manager does not automate the secondary takeover when I
i) stop tenant AB1
ii) stop tenant AB2
iii) stop both tenant AB1 and AB2
The slave/secondary only takes over, automated by the SUSE cluster, when SYSTEMDB is down.
I reckon a takeover should be initiated by the SUSE cluster if it detects that any of the tenants is down.
Appreciate if you can shed some light on this.
Thanks,
Nicholas Chang
The solution should never take over just because a tenant is down. The solution must ask landscapeHostConfiguration, and this is still OK if you stop tenants. Let's think about 20 tenants inside one system. One tenant gets stopped, and the solution would also interrupt the 19 other tenants because of a takeover. This is not the intention of the MDC support.
The MDC support always sees the RDBMS as a complete thing. If you stop a tenant, it's your selection. I have also heard that such stops are synchronized to the secondary site, so the tenant would also be stopped on the secondary.
If you stop the SYSTEMDB, then landscapeHostConfiguration will change to stop, and that's the interface between SAP and the cluster to decide when a takeover might be processed.
Hi Fabian Herschel
Thanks for the explanation of how the SUSE cluster SAPHanaSR works on MDC.
OK, what if there are 20 tenants in the database, and 10 out of 20 tenants are down due to some internal error and not stopped intentionally? A takeover will not be initiated as long as the SYSTEMDB is still up and the status from landscapeHostConfiguration still shows OK? Please correct me if I'm wrong. The takeover only happens when SYSTEMDB is down?
Just want to make it clear.
Thanks,
Nicholas Chang
We can only trust the landscapeHostConfiguration interface. If the landscape says that the SAP HANA site is OK or in internal processing (WARNING), then we cannot take over.
If you really have a case where tenants are broken and the landscape would still be OK or INFO, or stay on WARNING for a long time, you would need to discuss this with SAP.
Then they could change landscapeHostConfiguration to detect failed-but-not-stopped tenants and could decide if the landscape is still OK, INFO or only WARNING.
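The rule can be written down as a short Python sketch. This is only an illustration of what is said in this thread, not the resource agent's logic: the function name `takeover_may_be_considered` is hypothetical, the status is passed in as a plain string, and the extra `secondary_in_sync` flag stands for the sync check on the secondary.

```python
# Landscape statuses that, per this thread, rule out a takeover.
NO_TAKEOVER_STATUSES = {"OK", "INFO", "WARNING"}


def takeover_may_be_considered(landscape_status, secondary_in_sync):
    """Return True only if the landscape status does not rule out a takeover.

    landscape_status: overall status reported by landscapeHostConfiguration
                      for the primary site, as a string (sketch assumption).
    secondary_in_sync: whether the secondary is currently in sync.
    """
    if landscape_status.upper() in NO_TAKEOVER_STATUSES:
        # The landscape still looks healthy, or is busy with internal
        # processing: stopped or broken tenants alone do not change this.
        return False
    # Any other status makes a takeover possible, but only towards a
    # secondary that is in sync.
    return secondary_in_sync


# 20 tenants, some of them down, but the landscape still reports OK.
tenants_down_but_landscape_ok = takeover_may_be_considered("OK", True)
```

The sketch deliberately knows nothing about individual tenants, which is the whole point: the cluster only sees what the landscape interface reports.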
Thanks Fabian. Seems like there's still a lot of deciding factors on when should automated replication take over in mdc scenario.....
Hi Stefan,
it was not a bug, but SAP changed the interface to support MDC. This forced us to change our RA as well.
Regards
Fabian
MDC is supported. You need to use a SAP HANA > rev900 and a RA >= 0.151.
The guides are currently in a rewrite process.
Regards
Fabian
Hi Fabian,
Now I'm configuring fail-over HANA with SAPHanaSR 0.151-0.11.1.
Somehow, resource for HANA startup fails on one node. Slave node can start, but Master node fails. Resource monitor is 'not running' (7).
What should be checked?
(sorry for too tech details...)
Megumi
Typically the primary HANA will always be started first. It's not easy to say what the root cause is or what you should check if we have no information about your configuration.
If this is a productive setup (or is planned to become one), please open a ticket at either SUSE or Red Hat (as I don't even have the information about the Linux distribution used).
This blog is about SUSE HAE, isn't it? My environment is on SUSE for test purpose.
I'm using SLES for SAP 12 SP1 and HANA rev. 120. Now, the statuses of the HANA resources are "Started" and "FAILED". It seems the failed node is retrying to start up (blinking in Hawk).
Monitoring is not running on either node: rsc_SAPHana_SID_HDB00_monitor_61000 on 'not running' (7): call=816, status=complete, exitreason='none'.
I configured it as below (partial):
primitive rsc_SAPHanaTopology_SID_HDB00 ocf:suse:SAPHanaTopology \
    operations $id=rsc_sap2_SID_HDB00-operations \
    op monitor interval=10 timeout=600 \
    op start interval=0 timeout=600 \
    op stop interval=0 timeout=300 \
    params SID=SID InstanceNumber=00
primitive rsc_SAPHana_SID_HDB00 ocf:suse:SAPHana \
    operations $id=rsc_sap_SID_HDB00-operations \
    op start interval=0 timeout=3600 \
    op stop interval=0 timeout=3600 \
    op promote interval=0 timeout=3600 \
    op monitor interval=60 role=Master timeout=700 \
    op monitor interval=61 role=Slave timeout=700 \
    params SID=SID InstanceNumber=00 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
ms msl_SAPHana_SID_HDB00 rsc_SAPHana_SID_HDB00 \
    meta is-managed=true notify=true clone-max=2 clone-node-max=1 target-role=Stopped interleave=true
clone cln_SAPHanaTopology_SID_HDB00 rsc_SAPHanaTopology_SID_HDB00
colocation col_IP_Primary 2000: res_IP:Started msl_SAPHana_SID_HDB00:Master
order ord_SAPHana 2000: cln_SAPHanaTopology_SID_HDB00 msl_SAPHana_SID_HDB00
Sure the blog is about SUSE HAE or better about SLES for SAP Applications. However this does not say that all questions are really related to SUSE Linux Enterprise Server. So my intention was just to make that more clear.
OK, if you are using SLES for SAP 12 SP1 you might need to wait for our next update. We fixed a Python incompatibility problem between SLES for SAP 12 SP1 and SAP HANA's Python.
Second, with SPS12 (rev 120) SAP changed an interface and added a new field in an output format, so we needed to change the parser. Now we have switched to a more stable output format.
If you need the new package quite early, please open either a SAP ticket at the SUSE queue or a Service Request directly at SUSE to receive a so called PTF (just to avoid waiting for the release in the update channel).
Beside that ... the other thing (maybe a result of attempts to start/stop via the command line and/or the HAWK web interface) is that you still have target-role=Stopped, which prevents the cluster from starting the resource once it would work correctly. So that's not the root cause, but after the package update you should change that (just as a reminder).
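As a small aid for spotting such leftovers, here is a Python sketch that scans a saved crm configuration text for resources that still carry target-role=Stopped. The function name `resources_with_stopped_target_role` is made up for this example, and the sample text is an excerpt of the configuration posted above.

```python
import re


def resources_with_stopped_target_role(crm_config):
    """Return the ids of resources whose definition contains target-role=Stopped."""
    # Join continuation lines (backslash at end of line) into one logical line.
    joined = re.sub(r"\\\s*\n\s*", " ", crm_config)
    stopped = []
    for line in joined.splitlines():
        tokens = line.split()
        # tokens[0] is the resource type (primitive, ms, clone, ...),
        # tokens[1] is the resource id.
        if len(tokens) >= 2 and "target-role=Stopped" in tokens:
            stopped.append(tokens[1])
    return stopped


SAMPLE = r"""
ms msl_SAPHana_SID_HDB00 rsc_SAPHana_SID_HDB00 \
    meta is-managed=true notify=true clone-max=2 clone-node-max=1 target-role=Stopped interleave=true
order ord_SAPHana 2000: cln_SAPHanaTopology_SID_HDB00 msl_SAPHana_SID_HDB00
"""

leftovers = resources_with_stopped_target_role(SAMPLE)
```

The sketch only reports what it finds; changing the target role is still done with the usual cluster tools.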
Hi Fabian,
I'm facing the same problem as Former Member.
I have SUSE for SAP Applications 12 SP1 with SAP HANA SPS 11.
I need to set up this HA at a customer, and I'm getting a failed message when I try to start the cluster service.
As you said, I need the package asap; do you know how I can get it quickly?
Many thanks, see you