
The replay of “High Availability and Disaster Recovery with SAP HANA”, part of the Hands on HANA Webcast Series, is now available on demand.

Go to this page to watch the replay.

What is the minimum bandwidth requirement for HANA 1.0 SPS06? How can we estimate it based on the existing redo log size?

For SAP HANA System Replication, the required bandwidth depends on the business load on the customer’s primary system. Check how much log your classical database generates, double that amount as a buffer, and divide it by the time period over which it was generated. The result is the volume per time unit that has to be transferred to the other side, and therefore the minimum transfer rate. On top of that, you will need further bandwidth for peaks from the delta data transfer, which happens every 10 minutes. This also depends on the business load and varies accordingly. That is why we don’t simply ask for a 10 GBit dark fiber line between data centers, but try to keep the cost in line with the actual business load.
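
The rule of thumb above (double the observed log volume, then divide by the observation period) can be sketched in a few lines of Python. This is only an illustration: the function name and the 90 GB per day figure are hypothetical, and the result covers the log transfer only, not the additional peaks from the delta data transfer.

```python
def min_replication_rate_mbit(log_bytes: float, period_seconds: float,
                              buffer_factor: float = 2.0) -> float:
    """Estimate the minimum sustained transfer rate in Mbit/s.

    log_bytes      -- redo log volume observed on the existing database
    period_seconds -- length of the observation period
    buffer_factor  -- safety factor; 2.0 means "double the log amount"
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    bytes_per_second = log_bytes * buffer_factor / period_seconds
    return bytes_per_second * 8 / 1_000_000  # bytes/s -> Mbit/s


# Hypothetical example: 90 GB of redo log written over one day.
rate = min_replication_rate_mbit(90e9, 24 * 3600)
print(f"Minimum sustained rate: {rate:.1f} Mbit/s")
```

Measuring the log volume over a busy period rather than a daily average gives a more conservative number, since the business load is rarely spread evenly over the day.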

Can you spend a few minutes on the cost associated with HANA for customers, including HANA license, hardware, and project implementation (consulting) costs?

This is one of those “it depends” answers. It depends on the customer’s use cases, project scope, use of rapid deployment solutions, data sizing, etc.

What is SAP licensing cost for HANA?

HANA licensing depends on the type of HANA license (Enterprise, Runtime, etc.) and the number of blocks purchased.

There’s a delta cost for HANA compared to, for example, a non-HANA SAP BW implementation. Knowing the associated costs will be critical when having initial discussions with customers about HANA as an option.

Yes, that is true. This is why we highlighted on one of the first slides to engage all stakeholders, early and often.

What is the recommended network bandwidth speed between HANA appliance hardware and SAP Applications server (R3)?

For SAP HANA System Replication, the required bandwidth depends on the business load on the customer’s primary system. Check how much log your classical database generates, double that amount as a buffer, and divide it by the time period over which it was generated. The result is the volume per time unit that has to be transferred to the other side, and therefore the minimum transfer rate. On top of that, you will need further bandwidth for peaks from the delta data transfer, which happens every 10 minutes. This also depends on the business load and varies accordingly. That is why we don’t simply ask for a 10 GBit dark fiber line between data centers, but try to keep the cost in line with the actual business load.

Is this feature supported in SoH (Suite on HANA)?

Scale-out for SoH is currently in ramp-up with SAP, but a single worker node can be combined with a standby node to provide this High Availability feature for a single-node SAP HANA configuration. This can be expanded to DR.

What happens to the data residing in memory when the node fails?

The memory is rebuilt during the restart of the standby, which takes over the identity of the failed host. The persistence layer provides everything needed to rebuild all committed data.

How long does the failover take?

There are two stages. First, SAP HANA has to recognize the failed node: usually there are 3 trials with a 30 s timeout in between, i.e. about 1.5 minutes until the failover is initiated. Second, the data has to be reloaded. In case of a slave failure, only the columns required to respond to outstanding queries have to be loaded, which is a matter of seconds. If a master node fails (in case of a BW workload), reloading the complete row store might take a few minutes. Overall it is a fully automatic process that takes minutes.
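
The arithmetic behind those two stages can be written down as a small Python sketch. The trial count and timeout are the typical values quoted above; the reload times in the example are hypothetical placeholders, since the real figures depend on the workload and data volume.

```python
def detection_time_seconds(trials: int = 3, timeout_s: float = 30.0) -> float:
    """Stage 1: time until the failed node is recognized (trials x timeout)."""
    return trials * timeout_s


def estimated_failover_seconds(reload_s: float, trials: int = 3,
                               timeout_s: float = 30.0) -> float:
    """Stage 1 (detection) plus stage 2 (reload of the required data)."""
    return detection_time_seconds(trials, timeout_s) + reload_s


# Hypothetical reload times: 10 s for a slave node, 180 s for a master row store.
print(detection_time_seconds() / 60)     # detection alone, in minutes
print(estimated_failover_seconds(10))    # slave failure, in seconds
print(estimated_failover_seconds(180))   # master failure, in seconds
```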

I agree that HA is automated, but DR is NOT. Am I right?

There are different choices. Yes, HA is fully automatic. DR can be set up as a stretched HA scenario (1 worker on the primary side, 1 standby on the DR side), which is also fully automatic. Synchronous replication for DR has to be tailored to your needs, but it can be automated.

What does Quorum node do?

It avoids split-brain situations. Imagine the network between the two nodes is broken: will both sides fire responses and take action? No, only the node still connected to the quorum has a majority and stays functional. The other node stops working.
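
The majority rule can be illustrated with a simplified Python model. This is a sketch of the general quorum idea with three voting members, not the actual algorithm used by the file system; the member names and the function are hypothetical.

```python
def stays_active(node: str, reachable: set[str], members: set[str]) -> bool:
    """Return True if `node` still sees a strict majority of all voting members.

    A node always counts itself; `reachable` lists the other members it can
    still contact over the network.
    """
    visible = ({node} | reachable) & members
    return len(visible) > len(members) / 2


members = {"node_a", "node_b", "quorum"}

# The link between node_a and node_b is broken.
# node_a can still reach the quorum node, node_b cannot reach anyone.
print(stays_active("node_a", {"quorum"}, members))  # node_a keeps running
print(stays_active("node_b", set(), members))       # node_b stops working
```

With an odd number of voting members, two separated sides can never both hold a strict majority, which is what rules out two active versions at the same time.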

What software is running on the Quorum node on slide 11? Is it also running HANA? Are the hardware requirements similar to the 2 main nodes?

No, it is just a quorum for the file system to make sure only one version is active. During normal operations, there is a local version and a replicated copy. If one side fails (only the storage or the whole node), only one version stays available, and the overall system remains available. By default, only the operating system and the GPFS file system run on the quorum node, but you can add other functionality to that node as well: backup, monitoring, etc.

Is it mandatory for the secondary node’s memory to be equal to the primary’s? Can I replicate from multi-node to single node?

No, you would need the same capacity and worker node topology on both sides.

What is performance on failover when it is scripted? Do you have experience with this?

With SAP HANA System Replication, takeover times are in the range of 2 to 5 minutes if the secondary is preloaded with data and 10 to 20 minutes if not. With SAP HANA Storage Replication, takeover times are similar to the latter, 10 to 20 minutes, because some parts of the data on the secondary have to be loaded completely from the persistent disks and the log has to be rolled forward.

Can SAP HANA keep a copy of the PRD image/snapshot at the DR site before we perform the DR drill, so that we can reload the old image and resynchronize once the DR drill is over? This would help us shorten the re-sync time and minimize bandwidth.

As long as you do not restart the DR side, the PRD replica will stay there. As soon as you connect your app servers to that side, start HANA there, and start working on it, the data is in production and open to change. You can certainly run a snapshot function there to preserve a specific set of data.

Is the node size an indicator of the primary data segment size? Or does it include HDD & Log replica?

The node size is the amount of physical memory provided by that server. You would need to run a sizing of your application with SAP to determine how much row store and column store would be required and for what workload (BI, BW, BS).

This cluster has 4 nodes and 1 standby. Does that mean we need a standby for every 4 nodes? Is there a best practice for bigger clusters?

No, you could go with 15 workers and one standby, or 55 workers and 1 standby as validated with SAP today. It might be beneficial to define 2 standby nodes for planned rolling upgrades.

What is the solution from IBM for HA with Low network speed?

HA within a data center will be provided with internal 10GbE switches so local high bandwidth is guaranteed. For the DR scenario with connections to a remote data center, this will depend on the overall capacity (mind the initial load) and workload (change frequency, etc.).

Concerning warm standby: will the IP address from primary HANA be transferred to secondary HANA automatically in case of failover?

That can be made part of the overall automation process. Out of the box it is a manual process.

Can you please provide more details on how the database kernel transfers the data? (Log replication?)

The log is written to the remote site in parallel with the local site. Here we work similarly to other known shadow database solutions on the market. Unlike current shadow solutions, SAP HANA still needs a delta data transfer on top of the log transfer, as a current compromise. We hope to get rid of this delta data transfer mid to end of next year (2014). The delta data amount is derived from the internal shadow memory management we use to create internal savepoints (similar to filer snapshots), which gives us a delta process at database page level.

In a Synchronous replication scenario, what happens when transactions are written to primary node when the DR disk storage system is down? Will the transactions wait?

In case of GPFS replication, yes, the transactions will wait and will resume upon reestablishing the connection.

Is Distributed Datacenter Scale-out Async solution supported at HANA application level (SPS06) or at GPFS level?

Currently it is supported with system replication (as of SPS06). For GPFS, we are working to get an asynchronous approach validated with SAP.

How do DR and synchronization work between the data centers when data center 2 is not identical or similar to DC1 in terms of available nodes and data partitioning? Or is it mandatory that both DCs be identical in terms of hardware, nodes, etc.?

The capacity and worker node topology have to be maintained on both sides. For example, you could have 10 worker and 2 standby nodes on the primary side and only 10 worker nodes on the DR side. Reducing the number of worker nodes on the DR side, however, is not supported today.
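
As a sketch of that rule, the following Python check compares two site layouts. The `Site` class and its field names are hypothetical, and the check reads “same capacity” literally as equal memory per node; it ignores standby nodes because those may differ between the sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    workers: int             # number of worker nodes
    standbys: int            # number of standby nodes (may differ per site)
    memory_per_node_gb: int  # capacity of each node


def dr_topology_matches(primary: Site, dr: Site) -> bool:
    """True if the DR site keeps the worker count and node capacity of the primary."""
    return (dr.workers == primary.workers
            and dr.memory_per_node_gb == primary.memory_per_node_gb)


primary = Site(workers=10, standbys=2, memory_per_node_gb=512)

print(dr_topology_matches(primary, Site(10, 0, 512)))  # fewer standbys: fine
print(dr_topology_matches(primary, Site(8, 2, 512)))   # fewer workers: not supported
```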

How do we size the bandwidth between the data centers for synchronous and asynchronous modes?

This depends heavily on your RPO requirements: the more data loss you can accept after an outage, the lower the bandwidth can be. Higher bandwidth certainly reduces the RPO in an asynchronous scenario.

Is there any backup concept like incremental or full for SAP HANA?

There are two fully certified solutions for HANA backup using BACKINT: IBM Tivoli Storage Manager and Symantec NetBackup.

Is the SAP Internal System running on IBM?

Sure! Please watch the SAPPHIRE keynotes: HANA Enterprise Cloud would not be there without IBM, CRM is running on 6TB nodes, and we’re also working on SAP IT’s ERP system (just to repeat what Vishal Sikka, SAP CTO mentioned there).

When is the release of scale-out systems for Suite on HANA planned?

This is currently in ramp-up. Please contact SAP to get enrolled.

Is there a recommendation about distance/latency recommended to implement Disaster Recovery between Site 1 and Site 2 for Synchronous and Asynchronous replication?

There are upfront measures: if the one-way network latency is below 350 µs, synchronous replication will work well. Ultimately, a benchmark will have to be run to determine whether network optimization is required from a bandwidth/latency perspective.
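
A quick upfront check against that threshold could look like the following Python sketch. It assumes you only have a measured round-trip time (for example from ping) and that the path is symmetric, so the one-way latency is taken as half of it; the names and example values are hypothetical.

```python
SYNC_ONE_WAY_LIMIT_US = 350.0  # threshold quoted above, in microseconds


def one_way_latency_us(round_trip_us: float) -> float:
    """Approximate the one-way latency as half the round trip (symmetric path assumed)."""
    return round_trip_us / 2


def sync_replication_looks_feasible(round_trip_us: float) -> bool:
    """True if the estimated one-way latency is below the 350 us guideline."""
    return one_way_latency_us(round_trip_us) < SYNC_ONE_WAY_LIMIT_US


print(sync_replication_looks_feasible(500.0))  # 250 us one way: below the limit
print(sync_replication_looks_feasible(900.0))  # 450 us one way: above the limit
```

A positive result here is only a first filter; it does not replace the benchmark mentioned above.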

Is DB consistency maintained in case of both synchronous & asynchronous scale-out multi-node scenario?

In the strict sense of the word, the RPO with a synchronous approach equals zero. So yes, it stays consistent all the time. An asynchronous approach might lose some packets in flight. Depending on the bandwidth and latency, you can reduce the RPO to a minimum.

Is there a recommended number of standby servers depending on your cluster size / T-shirt size?

Up to 56 nodes have been validated with SAP. 1 standby is the minimum; 2 standby nodes are definitely recommended to also cover planned outages (e.g. rolling upgrades), and more standby nodes can be configured to provide HA against multiple node failures. Please keep in mind that a failover has to finish before protection against another node failure resumes.

If you have a very big replication environment, I believe that is good on the infrastructure side. But if you have a logical corruption, you will have a lot of work to do (restore, replicate again, etc.). Am I right?

Well, from an IBM perspective this is described in the Operations Guide. All kinds of scenarios are covered there, whether you lose infrastructure components or just data. Please get in touch if you’d like to elaborate further: rettig@de.ibm.com.

We regularly face HANA slowdowns and HANA database crashes. Please address those issues. Our development work is hampered by this.

Are there OSS tickets that you can refer to? Certainly this is annoying and should not happen. I’d like to understand the specifics of why this happens and would certainly work with SAP and my team to provide assistance: rettig@de.ibm.com.

Can I use System Replication for HA AND DR? Replicate to a 3rd system?

Yes – with the IBM solution.

Which SAP HANA training courses and links should we use to get actively involved with HANA?

We have several resources on sappartneredge.com and saphana.com.

2 Comments

  1. Anil Chandaliya

    Hi Team,

    We are trying to reduce the cost of the SAP HANA appliance using the 2 options below. Can you please suggest whether this is going to work?

    Sr. No.  SAP Landscape  HANA DB Size
    1        ECC PROD       2.5 TB
    2        CRM PROD       1.5 TB
    3        EWM PROD       500 GB
    4        GRC PROD       256 GB

     

    Option 1

    At the primary side, we have a 4 TB HANA appliance running one ECC PROD SAP HANA database using SCOS technology, and another 3 TB SAP HANA appliance running 3 PROD systems (CRM, EWM, GRC) using SAP HANA MDC technology.

    To reduce cost, we would like to use one 6 TB SAP HANA appliance at the disaster recovery site and put all the PROD systems (ECC, GRC, CRM and EWM) on that one appliance using SAP HANA MDC technology. Can you please suggest whether this solution is going to work?

     

    Option 2

    At the primary side, we have a 4 TB HANA appliance running two SAP HANA databases (ECC PROD and GRC PROD) using MDC technology, and another 3 TB SAP HANA appliance running 2 PROD systems (CRM, EWM) using SAP HANA MDC technology.

    To reduce cost, we would like to use one 6 TB SAP HANA appliance at the disaster recovery site and put all the PROD systems (ECC, GRC, CRM and EWM) on that one appliance using SAP HANA MDC technology. Can you please suggest whether this solution is going to work?

     

     

     
