
SAP has certified the Amazon Web Services cloud as a suitable platform for running production instances of some products. The Amazon cloud is probably the best known of the Infrastructure as a Service cloud vendors. Before making any sizing decisions, or any decision about using AWS for SAP systems, please check the latest version of the Operating SAP Solutions on AWS White Paper (PDF). It details the special considerations for SAP systems on AWS, including some Operating System restrictions.

However, there are some other caveats and gotchas that you need to be aware of before putting any system (SAP or otherwise – even your Development, Testing or QA instances, let alone Production instances) in any cloud environment. It is sometimes tempting, even at a very high level, to think of cloud-based infrastructure as a form of what used to be called remote computing, where the datacenter is located some distance from the users, administrators and developers, just much cheaper to use and much quicker to provision. For most parts of an SAP implementation, this does hold true; users connect via NWBC, a browser or the SAP GUI to a DNS name, and manipulate the information they find – they add to it, update it, share it, regardless of where it’s stored and the computer(s) used to perform the work.

However, this view glosses over a key concept of Cloud computing: the idea of commodity virtualisation of everything. So, bearing this in mind, let’s explore some important lessons about Cloud Computing.

 

Lesson 0: Only the paranoid survive

Andrew Grove was running Intel when he published a business book called ‘Only the Paranoid Survive’. It sounds like an awfully cold way to deal with business colleagues, but when it comes down to me and the computers, it has proved a useful attitude.

 

Lesson 1: SLAs Are Meaningless

You can’t compare any kind of hosting service based on its advertised SLAs. Instead, base your comparisons on their response to you and your company’s issues. Regardless of what they say, ‘stuff’ will happen. Yes, Amazon has a service level agreement for EC2 of 99.95% uptime, averaged over the last year. You would imagine that this was set (by Amazon) based on historical information. However, as they say in the financial pages, “historical behaviour is not an indicator of future performance”. And when ‘stuff’ happens, where are you in the queue for personal attention, recompense, or even just a communication of some sort?

By the way, due mainly to the recent outage, EC2’s uptime over the last year is around 99.5%.
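
To put those percentages into perspective, here is the back-of-the-envelope arithmetic as a simple sketch (there is nothing AWS-specific about it):

```python
# Convert an availability percentage into allowed downtime per year.
HOURS_PER_YEAR = 365 * 24  # 8760

for sla in (99.95, 99.5):
    allowed_downtime = HOURS_PER_YEAR * (1 - sla / 100)
    print(f"{sla}% uptime allows about {allowed_downtime:.1f} hours of downtime per year")

# 99.95% works out to roughly 4.4 hours a year; 99.5% to roughly 43.8 hours.
```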

 

Lesson 2: YOUR Architecture CAN save You from Cloud Failures, but …

Disaster Recovery processes have two major measures: the Recovery Time Objective (RTO), the duration of time (an SLA, really) within which a business process must be restored after a disaster (or disruption), and the Recovery Point Objective (RPO), which describes the acceptable amount of data loss measured in time. By the way, the O stands for Objective, not Agreement or Mandate (see Lesson 1).
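
As a concrete illustration (the figures here are invented, not taken from any SLA): with an RTO of four hours and an RPO of fifteen minutes, a failure at 10:00 means the business expects a usable system by 14:00, restored to a state no older than 09:45.

```python
from datetime import datetime, timedelta

# Invented example figures: an RTO of 4 hours and an RPO of 15 minutes.
rto = timedelta(hours=4)
rpo = timedelta(minutes=15)

failure_time = datetime(2011, 6, 1, 10, 0)

print("Service must be usable again by:", failure_time + rto)    # 14:00
print("Data must be recovered to at least:", failure_time - rpo)  # 09:45
```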

This means that if an instance becomes unavailable to the business, they want a working system within the RTO, with data loss of less than the RPO. This requires the same thinking and planning that goes into Disaster Recovery planning for an in-house system. In turn, this means managing and planning for Disaster Recovery and Data Security, and allowing for the typical requirements of a Disaster Recovery Plan, except with a Cloud twist to them…

  • You still need to choose the right infrastructure,
    i.e. does your vendor have separate physical locations?
  • You need to manage your view of the infrastructure,
    i.e. how easy is it to transfer backups from one physical location to another? (see the sketch after this list)
  • You still need to test the transfer of backup data,
  • You still need to test the restore / restart of your system in the alternate location,
  • Your vendor may provide alternate physical locations,
    but do you have / need an alternate provider?
  • and so on
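
For instance, a cross-region copy of a backup is not something the platform does for you by default; it is a step you have to script, schedule and test yourself. Below is a minimal sketch, assuming the boto3 SDK, configured AWS credentials, and hypothetical bucket and object names:

```python
import boto3  # assumes the boto3 SDK is installed and AWS credentials are configured

# Hypothetical bucket and object names, purely for illustration.
SOURCE_BUCKET = "sap-backups-us-east-1"
TARGET_BUCKET = "sap-backups-eu-west-1"
BACKUP_KEY = "PRD/db-full-backup.dmp"

s3 = boto3.client("s3", region_name="eu-west-1")

# Copy the backup object into a bucket in a second region.
# (Objects larger than 5 GB would need a multipart copy instead.)
s3.copy_object(
    CopySource={"Bucket": SOURCE_BUCKET, "Key": BACKUP_KEY},
    Bucket=TARGET_BUCKET,
    Key=BACKUP_KEY,
)
```

The point is not the particular API call: whatever mechanism you use, the copy, its scheduling, and the restore from the second location all have to appear in your Disaster Recovery runbook and be tested.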

 

Lesson 3: There is a BIG difference between virtual machines and the hardware.

Things get a little more difficult at the micro level. Fault-tolerant environments are a centerpiece of the cloud hype, but generally, most developers don’t see, and therefore don’t think about, the difference between virtual and physical hardware. The issue with virtual machines (in-house virtualisation or clouds) is that the view from the operating system ends at the hypervisor. You cannot see what happens at the metal. Now, for computer systems to work as we have grown to expect, certain things are sacrosanct. This is because without them, there is no guarantee that what we write will be there when we go to read it (this applies just as much to memory as it does to disk).

An example is the sync() or fsync() system call, which instructs the Operating System to write all the data currently in the filesystem buffers out to disk. Now, in virtual machines, whether or not fsync() does what it should is a bit of a mystery. In fact, there have been suggestions, at least from sources close to Reddit, that in particular circumstances and under high load Amazon’s Elastic Block Store will happily accept calls to fsync(), reporting that the data has been written to disk when it may not have been.
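
To make the point concrete, here is a minimal sketch (in Python, with a hypothetical file path) of the pattern a database engine relies on. On physical hardware a successful fsync() means the data has reached stable storage; inside a virtual machine that guarantee is only as good as the hypervisor and the virtual block device underneath it:

```python
import os

# Write a record and ask the kernel to flush it to stable storage.
with open("/var/tmp/critical-record.dat", "wb") as f:  # hypothetical path
    f.write(b"posting document 4711\n")
    f.flush()              # push Python's own buffer into the OS page cache
    os.fsync(f.fileno())   # ask the kernel to push the page cache out to disk

# On bare metal, returning from fsync() means the data should survive a crash.
# On a VM, the hypervisor or virtual block device may acknowledge the flush
# before the data is actually on persistent storage.
```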

No amount of virtual architecture is going to save you from virtual hardware that lies.

 

Lesson 4: You don’t HAVE to put ANYTHING in the cloud.

The general rule is that if the machine / image dies, then you must be able to recover the data or restore the service. If you’re hosting a database server, then it will need to be restored or recovered. On the other hand, an application server is much simpler; it can usually be rebuilt from a few configuration files. Once you start looking at it like this, it may make sense for a more risk-averse site to put some server types into the cloud and leave others in the data centre. In short, Virtualisation and Cloud computing are not a universal panacea for hardware resource problems.

Of course, many people would say that “commodity” computing is a misnomer, because servers are not really something that should be commoditized, that a “pick one of four sizes” offering is insulting. To a certain extent this is true, but Cloud computing servers are so cheap that you can build around inefficiencies in some parts of the commodity offering by overcompensating in others.

For example, once people realise how cheap CPU and Memory are on IaaS services, they tend to go at least one ‘size’ higher than they would for an in-house server, and they still see massive savings. Regardless of what the purist thinks, it is becoming much more business-efficient to throw hardware at performance problems than it is to spend time investigating the root cause, which leads into…

 

Lesson 5: You still need to tune and manage your systems.

In Cloud computing, costs are tied directly to resource usage. The virtues of cloud computing are a double-edged sword: because provisioning systems is so easy, you may see developers running a dozen tests at once, instead of one after another, to speed up implementation cycles. This means any inefficiencies in the base systems used for such testing will be magnified, which will directly impact costs.

Just as importantly, resource usage variations in your production systems will show up directly in the bill. However, the customer or business user paying the bill will want to know why these variations have occurred. Are they due to different processing rules, different volumes, or program or system changes? You want to see a consistent relationship between the business workload and the resource usage (and therefore cost). This makes budgeting and planning much easier for the Business, and provides them with confidence in both the SAP support teams and the platform.
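
One simple way of making that relationship visible is to track cost per unit of business workload over time, so that a jump in the ratio triggers a question rather than an argument. A minimal sketch, with invented figures:

```python
# Invented monthly figures: (cloud bill in USD, business documents processed).
monthly = {
    "Jan": (4200, 210_000),
    "Feb": (4150, 205_000),
    "Mar": (6300, 212_000),  # cost jumped but volume did not: worth investigating
}

for month, (cost, documents) in monthly.items():
    print(f"{month}: {cost / documents * 1000:.2f} USD per 1,000 documents")
```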

 

Lesson 6: It is not enough to be secure….

…you need to be seen to be secure. Amazon already performs regular scans of the AWS entry points, and independent security firms perform regular external vulnerability threat assessments, but these are checks of the AWS infrastructure (such as their payment gateways, user security and so on). They don’t replace your own vulnerability scans and penetration tests. Because it may be mistaken for a network attack, Amazon asks to be advised of any penetration tests you wish to perform. These must be limited to your own instances.

Being seen to be secure also means using all the features (including the Amazon Virtual Private Cloud) that are referenced in the AWS Security White Paper. This document, which is updated regularly, describes Amazon’s physical and operational security principles and practices. It includes a description of the shared responsibility for security, a summary of their control environment, a review of secure design principles, and detailed information about the security and backup considerations related to each part of AWS, including the Virtual Private Cloud, EC2, and the Simple Storage Service.

The new AWS Risk and Compliance White Paper covers a number of important topics including (again) the shared responsibility model, additional information about the control environment and how to evaluate it, and detailed information about the AWS certifications. Importantly, it also includes a section on key compliance issues which addresses a number of topics that get asked about on a regular basis.

 

Summary

There are differences between managing real servers, virtual servers and Cloud-based servers. However, much of what is required for SAP landscapes and implementations is the same whichever platform you use. In fact, the BASIS team may be the only people who notice the difference. One of the biggest differences is the perception of control and ownership, because you can’t “hug your server” any more. What are the biggest differences you see, and how do you see them impacting you if or when your organisation starts implementing SAP systems in the Amazon Cloud?


3 Comments


  1. Tom Cenens
    Hello Martin

    Glad to see the rise of the #SAPADMIN hashtag on SCN again.

    I doubt we will see many productive SAP systems in the public cloud soon but I bet some are willing to take the risk and are in fact doing it or planning to do it.

    The question of availability is: do you have so much better availability in your own private cloud, or is it comparable? Hardware failure or other events can occur in any type of cloud.

    Will the public cloud cause hosting companies to seriously revise their cost models? Yes it will. Is this a good thing for the customer? Yes it is.

    I think we will see private clouds being used a lot which is already the case for many companies. It just didn’t have the label tagged to it.

    No doubt that this and other new emerging technology requires flexibility and risk taking in order for companies to keep up with the competition and be innovative.

    The business world itself is rapidly changing. Companies can become huge overnight and companies can die overnight.

    We have exciting times ahead of us as the IT world starts to climb out of the dip and starts innovating and is again prepared to invest in technology.

    I see the role of the #SAPADMIN changing drastically over the next ten years. The question will be, will we still call it #SAPADMIN by then or will we call it #SAPHYBRID, for example.

    More and more tasks and operations will be automated and will no longer require intervention from a #SAPADMIN, so our role will be reinvented and we will definitely see a shift in our day-to-day activities.

    Kind regards

    Tom

    1. Chris Kernaghan
      Tom/ Martin

      The availability question is something that comes up regularly with clients – their assertions about their total uptime requirements rarely meet with reality. 99.5% uptime is not too shabby for many cases; of course, there is a difference between planned and unplanned downtime, something IT departments find difficult to communicate upwards.

      As regards the role of the Basis administrator changing over the next 10 years, I would make it sooner – we are already seeing the influence of many disruptive technologies within our workplaces. One thing we have struggled with as people and organisations over the last 10 years is getting better at distributed delivery/operational teams. Now that hopefully we have these things worked out, we can concentrate on getting better baseline technology implemented.
      I have struggled with the Cloud definition, and to some extent I am past it – I don’t care what it is called, as long as it meets the objectives of the customer. I would prefer to see vendors differentiating themselves on their value-add services, not terminology – as soon as the data leaves your internal environment it does not matter whether it is Cloud or a highly virtualised hosted environment. What counts are the services that back the hosting and what you are paying for.

    2. Chris Kernaghan
      For the second time today I am writing this reply, ggrrrrr 🙁

      The question of availability comes up regularly with customers, and they usually over-estimate their uptime requirements, especially on non-Productive systems, often without knowing the difference between planned and un-planned downtime. Some IT departments are poor at communicating the difference upwards, especially with regard to 3rd party providers and their KPIs.

      The Basis role has changed and continues to change, but I think 10 years is a little pessimistic. We have spent the last 10 years developing good methodologies and technology to enable us to work effectively in a distributed delivery/operations environment. Now that we should have these things working better, we can spend more energy implementing new baseline technologies to further enhance our work, things like automation and mobile.
      I have struggled with the term cloud for a long time and to some extent I am past it. I do not care what the vendor calls it; as long as it meets the requirements of the customer/project then I am happy to use it. Technology should not be a reason in itself, it should be an enabler to a good service and good services provided by a vendor. I often think of Kaj’s statement at TechEd last year: “5 years ago people were paying big money for Cloud-like flexibility, now people are expecting it for low cost.”
