
Having worked with SAP landscapes on various IaaS platforms, I have come to a disturbing conclusion: they are damn hard to keep control of and manage over the medium to long term. This has become something of an elephant in the room for many of us Cloud evangelists, but I feel it must be addressed if Cloud environments are to progress from great finite-lifespan systems to systems that are fully integrated into normal landscapes. Discussed below are some of the major challenges that can affect Cloud projects and implementations.

Flexibility

One of the biggest selling points of IaaS environments is the level of flexibility they provide. Through this flexibility, we have the ability to do things like:

  • Cloning systems – creating a clone of a system takes only a few mouse clicks, and creating instances from that clone is just as easy. This is a double-edged sword, as creating snapshots of instances requires additional storage, which needs to be monitored, managed and paid for. By creating a clone, we have doubled the amount of resources being used; if we then create an instance from that clone, we have tripled it. As you can see, it is very easy to increase the amount of resources being charged for by the IaaS provider.
  • Allocate new infrastructure – creating or allocating new infrastructure is deceptively easy. Although it takes moments to create an additional 100GB volume, it requires discipline and process to make sure it is labelled and catalogued properly to ease administration. The diagram below shows the nightmare that can be unleashed through a lack of discipline.

[Image: Volumes_No_Details]

The graphs below show the month-by-month growth in the number of volumes against the number of servers for an implementation I managed recently. In July and August the system was implemented and stable; in September it underwent some DR testing, which increased the number of servers and the number of volumes. Although this testing was complete in October, the number of volumes has not returned to the baseline. In fact it is not even close, even though the number of servers has dropped back to baseline.

The graph below shows in more detail the split between volumes that are Available and those that are In-Use. It confirms that in October the number of volumes not attached to servers increased. This indicates that although the servers were terminated, people are not deleting the associated storage, because “you never know if you’ll reuse it”.
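The Available versus In-Use split described here is straightforward to compute once you have a volume inventory. Below is a minimal Python sketch, not the tooling I used: the records are made-up sample data, and the field names (`VolumeId`, `Size`, `State`, `Attachments`) are assumed to follow the shape returned by the EC2 DescribeVolumes API.

```python
from collections import Counter

# Made-up sample inventory. The field names are assumed to mirror the
# EC2 DescribeVolumes response; the values are purely illustrative.
volumes = [
    {"VolumeId": "vol-001", "Size": 100, "State": "in-use",
     "Attachments": [{"InstanceId": "i-aaa"}]},
    {"VolumeId": "vol-002", "Size": 100, "State": "available", "Attachments": []},
    {"VolumeId": "vol-003", "Size": 50, "State": "available", "Attachments": []},
]


def count_by_state(vols):
    """Count volumes per state (e.g. 'in-use' vs 'available')."""
    return Counter(v["State"] for v in vols)


def orphaned_volumes(vols):
    """Return volumes that exist but are not attached to any server."""
    return [v for v in vols if v["State"] == "available" and not v["Attachments"]]


orphans = orphaned_volumes(volumes)
orphaned_gb = sum(v["Size"] for v in orphans)

print(count_by_state(volumes))
print([v["VolumeId"] for v in orphans], orphaned_gb)
```

Running something like this on a schedule and tracking the orphaned total month by month would give an early warning of the drift shown in the graphs.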

  • Create new snapshots – snapshots are the “get out of jail free” card of data backups. Most IaaS platforms have a native snapshot capability which can be used as a replacement for normal backup applications. Like backup media, however, snapshots need to be managed and aged properly so that they do not become an exponential mess. As with the volumes in the graphs above, this ease of creation means that people performing any change will snapshot a volume ‘just in case’ something goes wrong.
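To make the doubling and tripling from the cloning point concrete, here is a small worked sketch in Python. It assumes, as the text does, that every clone and every instance started from a clone holds a full copy of the original system's storage; platforms that store snapshots incrementally will consume less than this. The function name and the 500GB figure are hypothetical.

```python
def storage_footprint_gb(system_gb, clones=0, instances_per_clone=0):
    """Total storage consumed, assuming every copy is a full copy.

    One unit for the original system, one per clone, and one more
    for each instance started from a clone.
    """
    copies = 1 + clones + clones * instances_per_clone
    return system_gb * copies


original = storage_footprint_gb(500)
with_clone = storage_footprint_gb(500, clones=1)
with_clone_and_instance = storage_footprint_gb(500, clones=1, instances_per_clone=1)

print(original, with_clone, with_clone_and_instance)
```

For a 500GB system this gives 500GB, then 1000GB with one clone, then 1500GB once an instance is started from that clone.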

Security

Security has been and continues to be a worry for some on IaaS platforms, in my opinion a little unfairly. Many service providers offer deep and granular controls over their services; for example, Amazon has IAM, which provides granular security. Within the AWS platform, each user gets a logon for the AWS console as well as an X.509 certificate for signing web service calls. This X.509 certificate can be used by any 3rd party application or service and maintains the permissions defined in IAM. Often people focus on platform security issues without talking about the security of the OS and application layers. It is easy to hypothesize why this might be the case, and many articles have been written comparing IaaS security with on-premise security. Given the self-service nature of IaaS providers, their desire to make security as easy as possible and the “Jack of all trades – Master of none” approach taken by many IaaS practitioners, it is understandable why companies and people are wary of it. To provide good assurances, IaaS platform security must allow auditing and inspection of configuration using existing deployed toolsets; security which is not transparent will never be fully trusted.

Operations

To move IaaS landscapes from temporary/finite systems to systems that are properly integrated into landscapes, they need to be manageable in the same way as those landscapes. This includes tasks like:

Backups – although it is possible to use the native snapshot ability on data volumes, this is not a great solution, because ageing the snapshots is difficult, though not impossible. Take a look at a service called Skeddly.com, which allows you to age and delete snapshots on a scheduled basis. For many operations people, using a properly managed and integrated backup product is still the right way to go.
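For anyone wanting to see what snapshot ageing involves, here is a minimal Python sketch of one possible retention rule: delete snapshots older than a cut-off, but always keep the newest few per volume. This is not how Skeddly works internally; the function name, the 14-day and keep-2 defaults, and the sample records are all assumptions for illustration, with field names chosen to resemble the EC2 DescribeSnapshots response.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshots, now, max_age_days=14, keep_latest=2):
    """Return IDs of snapshots older than max_age_days.

    The newest keep_latest snapshots of each volume are never
    selected, however old they are.
    """
    cutoff = now - timedelta(days=max_age_days)

    # Group the snapshots by the volume they were taken from.
    by_volume = {}
    for snap in snapshots:
        by_volume.setdefault(snap["VolumeId"], []).append(snap)

    expired = []
    for snaps in by_volume.values():
        snaps.sort(key=lambda s: s["StartTime"], reverse=True)  # newest first
        for snap in snaps[keep_latest:]:
            if snap["StartTime"] < cutoff:
                expired.append(snap["SnapshotId"])
    return sorted(expired)


# Made-up sample data with an arbitrary reference date.
now = datetime(2024, 1, 15, tzinfo=timezone.utc)
snapshots = [
    {"SnapshotId": "snap-1", "VolumeId": "vol-001", "StartTime": now - timedelta(days=40)},
    {"SnapshotId": "snap-2", "VolumeId": "vol-001", "StartTime": now - timedelta(days=20)},
    {"SnapshotId": "snap-3", "VolumeId": "vol-001", "StartTime": now - timedelta(days=1)},
    {"SnapshotId": "snap-4", "VolumeId": "vol-002", "StartTime": now - timedelta(days=90)},
]

print(snapshots_to_delete(snapshots, now))
```

With this sample only `snap-1` is selected: `snap-2` and `snap-3` are the two newest for their volume, and `snap-4` is the only snapshot of its volume, so it is protected despite its age.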

Startup/Shutdown – to achieve the savings quoted by many people, systems should run only for the periods in which they are required. This means that instances need to be started and stopped according to a defined schedule; for example, my own template systems run between 6am and 10pm. To achieve this, something needs to run the start and stop scripts, and two options exist:

  1. Run a single instance 24x7 to run command line tools that start and stop the other instances. This goes against the principle of what we want to achieve, but the instance can be used for other purposes as well.
  2. Use a web-based service to start and stop the instances remotely. For me this is an attractive option, and I have used a service called Skeddly.com to perform scheduled actions on my AWS EC2 landscape.
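Whichever option runs the schedule, the decision logic is the same. The Python sketch below shows one way it might look for the 6am to 10pm window mentioned above; the function names are hypothetical, the state strings are assumed to match the EC2 `running` and `stopped` instance states, and the actual start and stop calls to the provider are deliberately left out.

```python
from datetime import time

# Operating window taken from the example above: 6am to 10pm.
START = time(6, 0)
STOP = time(22, 0)


def should_be_running(now, start=START, stop=STOP):
    """True if the time of day falls inside the operating window."""
    return start <= now < stop


def planned_action(current_state, now):
    """Decide what the scheduler should do for one instance.

    Returns 'start', 'stop' or None when no action is needed.
    """
    want_running = should_be_running(now)
    if want_running and current_state == "stopped":
        return "start"
    if not want_running and current_state == "running":
        return "stop"
    return None


print(planned_action("stopped", time(7, 30)))
print(planned_action("running", time(23, 0)))
print(planned_action("running", time(12, 0)))
```

Comparing the wanted state with the current state, rather than blindly issuing start at 6am and stop at 10pm, means a missed run is corrected the next time the check executes.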

Management tools

The biggest bugbear for me, and for everyone I have spoken to, is the lack of a toolset which enables system owners and maintainers to quickly and easily find out how every resource is connected and utilised. All the information is present in every management interface provided, but in every one I have used, the infrastructure components are on different pages; see the diagram below.

[Image: Combined_Infrastructr_categories]

As you can see from the above, I can see the status of all my instances, but if I want to see all the attached volumes I need to go to a different page. This assumes that I have correctly populated the metadata tags from the instances page so that I can determine what each volume is attached to (see the volume storage nightmare picture above).
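The kind of single view I am describing is not hard to sketch once the data has been pulled out of the separate pages. The Python below is an illustration only: the records are made up and deliberately simplified, and the field names (`InstanceId`, `Name`, `State`, `VolumeId`, `SizeGB`, `AttachedTo`) are assumptions rather than any provider's actual schema.

```python
# Hypothetical, simplified inventory records. A real inventory would come
# from the provider's API or a console export with a different shape.
instances = [
    {"InstanceId": "i-aaa", "Name": "SAP-DEV", "State": "running"},
    {"InstanceId": "i-bbb", "Name": "SAP-QAS", "State": "stopped"},
]
volumes = [
    {"VolumeId": "vol-001", "SizeGB": 100, "AttachedTo": "i-aaa"},
    {"VolumeId": "vol-002", "SizeGB": 200, "AttachedTo": "i-aaa"},
    {"VolumeId": "vol-003", "SizeGB": 100, "AttachedTo": None},
]


def combined_view(instances, volumes):
    """Join volumes onto their instances and list the leftovers."""
    view = {
        inst["InstanceId"]: {
            "name": inst["Name"],
            "state": inst["State"],
            "volumes": [],
            "total_gb": 0,
        }
        for inst in instances
    }
    unattached = []
    for vol in volumes:
        owner = view.get(vol["AttachedTo"])
        if owner is None:
            # Not attached to any instance we know about.
            unattached.append(vol["VolumeId"])
        else:
            owner["volumes"].append(vol["VolumeId"])
            owner["total_gb"] += vol["SizeGB"]
    return view, unattached


view, unattached = combined_view(instances, volumes)
for instance_id, info in view.items():
    print(instance_id, info["name"], info["state"], info["volumes"], info["total_gb"])
print("unattached:", unattached)
```

The join relies on the attachment information being present; where it is missing, the metadata tags are the only way to tell what a volume belongs to, which is why populating them matters so much.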

Several people have suggested applications like Chef or Puppet, which I have not had a chance to deploy as they are quite outside my core area of expertise, but I do know that RightScale uses Chef to manage customers’ infrastructures.

Ultimately, Cloud environments will always walk the fine line between flexibility and uncontrollability. This is simply because if it were easy to provide a simple, flexible and controllable service, all hosting providers and data centres would have one. To maximise the benefits of IaaS, there needs to be a clear consensus between the business and IT on what they want from each system. This will enable IT to create a flexible wrapper round these systems to provide solid management without too much overhead. The really good IT departments will drive this work themselves and automate as much as possible, so they can drive their own efficiencies whilst still serving the business. The explosion of IaaS services is partly because businesses got tired of IT departments telling them ‘No’ or ‘it’ll take 4 weeks to create that 10GB volume’.


6 Comments


  1. Tom Cenens
    Hello Chris

    Interesting blog and information. I haven’t yet played around with IaaS Cloud environments.

    The need for discipline is also valid for on-premise infrastructure or for that matter any type of infrastructure. Too often space is allocated and not reallocated or cleaned up once it has become obsolete.

    Kind regards

    Tom

    1. Chris Kernaghan Post author
      Tom,

      I think one of the biggest obstacles to efficient management of IaaS platforms is Basis administrators and Infrastructure administrators and how they interact.

      Standard large scale landscapes require a lot of process and documentation to keep control and ‘maintain efficiency’. IaaS poses a direct threat to these structures of control and process, as it allows people to bypass them. But make no mistake: when deploying and managing medium to large infrastructures, control and processes are needed. This keeps Infrastructure admins in a job, and it is no different for large scale SAP landscapes.

      Asking a Basis Administrator to manage an IaaS environment in a flexible manner, without degrading their performance in their existing job, is a non-starter for a couple of reasons.
      1. A Basis admin probably hates the existing processes and will bin them the 1st chance they get – which they will later regret
      2. The Basis admin does not have years of Infrastructure management experience behind them and will try to re-invent the wheel

      Perhaps this needs a chat over a beer next week

      Chris

      1. Tom Cenens
        Hello Chris

        This definitely needs a chat over a beer next week, looking forward to meeting you!

        Basis admins are a special breed no doubt about it 🙂

        Kind regards

        Tom

  2. Justin Broughton
    Hi Chris

       Great blog, I’d be interested on your thoughts on the following:

         In your article you concentrate (amongst other things) on centrally provisioned storage, but how does this compare to thin provisioned storage and the associated enablers of just-enough and just-in-time?  AWS commodity computing platform and in particular ECB have gone some way to addressing the operational/process issues that you point out.  Provisioning storage in this sleek fashion in Enterprise/Corporate/Hosted environments will take time.  This is the same (to varying degrees) with compute power, memory, network, backup etc. In short I’m a fan of AWS, however recently I have seen resistance (especially in AsiaPac) due to:

         Given that Amazon and Facebook (to name only two) have succumbed to government pressures and have withdrawn and even shut down services to their customers in extreme cases, is commodity computing across international borders still a viable option?

         In fact, do these examples strengthen the need for data and services to be provisioned and controlled by the legal systems within which the company entities reside and operate?

         The benefits of ‘only pay for what you use’ are undeniable, but given the situation above, does this really need to be provisioned by regional data centres?  Commodity computing is the way forward, I’m sure, but this will rely upon the abstraction layer of automation (e.g. as AWS ECB is provisioned using only customer facing web pages and work-flow), orchestration, consumption, billing and monitoring before we can truly say:  ‘CustomerX is provisioned and legally protected with the ServiceOwners public,private Cloud.’  A situation that globally IS’s have not managed to as yet achieve.

    Your thoughts

    Justin

    1. Chris Kernaghan Post author
      Justin,

      Long time no talk :-),

      Whilst great strides have been made in the management interfaces from several service providers, there is still much work to be done especially around the presentation of data. I am no fan of having to manually connect all the resources currently deployed in my landscape – I have better things to be doing.
      Undeniably there are localisation issues which directly affect the legality and performance of systems, something which is not immediately apparent to some users; the textbook example is the US Patriot Act and its legal interpretation. Global commodity computing is a reality, in the same way that global companies are a reality. The difference is the level of legal oversight on operations which cross borders.

      For me, the challenge is not the services, the legal constraints or the technology – it is the ability to operate in a sustainable, efficient and legal manner without sacrificing the flexibility of the IaaS platform. If companies can structure themselves to allow the business and IT to do this then we’ll see some really exciting things.

      Chris

