It happens sometimes – unexpected load hits and your application starts to hiccup. Sure, somebody could always monitor and operate it by hand, but what if you want people to focus on other tasks, with the peace of mind that your cloud application is elastic?
So, let me describe in a few easy steps how you can create your own automatic horizontal application scaling system, using a beautiful set of APIs made available courtesy of our HANA Cloud Platform team. Oh, what was that? You're feeling impatient? Then scroll down, grab the code directly from GitHub as a Maven project, have a look at it and deploy it on your account. Good luck!
Now, for the rest of the people reading this, what we're going to do is very simple: we shall create a tiny application that runs on HCP, monitors an app of your choice (as long as it belongs to you and is deployed on HCP), takes a set of rules, runs them against your defined thresholds and decides whether it needs to start or stop additional processes of the application.
An architectural sketch of the described application looks something like this:
Let’s go through each component.
The Front-end Management Console
We could design our app so that the application to scale is specified from the front-end. Of course, it's up to you to decide what the front-end will actually do; one could imagine the elastic scaler looking much more like a service than an app, but for starters this will do.
The UI we're creating will use the jQuery and Twitter Bootstrap libraries and will be very simple: two fields for specifying the account and application to scale, and a number of controls for updating the scaler's back-end and starting / stopping the monitoring. The UI looks like the picture below, and on GitHub you'll find the sources for the needed HTML and JS files that go under the webapp folder.
We're going to need at least one servlet for our scaler. This servlet receives the front-end requests and talks directly to the Scaling Setup Manager, the heart of our application. We've designed the ScaleCentral class as a singleton; it holds data about the application being monitored while acting as a central proxy for the scaling actions and commands.
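A singleton like ScaleCentral might be sketched as follows. This is my own minimal rendering, not the project's actual source; the field names are assumptions based on the request parameters described later in this post:

```java
// Sketch of ScaleCentral as an eagerly initialized, thread-safe singleton.
// Field names (accountName, applicationName) are assumptions based on the
// request parameters the scaler receives from the front-end.
public class ScaleCentral {

    private static final ScaleCentral INSTANCE = new ScaleCentral();

    private volatile String accountName;
    private volatile String applicationName;
    private volatile boolean monitoring;

    private ScaleCentral() { }

    public static ScaleCentral getInstance() {
        return INSTANCE;
    }

    // Called when the servlet receives action=query.
    public synchronized void setTarget(String account, String application) {
        this.accountName = account;
        this.applicationName = application;
    }

    public String getAccountName() { return accountName; }
    public String getApplicationName() { return applicationName; }

    public synchronized void startMonitoring() { monitoring = true; }
    public synchronized void stopMonitoring() { monitoring = false; }
    public boolean isMonitoring() { return monitoring; }
}
```

An eager `static final` instance keeps the singleton thread-safe without explicit locking on `getInstance()`.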
The rest of the mechanism works as described in the picture. For triggering the various actions, we use HTTP request parameters. Thus, we'll use a parameter named action that may take the following values:
- “query” – this value should always be set for the initial back-end call, because it sets the account and application to be monitored (we enforce this from the front-end). The parameters accountName and applicationName must therefore also be set.
- “start” – starts monitoring the specified application. A separate thread is started that checks every 5 seconds (the timer can be changed) whether the application needs to be scaled up, scaled down or simply left as it is. During monitoring, we constantly ping the back-end (one could also use WebSockets) to get the status of the application in order to display it on our web page (MonitorServlet).
- “stop” – stops the monitoring.
- “startApp” – starts a new process for the application.
- “stopApp” – stops one of the application's processes.
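The dispatch on the action parameter might look like the following sketch. To keep it self-contained it operates on plain strings rather than an HttpServletRequest, and the method and response messages are illustrative, not taken from the project:

```java
// Sketch of the action dispatch performed by the scaler servlet.
// In the real servlet the three arguments would come from
// request.getParameter("action") etc.; names here are illustrative.
public class ActionDispatcher {

    public String dispatch(String action, String accountName, String applicationName) {
        if (action == null) {
            return "error: missing action parameter";
        }
        switch (action) {
            case "query":
                // The initial call: it must carry the monitoring target.
                if (accountName == null || applicationName == null) {
                    return "error: accountName and applicationName are required";
                }
                return "monitoring target set to " + accountName + "/" + applicationName;
            case "start":
                return "monitoring started";
            case "stop":
                return "monitoring stopped";
            case "startApp":
                return "starting a new application process";
            case "stopApp":
                return "stopping an application process";
            default:
                return "error: unknown action " + action;
        }
    }
}
```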
There are a few further aspects worth mentioning. The first is the rule engine. The rule engine here is very rudimentary, but it does the job, at least for the applications we've tested. It always checks whether scaling up or scaling down is possible. The monitored parameters for which it performs the checks, together with their thresholds (for down-scaling and up-scaling respectively), are defined in the params.properties file and are combined in conjunction (logical AND). For instance, an entry for the metric named cpu_utilization with a down-scaling threshold of 25 and an up-scaling threshold of 50 means that the rule engine will return true for the down-scaling rule if the value is below 25 and true for the up-scaling rule if the value is above 50.
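The conjunctive check can be sketched like this. The class and method names are my own, and the thresholds are passed in as a plain map rather than loaded from params.properties, whose exact format I won't guess at here:

```java
import java.util.Map;

// Minimal sketch of the conjunctive rule check described above:
// every monitored metric must be below its down-threshold for a
// down-scale, and above its up-threshold for an up-scale.
public class RuleEngine {

    // metric name -> {downThreshold, upThreshold}
    private final Map<String, double[]> thresholds;

    public RuleEngine(Map<String, double[]> thresholds) {
        this.thresholds = thresholds;
    }

    public boolean shouldScaleDown(Map<String, Double> metrics) {
        for (Map.Entry<String, double[]> rule : thresholds.entrySet()) {
            Double value = metrics.get(rule.getKey());
            // Logical AND: one missing or non-qualifying metric vetoes the action.
            if (value == null || value >= rule.getValue()[0]) {
                return false;
            }
        }
        return true;
    }

    public boolean shouldScaleUp(Map<String, Double> metrics) {
        for (Map.Entry<String, double[]> rule : thresholds.entrySet()) {
            Double value = metrics.get(rule.getKey());
            if (value == null || value <= rule.getValue()[1]) {
                return false;
            }
        }
        return true;
    }
}
```

With a cpu_utilization rule of 25 / 50, a reading of 10 qualifies for down-scaling and a reading of 80 for up-scaling, while anything in between leaves the application as it is.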
But where does the rule engine get the values of these parameters from? That brings me to the second point I wanted to discuss.
The HANA Cloud Platform APIs
First, to get the current application metrics, we use the Monitoring API (https://api.hana.ondemand.com/monitoring/v1/documentation). This is one of the platform-embedded APIs that prove very helpful for developers. It is REST-based, and not only does it bring status and metric details about an application and its processes, it does so for all the running applications without affecting their performance. Pretty neat, right?
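A metrics call over plain HttpURLConnection might look like the sketch below. The exact resource path is an assumption on my part (check the linked documentation for the real one); only the base URL comes from the documentation link above:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of fetching application metrics over the Monitoring API.
// The /accounts/{account}/apps/{application}/metrics path is an
// assumption; consult the API documentation for the exact resources.
public class MonitoringClient {

    static final String BASE = "https://api.hana.ondemand.com/monitoring/v1";

    public static String metricsUrl(String account, String application) {
        return BASE + "/accounts/" + account + "/apps/" + application + "/metrics";
    }

    // The platform APIs accept HTTP Basic authentication.
    public static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Performs the GET and returns the raw response body.
    public static String fetchMetrics(String account, String application,
                                      String user, String password) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL(metricsUrl(account, application)).openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) {
                body.append(line);
            }
            return body.toString();
        }
    }
}
```

The monitoring thread would call fetchMetrics on each tick, parse the response and feed the values to the rule engine.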
And speaking of platform APIs: whenever we need to scale the monitored application up or down, we need to trigger an action such as a process start or a process stop. For that we use the Lifecycle Management API (https://api.hana.ondemand.com/lifecycle/v1/documentation), which lets you get information about an application and start or stop its processes (you could do several things more, like deploying, un-deploying or changing applications).
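Tying the two APIs together, the decision step that drives the lifecycle calls might look like this sketch. The min/max process bounds are my own addition (a guard so the scaler never stops the last process or starts unbounded ones), not something described in this post:

```java
// Sketch of the scaling decision that drives the Lifecycle Management
// API calls. The min/max process bounds are an assumption of mine,
// added as a safety guard around the rule engine's verdicts.
public class ScalingDecision {

    public enum Action { START_PROCESS, STOP_PROCESS, NONE }

    public static Action decide(boolean scaleUp, boolean scaleDown,
                                int runningProcesses, int minProcesses, int maxProcesses) {
        if (scaleUp && runningProcesses < maxProcesses) {
            return Action.START_PROCESS; // maps to the "startApp" action
        }
        if (scaleDown && runningProcesses > minProcesses) {
            return Action.STOP_PROCESS;  // maps to the "stopApp" action
        }
        return Action.NONE;
    }
}
```

START_PROCESS and STOP_PROCESS would then translate into the corresponding Lifecycle Management API requests against the monitored application's processes.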
The code is available on GitHub and pretty much speaks for itself, as it is not very complicated. You can use it as you wish, play around with it, enrich it, etc.
My hope is that I've shown you how easy it is to create a very lightweight elastic scaler for HCP applications with not too much code, thanks to the platform-embedded APIs.
I’d love to hear your thoughts on the topic!