BI in the cloud: sizing SAPS and ECUs on AWS servers
The first step usually associated with this activity is to assign a BI expert to analyse the existing deployment (or determine the expected content, users, load and availability) and feed these numbers into SAP’s sizing tool (the BI4 Resource Usage Estimator). Experienced BI Consultants will have other sources to help with the calculation (such as configurations used in similar deployments with good performance), but these are usually not available to the SAP Customer.
The sizing tool is very useful and brings some method to the madness, but the reality is that you are expected to make subjective assessments before getting any meaningful results. One subjective factor that influences the sizing is the categorisation of reports as “small”, “medium” or “large”. The categorisation of users is somewhat vague too, as the definitions of each user type contain relative terms such as “moderate”, “little”, “large” and “heavy”.
The BI sizing guide is a very useful source of information, and will walk the reader through all the factors and concepts involved in sizing a BI deployment, but even it admits that human judgment is always a vital piece of the puzzle.
The “art” of sizing, then, requires one to extract information from the current deployment, which is usually done via Auditing, Query Builder or third-party tools. One can also go further and build tooling with the Java SDK (Software Development Kit) to the same end. This information covers how many users are using the system and with which intensity (light viewers, average business users, demanding developers), as well as what sort of documents (and how big these are) said people are consuming or creating.
These numbers (aka KPIs) are then fed (along with your human judgment) into SAP’s sizing tool, which spits out two results for each of the deployment layers (web application, intelligence, processing and CMS DB host):
- number of GB required for Random Access Memory (RAM) for each layer;
- a “mysterious” number in “SAPS” for CPU for each layer.
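To make the outputs concrete, here is a minimal sketch of aggregating the estimator’s per-layer results. The per-layer split below is hypothetical (chosen so the totals match the example figures discussed in this article); the tool itself produces the real numbers.

```python
# Hypothetical per-layer output of the BI4 Resource Usage Estimator:
# layer name -> (RAM in GB, CPU in SAPS). Values are illustrative only.
SIZING_OUTPUT = {
    "web_application": (6, 2000),
    "intelligence":    (4, 2000),
    "processing":      (14, 8000),
    "cms_db":          (4, 2000),
}

OS_OVERHEAD_GB = 2  # RAM for the OS and other apps, per host (not included by the tool)

def totals(sizing, hosts=1):
    """Sum the per-layer requirements and add OS overhead per host."""
    ram = sum(r for r, _ in sizing.values())
    saps = sum(s for _, s in sizing.values())
    return ram + hosts * OS_OVERHEAD_GB, saps

ram_gb, saps = totals(SIZING_OUTPUT, hosts=1)
print(f"Total RAM: {ram_gb}GB (incl. OS overhead), total CPU: {saps} SAPS")
```

Note that the RAM side of this arithmetic maps directly to hardware; the SAPS side is where the trouble starts, as discussed next.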
SAP BI Resource Usage Estimator
The SAPS thing
Except for the SAPS bit, it is reasonably easy to read these outputs as required resources in each of the layers of the BI deployment. So in the screenshot above, one needs a total of 28GB of RAM for all layers, and at least 14GB of those in the processing layer alone, in which the adaptive processing servers will be active. The resources required to run the operating system and other applications are not included, so one needs to take that into account.
The RAM information will determine how large you need your web application server (Tomcat, WebSphere, etc.) to be, the server hosting the CMS database, and the BI server or servers if one decides to scale horizontally. The RAM required is known at this stage (although some subtleties around virtualisation increase the complexity, as will be discussed below), but what about CPU? What about this SAPS business? If you tell your Customer, Project Manager or Infrastructure Manager that you need servers with a total of 14,000 SAPS, chances are most will not be able to translate that into real-life hardware.
The SAPS metric was developed by SAP to abstract away from raw CPU measures, taking I/O processing into account along with pure computation capacity. Although this is not a common metric for hardware, SAP provides benchmark numbers (hardware certification) stating a certain number of SAPS per machine, and the list is quite comprehensive.
Unfortunately, however, in the real world Infrastructure Departments are usually not keen to acquire the specific hardware needed by a BI Project and even less likely to allow the Project Team to own the whole server. A much more likely scenario is one where Infrastructure will allow you to have a certain number of virtual machines, with the configuration (RAM/CPU/Disk) that you require. That brings back the “SAPS” problem and also adds another complex guest to the sizing party: virtualization.
One cannot directly convert a virtual machine (VM) configuration (say 8 vCPUs, 16GB RAM) to SAPS, because this number is not a theoretical result extracted from a formula. It is the “real” result as measured on the hardware in SAP’s labs, or at least so goes the party line. So how does one take the output of the sizing tool in SAPS and convert it to CPUs?
The sad, short answer is that one can’t. Not with any ambition of precision.
The nice, academic solution to this dilemma is to get an exception from the Infrastructure Team and secure a dedicated piece of hardware with the required amount of SAPS, which you can then subdivide into multiple VMs to satisfy the numbers demanded by the BI4 Resource Usage Estimator.
An alternative is to use multiple physical hosts, a controversial approach because it typically challenges Infrastructure policies and strategies and may compromise vertical scaling somewhat; there are also additional costs and time penalties associated with the logistics around physical hardware. A final option is to throw design precision out of the window: estimate the VM power based on the hardware it runs on, use system tests to validate performance, and keep scaling up or out until the minimum performance metrics are met.
In our experience, there is still another option, which can barely be called “sizing”. Smaller organisations will typically settle for the minimum-requirements hardware (a single 8,000 SAPS box) or a similar configuration (somewhere between 4 and 8 cores, 16 and 32GB RAM), and will validate performance by running a few reports and checking how it “feels”. If the performance is found to be “reasonable”, they will happily embrace the lack of sizing precision.
Any option that relies on virtualisation will need to confront the question of whether there are performance losses from the overhead of virtualisation. Virtualisation is an “old” debate that has accompanied upgrade and new deployment Projects for years, and BI is no exception. One can still find Projects designing BI deployments with physical hardware, but much more common are designs that abstract the physical layer and talk in terms of “VMs”.
There is an elegant whitepaper on BI virtualisation with VMware ESXi 5, which provides valuable guidance for BI deployments on virtualised (VMware, that is) environments. The most important conclusion of the tests documented in that paper is that VMs can perform just as well as their physical counterparts when hosting an SAP BI system.
To achieve this level of performance, a few configuration settings should be ensured, but the most relevant (and contentious) one is the reservation of physical resources. For example, when a piece of hardware is “partitioned” into multiple VMs, it is common to overprovision CPUs so that the sum of vCPUs available is greater than the number of physical CPUs (which increases the efficiency of virtualisation). It is this practice that wreaks havoc with BI systems: one of the findings of that research was that resource contention (caused by overprovisioning resources) leads to dramatic performance degradation.
Other recommendations to ensure that level of performance on virtual servers are:
- Resource shares or limits should not be used;
- CPU affinity should not be used;
- Vertical scaling should not go beyond 16 vCPUs;
- Intel Turbo Boost/Intel Hyper-Threading should be enabled;
- VMtools should be installed on all guests.
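The overprovisioning concern above is easy to check mechanically. A minimal sketch, with a hypothetical host and VM layout:

```python
def overprovisioned(physical_cores: int, vm_vcpus: list[int]) -> bool:
    """True when the vCPUs handed out exceed the physical cores underneath:
    the situation the virtualisation research flags as causing dramatic BI
    performance degradation through resource contention."""
    return sum(vm_vcpus) > physical_cores

# Hypothetical host: 16 physical cores carved into three BI VMs.
host_cores = 16
vms = [8, 8, 4]  # vCPUs per VM; sums to 20, i.e. 1.25x overprovisioned

print(overprovisioned(host_cores, vms))  # True: contention risk for BI
```

For a BI host, the safe configuration is the boring one: the vCPU total stays at or below the physical core count, with resources reserved rather than shared.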
Another interesting and controversial suggestion of that article about virtualisation of the BI infrastructure is that SAPS might not be a suitable measure of performance for BI systems, due to the differences between how transactional SAP systems and BI behave. Although interesting, this is beyond the scope of this article.
ECUs meet SAPS
Customers now have the ability to spin up virtual servers, take them down, enable on demand, scale vertically as easily as horizontally, and get all that goodness from the meager provisions of the OPEX purse. The cloud revolution brings with it a new Infrastructure paradigm (Infrastructure As A Service or IaaS) to which many Organisations are adapting. This revolution has many Players, and initially SAP only blessed Amazon Web Services (AWS) as a “supported” public provider via Note 1380654. Later the same note added Microsoft’s Azure, and now both are supported.
Azure deployments should follow the guidance of SAP Note 2015553, while for AWS there is useful information in SAP Note 1656250 about supported products and in SAP Note 1656099 specifically about minimum requirements; there is also an implementation guide for SAP software on AWS. It is a generic guide, but it covers many AWS-specific concepts that can help anyone who is new to AWS.
Consumers of IaaS clouds like AWS EC2 do not know the hardware underlying the virtual servers, and that reintroduces the problem of correlating the computing power of EC2 instances with SAP’s SAPS. To make matters worse, Amazon has its own metric for CPU (the ECU), which is different from SAP’s SAPS. The two scales are different and there are no conversion formulas available.
To put it another way: most of the World uses Celsius to measure temperature; BI4 Resource Estimator gives you a “temperature” in degrees Fahrenheit (SAPS) and we are tasked with converting that into degrees Kelvin (ECUs) for AWS, but there is no formula to accomplish that transformation. So how many degrees Kelvin (ECUs) do you need from your AWS machine to meet those Fahrenheit (SAPS) performance requirements?
Even though there is no exact answer to that question, SAP has provided some guidelines that can be used, along with some assumptions. SAP Note 1656099 initially provided some practical translations from ECUs to SAPS. They were as below:
2-tier SAP system configurations […]
The same note also states that “For BI version 4.0, EC2 Instance Types with a SAP performance rating of 7,400 SAPS or higher should be used”, which makes it easier to determine minimum size of an EC2.
The other measures provide a general idea of the sizing relationship between ECUs and SAPS, and are so far the only (albeit murky and imprecise) translation between the scales. The two results on the first table progress linearly (from 13 ECU = 3,700 SAPS to 2 × 13 ECU = 2 × 3,700 SAPS), so it would seem that each ECU is worth about 284.6 SAPS; but as SAP added more EC2 types to that table, it became clear that the progression is not linear.
The current version of the note shows 13 EC2 instances added to the list of “supported” ones, all with their ECUs and corresponding SAPS values. SAP has also added smaller EC2 instances that are well below the minimum 7,400 SAPS, so that they can be used for a particular layer (the web application, for example), but they would not be suitable to host a 3-tier single-host environment.
The added instances showed that the SAPS/ECU ratio varies from 284 to 306, so, for better precision, taking the average of roughly 295 keeps the number of SAPS per ECU closest to SAP’s findings:
One EC2 instance (cr1.8xlarge) contained in the SAP note’s list was excluded from this calculation because its SAPS to ECU ratio is too different from all the other instances (an outlier), even when other factors are taken into account (it differs even from the other Memory-optimised previous-generation EC2 instances); it is also a previous-generation instance anyway.
The table below mashes together the available information about these machines, and a few interesting conclusions can be derived from it:
Table 1 – SAPS x ECUs average and EC2 types
|Instance|vCPUs|RAM|SAPS|ECUs|Ratio|SAPS derived from ECU × Average|EC2 type|Generation|
|Ratio Average (excluding outlier)|295.033|
If the SAPS to ECU table did not exist and we applied the conversion factor (295.033) instead, the SAPS derived from it (i.e., multiplying the ECUs of a given EC2 instance by 295.033) would not be too different from the “real” SAPS published by SAP. The precision would not be great, but with a variation of plus or minus 4% one would at least have a very good indicator of what to expect:
Table 2 – SAPS x derived SAPS variation
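The derivation can be sketched in a few lines. The c3.2xlarge figures used here (28 ECUs, an SAP-published rating of 7,957 SAPS) are the ones quoted elsewhere in this article; the point is that the derived value lands within the ±4% band:

```python
RATIO = 295.033  # average SAPS per ECU, excluding the cr1.8xlarge outlier

def ecu_to_saps(ecus: float) -> float:
    """Rough SAPS estimate for an EC2 instance from its ECU rating."""
    return ecus * RATIO

# c3.2xlarge: 28 ECUs, SAP-published rating of 7,957 SAPS
derived = ecu_to_saps(28)
deviation = (derived - 7957) / 7957 * 100

print(f"Derived: {derived:.0f} SAPS, deviation from published: {deviation:+.1f}%")
print(f"Meets the 7,400 SAPS minimum for BI 4.0: {derived >= 7400}")
```

Treat this as a rough estimator only: as argued below, the published per-instance table should always win over the derived number when both exist.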
Another conclusion one could easily be led to is that similar machines have a similar SAPS to ECU ratio. Table 1 shows CPU-optimised EC2 instances with the ratio varying between 284 and 294, while Memory-optimised EC2 instances from the same generation are parked around 306. This would be a great instrument to refine precision, but the pattern is probably caused by more prosaic reasons. The SAPS for c3.8xlarge (31,830) is precisely double the number for c3.4xlarge (15,915), which is double the one for c3.2xlarge (7,957). The same happens with the r3 instances, which gives an impression of linearity. It is much more likely that the tests were run against one of these EC2 instances and then scaled up or down for the other similar machines. If this does not help in the quest for precision, it at least reinforces the idea that similar EC2 instances are expected to behave very similarly performance-wise – even when measured in SAPS.
A final observation about these numbers is that they also validate what other authors have stated about physical and virtual machines alike: the best-performing machines are the ones with an emphasis on random access memory (RAM). This is also true for a cloud-based infrastructure. Even the SAPS benchmark clearly benefits more from RAM-optimised EC2 instances than from CPU-optimised ones.
The table provided by SAP on Note 1656099 (with the SAPS to ECU conversion) covers almost all available AWS EC2 instances from the current generation, so there is currently no need for a precise formula to convert ECU to SAPS. The only type not covered by the Note is the “i” type, which is for instances dedicated to storage (800GB SSD, for example).
One can still use the 295 approximation for the “i” EC2 types, or when new instances are made available by AWS before SAP has released their SAPS numbers. It will most likely be inaccurate, but it is grounded in the available data, so it can serve as a starting point.
AWS sizing and costs considerations
In on-premises deployments, it is a hard sell to add more resources, or more VMs, to Project deployments where the individual hosts are not already averaging a high consumption of resources, especially if capital expenses are associated with these changes.
On AWS-based deployments, however, it is easy to scale vertically. If the original sizing estimates prove insufficient, one can determine which performance metric requirements are underachieved, stop the EC2 instance(s) associated with them, choose a different type (say, upgrade from r3.2xlarge to r3.4xlarge), restart the instance and reconfigure the APSs, the web application server or the RDBMS (Relational Database Management System) parameters, depending on where the bottleneck was determined to be.
In these environments, the operational costs are known exactly in advance, and there is no capital expenditure, so it is easier for Project Teams to approach design, in regard to performance, via an “evolutionary” method, which relies on trial and error to find the best result. This method still requires some negotiation, but it is more under the control of the Project Team and avoids arguments with Infrastructure Teams regarding technical details about server components, virtualisation, application-specific characteristics and best practices, which sometimes yield little benefit.
One interesting fact to be aware of when making changes to the design by adding more disk space or more EC2 instances to increase performance is that one needs to “pre-warm” AWS storage devices – Elastic Block Store (EBS) volumes – before measuring their performance: until a full read or write of all blocks has occurred, there is an I/O penalty on first access. This is documented by AWS and there are tools for that end.
So under the new cloud paradigm, any organisation willing to have a supported cloud environment can simply pick one of the “supported” EC2 instances, make sure it matches the minimum requirements for BI, and is then much freer to turn the performance knob up or down.
Nevertheless, a more detailed analysis of the AWS instances and their expected performance will provide grounds for selecting, early on, which type of instance will provide more bang for the buck. “c” type instances (CPU-optimised), for example, are cheaper than their “r” counterparts (Memory-optimised), but they will always underperform, because BI requires more RAM and I/O than CPU, as highlighted by the better SAPS to ECU ratio of the “r” instances.
For example, suppose you are looking for a basic single-layer server for a small deployment, and you have determined the minimum requirements (7,400 SAPS) will be enough. You may be tempted to go with the c3.2xlarge instance (8 vCPUs, 15GB RAM, 28 ECUs), which runs at $0.95 per hour (Sydney region, Windows Server). The other instance around the 7,400 SAPS mark is the r3.2xlarge (8 vCPUs, 61GB RAM, 26 ECUs), which will cost $1.29 per hour.
If you focus on the bottom line you will miss the fact that the c3.2xlarge not only underperforms, but is also below the minimum recommended memory for any BI system (16GB). That would become increasingly relevant as one tried to configure the multiple Adaptive Processing Servers (APSs) of that deployment, which would need to host multiple services with a small maximum Java heap size (the -Xmx parameter). The APSs would contend with the web application and the Operating System (OS) for resources, and users would be penalised in the form of poor performance.
In a similar scenario, it would make a lot more sense to choose the r3.2xlarge instance, because the 61GB of RAM would allow for a much better configuration of the APSs, while still reserving enough memory for the OS and web application server and avoiding paging. Users of this system would have a much better experience.
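The trade-off becomes obvious when the two candidates are costed per month and per GB of RAM, using the Sydney on-demand Windows rates quoted above (730 hours is an average month, an assumption for the arithmetic):

```python
HOURS_PER_MONTH = 730  # average month (24 * 365 / 12), assumed for the comparison

# instance type -> (hourly on-demand Windows price in USD, RAM in GB), as quoted above
candidates = {"c3.2xlarge": (0.95, 15), "r3.2xlarge": (1.29, 61)}

cost_per_gb = {}
for name, (hourly, ram_gb) in candidates.items():
    monthly = hourly * HOURS_PER_MONTH
    cost_per_gb[name] = monthly / ram_gb
    print(f"{name}: ${monthly:,.2f}/month, ${cost_per_gb[name]:.2f} per GB of RAM")
```

The r3.2xlarge costs about a third more per month, yet works out roughly three times cheaper per GB of RAM, which is the resource BI is actually starved of.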
It is clear that when sizing AWS-based BI deployments, the costs of the underlying infrastructure become a lot more relevant than they were when deploying systems on-premises because the costs are now very precise, transparent to all parties, and can be weighed against performance gains. As Consultants, we can help Customers make more intelligent and cost effective decisions when designing their cloud-based BI infrastructure.
Still regarding the example used above, if the costs of choosing a more adequate host for the BI system become prohibitive, AWS provides a few ways to reduce the cost associated with these instances. In the example, one could select the Linux version of the same EC2 instance – r3.2xlarge – and run the same machine for $0.84 per hour, which is cheaper than the Windows version of both the c3.2xlarge and r3.2xlarge instances. There are other ways to save, such as paying for these servers as “reserved instances”, which requires upfront payments but can take the price as low as $0.35 per hour. As for non-production environments, some Organisations shut the instances down outside business hours, and so save up to 2/3 of what the cost would otherwise be (these instances run only 8 hours a day).
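These savings are simple arithmetic on the hourly rates quoted above, and are worth making explicit:

```python
windows_od = 1.29  # r3.2xlarge on-demand, Windows, per hour (Sydney)
linux_od = 0.84    # same instance, Linux
reserved = 0.35    # effective hourly rate with upfront reserved pricing

print(f"Linux instead of Windows:           {1 - linux_od / windows_od:.0%} saved")
print(f"Reserved instead of Linux on-demand: {1 - reserved / linux_od:.0%} saved")

# Non-production instances run only 8 of every 24 hours
print(f"Business-hours-only schedule:        {1 - 8 / 24:.0%} saved")
```

Individually each lever saves roughly a third to a half; stacked (Linux plus reserved pricing), the hourly rate drops from $1.29 to $0.35, well over the 50% mark referred to below.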
Some of these cost-reducing strategies may be too aggressive for some, and one could criticise the replacement of Windows with Linux servers as a cost-increasing decision, because technical support and expertise for Windows is a lot more common and therefore cheaper. Others could argue that using reserved instances defeats some of the purpose of opting for a cloud infrastructure in the first place, since the upfront payment brings the costs (or part of them) back as capital expenses, as opposed to operational expenses.
Surely there is no perfect formula for all organisations. Some will prefer to pay more to keep the Operating System (OS) that matches their internal policies and to keep all their cloud costs operational. Others might realise, since the costs and savings are so transparent, that it could pay to upskill their internal personnel and make use of the cheaper OS alternatives (or, if they rely on third-party Consultants, it is usually expected these can operate in Linux environments), and that by sacrificing some political capital to get an exemption to use capital expense budget to offset half the cost, the savings in the long run would be significant (over 50% in the example above).
 Idem, p. 16.
 Idem, p. 18.
 AWS. Implementing SAP Solutions on AWS Guide. http://awsmedia.s3.amazonaws.com/SAP_on_AWS_Implementation_Guide_v3.pdf
 AWS name for hosts.
 Elastic computing unit.
 A factor of “~285” is also suggested by MISSBACH, M; et al. SAP on the cloud. Berlin; New York, 2013. Page 126.