How SAP Cloud for Customer could be affected by network problems
SAP Cloud for Customer, as a Software as a Service (SaaS) solution, is accessible over the Internet from anywhere in the world through a browser, mobile applications, web services, and other clients.
User requests from the browser or other clients have to travel through multiple networks, including the user's private network, the ISP's public network, the public Internet and the SAP data center network, before they reach the Cloud for Customer servers to be processed.
As the number of components, networks and geographies increases, the probability of network failures increases as well, and with it the chance that the end user experience or the performance of the SaaS solution is affected.
Different network conditions can influence the performance of Cloud for Customer. They can be consolidated into the following two areas:
- Overhead caused by network conditions. In this case multiple factors, whether caused by hardware or software problems, actual overload or even operator errors, can present different types of symptoms, for example (but not limited to):
- High Network Latency. First, let's define network latency: it is the time it takes for one network packet to get from the sender to the receiver. In some cases it is measured as the time it takes for the packet to reach the receiver and return to the sender, in which case it is called round trip time or RTT.
SAP Cloud for Customer has in-tenant embedded tools that help to measure latency; details on how to use them can be found here. There can be multiple reasons for high network latency, and some of them are:
- Inefficient path. On the Internet, routing policies and protocols are based on a number of factors, which are sometimes not related to performance. The protocol that makes the Internet work is called BGP (Border Gateway Protocol), but it is not the only one. BGP is a protocol that exchanges routing information between gateways, such as the known hosts that can be reached. The exchanged information can contain the cost associated with a path and other attributes that are used to determine the best available route. This protocol, like some others, allows a high level of configuration, which can cause inefficiencies in the routing. As a result, network packets may travel longer paths or through busier networks, which causes high network latency.
- Distance. Network packets have to travel from one location to another, and they can travel at a speed of roughly 124,000 miles per second. That is about 124 miles per millisecond one way, so every 62 miles of distance adds about 1 millisecond of RTT. The farther the packet has to travel, the longer it takes and the higher the network latency will be. This is pure physics.
- Busy networks. Queuing effects can be observed while a packet travels over the network. On the Internet, a packet usually has to pass through several public, shared and interconnected network components, and one busy link can be the main factor behind an increase in network latency. When a packet arrives at a network component (router, switch, etc.), it has to be processed and retransmitted. The network component can only process a limited number of packets at a time, so if packets arrive faster than they can be processed, they are put into a queue and the processing time increases. When those queues overflow, network packets get discarded, which means “packet loss”. Within the TCP protocol, the lost packets need to be retransmitted, and this behavior can cause a domino effect, since multiple devices now have to process and queue even more packets. In general, high latency and high packet loss together can cause severe slowdowns in network communications. There is another factor called jitter, which is defined as the difference in network latency from packet to packet. When there is too much variation in network latency, jitter increases, and this is a sign of a problematic network connection.
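To make the distance and jitter points above more concrete, here is a minimal Python sketch. It is only an illustration: the function names are made up for this example, the speed constant is the ~124,000 miles per second figure quoted above, and the jitter calculation uses one simple interpretation of the definition (the average absolute difference between consecutive latency samples). The sample values are invented, not measurements.

```python
from statistics import mean
from typing import Sequence

# Approximate propagation speed, taken from the figure quoted in the text.
SPEED_MILES_PER_SECOND = 124_000


def min_rtt_ms(distance_miles: float) -> float:
    """Lower bound on RTT (in ms) imposed by distance alone.

    Real RTT will be higher because of routing detours, queuing and
    processing time in every network component along the path.
    """
    one_way_seconds = distance_miles / SPEED_MILES_PER_SECOND
    return 2 * one_way_seconds * 1000


def mean_jitter_ms(latencies_ms: Sequence[float]) -> float:
    """Average absolute difference between consecutive latency samples."""
    if len(latencies_ms) < 2:
        return 0.0
    deltas = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return mean(deltas)


if __name__ == "__main__":
    # A hypothetical user 3,100 miles away from the data center.
    print(round(min_rtt_ms(3_100), 1))  # 50.0

    # Invented latency samples with one spike caused by a busy link.
    samples = [40, 42, 41, 80, 43]
    print(mean_jitter_ms(samples))  # 19.75
```

Note that the first function only gives a floor: if the measured RTT is far above it, the extra time is coming from the path or from busy networks rather than from distance.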
- Low Network Bandwidth. Bandwidth refers to how much data can be transferred from one point to another in a set amount of time. The bigger the bandwidth, the more data can be transferred in less time. Bandwidth becomes a factor when multiple applications or users share the same “pipe”. Here concurrency, the type of application and the amount of data that is uploaded to or downloaded from the Internet are the factors that determine how big the bandwidth has to be. If the bandwidth becomes a bottleneck, it slows down the throughput and with it the response time. Think of a highway during rush hour: in a bumper to bumper situation we might be driving at 30 mph, while on the same highway at another time we could probably drive at 70 mph.
Bandwidth will always have a limit, and it is also important to understand that the available upload and download bandwidth can be different. SAP Cloud for Customer has an in-tenant tool that can be used to calculate the bandwidth; more details are here.
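The effect of concurrency on a shared “pipe” can be sketched with some simple arithmetic. The snippet below is a rough illustration, not a sizing tool: the function name is hypothetical, and it assumes the link is shared equally among all concurrent users and ignores latency and protocol overhead, which is a simplification.

```python
def transfer_time_seconds(payload_megabytes: float,
                          link_mbps: float,
                          concurrent_users: int = 1) -> float:
    """Estimate the time to move a payload over a shared link.

    Assumes every concurrent user gets an equal share of the link
    (a simplification) and ignores latency and protocol overhead.
    """
    if link_mbps <= 0 or concurrent_users < 1:
        raise ValueError("link_mbps must be > 0 and concurrent_users >= 1")
    per_user_mbps = link_mbps / concurrent_users
    payload_megabits = payload_megabytes * 8  # 1 byte = 8 bits
    return payload_megabits / per_user_mbps


if __name__ == "__main__":
    # A 5 MB download on an otherwise idle 100 Mbps link.
    print(transfer_time_seconds(5, 100))      # 0.4

    # The same download while 50 users share the link equally.
    print(transfer_time_seconds(5, 100, 50))  # 20.0
```

The same 5 MB payload goes from well under a second to 20 seconds once the link is shared 50 ways, which is the rush hour effect described above.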
- Configuration issues that cause overhead. Different configuration factors can add overhead to the response time. Some of them are (but are not limited to):
- DNS Configuration. Cloud for Customer uses a product from our partner company Akamai to accelerate the traffic over the Internet. This product relies on the geolocation of the DNS server, which resolves the Cloud for Customer tenant DNS name to the IP of an Akamai server geographically close to the DNS server. The goal is to provide the best routing from the region where the user is connecting to the SAP data center. To take full advantage of this feature, the DNS server and the user need to be in the same geographical area. A common problem is a user whose DNS queries are resolved by a DNS server located in a different geographical region. In those cases the DNS server resolves the Cloud for Customer tenant DNS name to an IP close to the DNS server but not necessarily close to the user. For example, a user in Europe tries to access an SAP Cloud for Customer tenant in Europe but uses a DNS server in America. The user receives the IP of an Akamai server in America and is forced to connect from Europe to the Akamai server in America, which then connects to the SAP Cloud for Customer tenant in Europe, causing considerable network overhead. One method to identify whether the DNS resolution and the TCP routing are happening in the same region is explained in this blog.
- Overhead caused by forward proxies. Some forward proxies, either on-premise or in the cloud, have been shown to cause overhead while using SAP Cloud for Customer. In most cases we have observed high TCP connect times, high SSL handshake times and sometimes even high send or receive times. Some ideas on how to identify these types of issues with an HTTP tracing tool like HttpWatch are explained here.
- Overhead caused by last mile routing. In some cases IP routing is not correct in the last mile, either within the customer's private network or from the customer network to the Akamai server (or, if Akamai is not enabled, to the SAP data center). Some ideas on how to identify these problems are explained in this blog.
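As a rough first check for the configuration issues above, the following Python sketch shows which IP address the local DNS resolver returns for a host name and how long a plain TCP connection to that address takes. It is not the method from the linked blogs and it does not replace an HTTP tracing tool. The function names are made up for this example, and the tenant host name in the usage note is a placeholder, not a real tenant. Note also that when a forward proxy is in use, a direct connection like this one may be blocked or may not reflect the path the browser actually takes.

```python
import socket
import time
from typing import List


def resolve_ipv4(hostname: str, port: int = 443) -> List[str]:
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def tcp_connect_time_ms(address: str, port: int = 443,
                        timeout: float = 5.0) -> float:
    """Measure how long the TCP handshake to address:port takes, in ms."""
    start = time.perf_counter()
    with socket.create_connection((address, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000


def report(hostname: str) -> None:
    """Print the resolved addresses and the TCP connect time for each one."""
    for address in resolve_ipv4(hostname):
        connect_ms = tcp_connect_time_ms(address)
        print(f"{hostname} -> {address}: TCP connect {connect_ms:.1f} ms")
```

Calling `report("mytenant.example.com")` with the real tenant host name in place of the placeholder prints one line per resolved address. An address that geolocates to another continent, or a connect time far above what the distance to the nearest region would suggest, is a hint that the DNS configuration or the routing deserves a closer look.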