Web application performance depends on Internet capacity. Money flows into the networks from the first and last miles, as companies pay for hosting and end users pay for access, but in the Internet’s middle mile, there is very little incentive to build out capacity. The longer data must travel through the middle mile, the more it is subject to congestion, packet loss, and poor performance.
For large files, the distance between server and end user is critical. TCP allows only small amounts of data to be sent at a time before waiting for acknowledgments from the receiving end, so throughput is throttled by network latency. If packet loss is detected, TCP backs off and sends even less data before waiting. Longer distances increase the chance of congestion and packet loss.
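To make the latency argument concrete, the following minimal sketch computes two well-known upper bounds on TCP throughput. The first assumes at most one window of data is in flight per round trip (window / RTT). The second is the Mathis et al. approximation for a lossy path. The window size, segment size, loss rate, and round-trip times are illustrative assumptions, not figures from this article.

```python
import math


def window_limited_throughput(window_bytes: float, rtt_s: float) -> float:
    """Upper bound in bits/s when one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


def loss_limited_throughput(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Mathis approximation: rate ~ (MSS / RTT) * sqrt(3/2) / sqrt(p)."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


if __name__ == "__main__":
    window = 65535   # assumed: largest TCP window without window scaling
    mss = 1460       # assumed: typical Ethernet-sized segment payload
    loss = 0.01      # assumed: 1% packet loss
    # Hypothetical round-trip times for short, medium, and long paths.
    for label, rtt_ms in [("short", 10), ("medium", 50), ("long", 200)]:
        rtt = rtt_ms / 1000
        by_window = window_limited_throughput(window, rtt) / 1e6
        by_loss = loss_limited_throughput(mss, rtt, loss) / 1e6
        print(f"{label:>6} path, RTT {rtt_ms:>3} ms: "
              f"window bound {by_window:6.2f} Mbit/s, loss bound {by_loss:6.2f} Mbit/s")
```

Both bounds are inversely proportional to the round-trip time, so under these assumptions doubling the distance roughly halves the achievable throughput regardless of how much bandwidth the link offers.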
There are four main approaches to distributing content servers in a content-delivery architecture.
Approach 1. Use mirror locations. Performance and reliability are limited by Internet middle-mile bottlenecks. Also, site mirroring is complex and costly. Expensive infrastructure sits underutilized most of the time and lacks the flexibility to handle surges.
Approach 2. Use a content-delivery network (CDN) to offload the delivery of cacheable content from the origin server onto a shared network. This approach is still throttled by middle-mile bottlenecks.
Approach 3. Use a very highly distributed CDN to deliver content from the end-user side of the middle-mile bottlenecks, close to the last mile. This eliminates peering, connectivity, routing, and distance problems, and it scales.
Approach 4. Use a peer-to-peer (P2P) architecture. However, P2P download capacity is limited by the total uplink capacity of the participating peers, and uplink speeds tend to be much lower than downlink speeds.
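The uplink constraint in Approach 4 can be illustrated with simple arithmetic. This is a rough sketch that assumes every byte downloaded must be uploaded by some peer and that aggregate uplink is shared evenly. The function name and the link speeds are hypothetical.

```python
def p2p_download_rate_mbps(uplinks_mbps, downloaders, downlink_mbps):
    """Average per-peer download rate in a pure P2P swarm.

    Each downloader is capped by its own downlink and by its even share
    of the total uplink capacity contributed by all peers.
    """
    if downloaders <= 0:
        raise ValueError("need at least one downloader")
    fair_share = sum(uplinks_mbps) / downloaders
    return min(downlink_mbps, fair_share)


if __name__ == "__main__":
    # Assumed: 1,000 peers, each with 1 Mbit/s up and 10 Mbit/s down,
    # all downloading at the same time.
    peers = 1000
    rate = p2p_download_rate_mbps([1.0] * peers, downloaders=peers, downlink_mbps=10.0)
    print(f"average download rate per peer: {rate} Mbit/s")
```

In this model, adding more peers with the same asymmetric links does not raise the per-peer rate, because demand grows as fast as uplink supply.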
A highly distributed architecture provides the only robust solution and enables many optimizations.
Optimization 1. Reduce transport-layer overhead. Use persistent connections and optimize parameters for efficiency. This can improve performance by reducing the number of round-trips.
Optimization 2. Find better routes. With a highly distributed network, you can speed up uncacheable communications and improve reliability by finding alternate routes when the default routes break.
Optimization 3. Prefetch embedded content. Highly personalized applications and user-generated content often involve either uncacheable or long-tail embedded content, where prefetching makes a huge difference in responsiveness.
Optimization 4. Assemble pages at the edge. Cache page fragments at edge servers and dynamically assemble them at the edge in response to end-user requests. This offloads the origin server and results in much lower latency to the user.
Optimization 5. Use compression and delta encoding. Compressing content and sending only the difference between a cached HTML page and an updated page can greatly reduce the amount of content traveling over the middle mile.
Optimization 6. Offload computations to the edge. Large classes of popular applications are well suited for edge computation.
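As an illustration of Optimization 5, the sketch below compares plain compression with a simple form of delta encoding. It uses zlib's preset-dictionary feature as a stand-in: the cached page serves as the dictionary, so only the parts of the updated page that differ cost many bytes. This is one possible technique, not necessarily the scheme a given CDN uses, and the page generator is a hypothetical helper.

```python
import hashlib
import zlib


def make_page(items: int = 100, changed: int = -1) -> bytes:
    """Build a sample HTML list; item `changed` (if any) gets new content."""
    lines = ["<ul>"]
    for i in range(items):
        seed = f"{i}-v2" if i == changed else f"{i}"
        digest = hashlib.sha256(seed.encode()).hexdigest()
        lines.append(f"<li>item {i}: {digest}</li>")
    lines.append("</ul>")
    return "\n".join(lines).encode()


def compress_plain(page: bytes) -> bytes:
    """Ordinary compression with no knowledge of the cached copy."""
    return zlib.compress(page, 9)


def encode_delta(cached: bytes, updated: bytes) -> bytes:
    """Compress `updated` using `cached` as a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=cached)
    return compressor.compress(updated) + compressor.flush()


def decode_delta(cached: bytes, delta: bytes) -> bytes:
    """Rebuild the updated page from the cached copy plus the delta."""
    decompressor = zlib.decompressobj(zdict=cached)
    return decompressor.decompress(delta) + decompressor.flush()


if __name__ == "__main__":
    cached = make_page()
    updated = make_page(changed=42)   # one list item differs
    delta = encode_delta(cached, updated)
    assert decode_delta(cached, delta) == updated
    print("raw bytes:       ", len(updated))
    print("compressed bytes:", len(compress_plain(updated)))
    print("delta bytes:     ", len(delta))
```

Both ends must hold the same cached copy for the delta to decode, which is why this optimization pairs naturally with edge servers that already cache the previous version of a page. zlib only uses the last 32 KB of a preset dictionary, so larger pages would need a different delta format.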
These optimizations work in synergy. TCP overhead is largely the price of guaranteeing reliability in the face of unknown network conditions. Because route optimization provides high-performance, congestion-free paths, it permits a more aggressive and efficient approach to transport-layer optimization.
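The transport-layer overhead discussed here, and the benefit of the persistent connections in Optimization 1, can be estimated with a deliberately simplified model. It assumes one round trip for each TCP handshake and one round trip per request and response, and it ignores TLS, slow start, and parallel connections. The request count and round-trip time are illustrative assumptions.

```python
def fetch_time_ms(requests: int, rtt_ms: float, persistent: bool) -> float:
    """Time to fetch `requests` objects sequentially under the simple model.

    A new connection costs one round trip for the handshake; every
    request/response exchange costs one more round trip.
    """
    if requests <= 0:
        return 0.0
    handshakes = 1 if persistent else requests
    return (handshakes + requests) * rtt_ms


if __name__ == "__main__":
    objects, rtt = 20, 80.0   # assumed: 20 embedded objects, 80 ms round trip
    print("new connection per object:", fetch_time_ms(objects, rtt, persistent=False), "ms")
    print("one persistent connection:", fetch_time_ms(objects, rtt, persistent=True), "ms")
```

Under this model, reusing one connection removes all but one handshake, so the saving grows with both the number of objects and the round-trip time.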
The guiding philosophy for building and managing a robust, highly distributed network is to expect failures at all times. The network should work seamlessly despite them. Some practical design principles result from this philosophy.
Principle 1: Engineer redundancy in all systems to facilitate failover, with multiple backup possibilities ready to take over if a component fails.
Principle 2: Use software logic to provide message reliability, so that data can be distributed over the public Internet.
Principle 3: Use distributed control for coordination. This is important for both fault tolerance and scalability.
Principle 4: Fail cleanly and restart. Aggressively fail problematic servers and restart them from a last known good state.
Principle 5: Phase software releases. After quality assurance, release software to the live network in phases, first to a single machine, then via further checks and phases to the entire network.
Principle 6: Notice and proactively quarantine faults. Isolating faults in a recovery-oriented computing system is an area of ongoing research.
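The phased release in Principle 5, combined with the return to a last known good state in Principle 4, might be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the callbacks `deploy`, `is_healthy`, and `rollback`, and the phase sizes are all hypothetical, and a real system would add monitoring windows and manual checks between phases.

```python
from typing import Callable, Sequence


def phased_rollout(
    servers: Sequence[str],
    phase_targets: Sequence[int],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Deploy in growing phases; roll everything back on the first failure.

    `phase_targets` holds cumulative server counts, e.g. [1, 5, len(servers)].
    Returns True if every phase passed its health check.
    """
    updated = []
    for target in phase_targets:
        # Extend the release to the servers added in this phase.
        for server in servers[len(updated):target]:
            deploy(server)
            updated.append(server)
        # Check every server updated so far before widening the release.
        if not all(is_healthy(s) for s in updated):
            for server in reversed(updated):
                rollback(server)   # restore the last known good state
            return False
    return True


if __name__ == "__main__":
    fleet = [f"s{i}" for i in range(10)]
    versions = {}
    ok = phased_rollout(
        fleet,
        [1, 5, len(fleet)],
        deploy=lambda s: versions.__setitem__(s, "new"),
        is_healthy=lambda s: s != "s3",      # simulate one bad server
        rollback=lambda s: versions.__setitem__(s, "old"),
    )
    print("rollout succeeded:", ok)
    print("servers touched:", sorted(versions))
```

Starting with a single machine limits the damage a bad release can do, since the failure in this example is caught before the second half of the fleet is ever touched.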
In short, a highly distributed approach to content delivery improves performance, reliability, and scalability.