This will be the first in a series of posts that examine different architectures for running SQL Anywhere databases using components found in Infrastructure-as-a-Service (IaaS) offerings. These components could be supplied either by public IaaS providers (such as Amazon and Rackspace) or by your own private cloud (RAID disk arrays, virtual machine servers, etc.). The goal of this series is to explore different ways that IaaS components can be combined to create SQL Anywhere database systems that exhibit different trade-offs in the areas of cost, durability, redundancy, and performance.
Before diving into the architectures, we must first take a look at the basic components and concepts that we will be using to assemble the systems.
Components and Concepts
The virtual machine is the basic unit of computing in the system. Virtual machines come in two flavours: non-persistent and persistent.
A non-persistent machine does not persist anything after it is stopped. Everything on the machine that was modified while it was running is lost. Data that needs to be persisted must be moved off the machine and onto another storage medium (virtual disk, archival storage, etc.) before the machine terminates. A non-persistent machine cannot be stopped and restarted, because everything about the machine ceases to exist after shutdown.
In contrast, a persistent machine is backed by permanent storage (likely a virtual disk) that continues to exist after the machine is stopped. This allows the machine to be restarted into the same state that it was in the last time it was stopped.
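To make the lifecycle difference concrete, here is a minimal Python sketch that models the two flavours as a toy class. `VirtualMachine`, its fields, and its methods are hypothetical illustrations of the definitions above, not part of any cloud provider's API.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    """Toy model of the two machine flavours (hypothetical, not a provider API)."""

    persistent: bool
    state: dict = field(default_factory=dict)  # anything modified while running
    running: bool = True
    terminated: bool = False

    def stop(self) -> None:
        self.running = False
        if not self.persistent:
            # A non-persistent machine ceases to exist: its local state is lost.
            self.state.clear()
            self.terminated = True
        # A persistent machine keeps its state on its backing storage.

    def start(self) -> None:
        if self.terminated:
            raise RuntimeError("a non-persistent machine cannot be restarted")
        self.running = True


# A persistent machine restarts into the state it had when it was stopped.
db_host = VirtualMachine(persistent=True)
db_host.state["database"] = "demo.db"
db_host.stop()
db_host.start()
print(db_host.state)  # {'database': 'demo.db'}
```

Running the same steps with `persistent=False` empties `state` on `stop()` and makes `start()` raise, which is why data must be copied elsewhere before a non-persistent machine shuts down.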
Virtual machines are susceptible to the same kinds of underlying hardware failures and disaster scenarios as physical machines, so our architectures will need to plan for virtual machine failures.
A virtual disk is permanent, block-level storage of configurable size that can be mounted to a running virtual machine. Virtual disks support random I/O. A virtual disk can be mounted to only one virtual machine at a time, but it can be mounted to many different machines over its lifetime. A virtual disk continues to exist (unmounted) even after the virtual machine it was mounted to has shut down.
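The mount rules are easy to lose track of, so here is a minimal Python sketch of them as a toy class. `VirtualDisk`, its methods, and the machine IDs `vm-a`/`vm-b` are hypothetical and only illustrate the definition; a real provider exposes attach and detach operations through its own API.

```python
class VirtualDisk:
    """Toy model of virtual disk mount rules (hypothetical, not a provider API)."""

    def __init__(self, size_gb: int) -> None:
        self.size_gb = size_gb    # size is configurable
        self.blocks = {}          # block index -> bytes (block-level storage)
        self.mounted_to = None    # at most one machine at a time
        self.mount_history = []   # many machines over the disk's lifetime

    def mount(self, machine_id: str) -> None:
        if self.mounted_to is not None:
            raise RuntimeError(f"disk is already mounted to {self.mounted_to}")
        self.mounted_to = machine_id
        self.mount_history.append(machine_id)

    def unmount(self) -> None:
        # The disk and its blocks survive after the machine shuts down.
        self.mounted_to = None

    def write_block(self, index: int, data: bytes) -> None:
        self._require_mounted()
        self.blocks[index] = data  # random I/O: any block, in any order

    def read_block(self, index: int) -> bytes:
        self._require_mounted()
        return self.blocks[index]

    def _require_mounted(self) -> None:
        if self.mounted_to is None:
            raise RuntimeError("disk must be mounted to a machine for I/O")


disk = VirtualDisk(size_gb=100)
disk.mount("vm-a")
disk.write_block(42, b"page data")
disk.unmount()              # vm-a shuts down; the disk keeps its contents
disk.mount("vm-b")          # a different machine picks the disk up
print(disk.read_block(42))  # b'page data'
```

This reattach pattern is what lets a replacement machine take over a database file after the original machine fails, as long as the disk itself survives.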
Similar to virtual machines, virtual disks are still susceptible to some hardware failures and disaster scenarios. We will need to plan for virtual disk failures in our architectures.
Despite the moniker “virtual”, the machines and disks do physically exist somewhere. The geographic region is the place where the resources powering these virtual entities physically reside. Geographic regions matter because the laws of physics still apply in the cloud: the distance between your virtual components affects both latency and exposure to disasters.
The size of a geographic region will vary between clouds. A large public cloud such as Amazon uses geographic regions that are thousands of miles apart. A smaller private cloud may use geographic regions that represent servers in buildings across the street from each other. In general, the farther apart the geographic regions, the greater both the isolation and the latency between them.
Failure-insulated zones are subdivisions of geographic regions. Geographic regions are useful when thinking about large-scale failures and disasters (earthquakes, explosions, hurricanes) that affect a wide area. Failure-insulated zones, by contrast, are divisions within a geographic region that are isolated, as much as possible, from expected localized failures such as disk or power supply failures.
Similar to geographic regions, exactly how isolated these zones are will vary. A large public cloud may use zones that are located in rooms separated by thick firewalls (physical ones, not the networking kind). A smaller private cloud may use zones as simple as two racks that sit beside each other. As with geographic regions, the more isolation the better.
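The two levels of placement combine into a simple way to reason about how exposed two components are to a shared failure. The Python sketch below is a hypothetical helper, not a provider API; the region and zone strings in the example are illustrative AWS-style names.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Where a virtual component physically runs (illustrative names only)."""
    region: str
    zone: str


def isolation(a: Placement, b: Placement) -> str:
    """Classify how well two components are isolated from each other's failures."""
    if a.region != b.region:
        # Isolated from large-scale regional disasters; the most latency.
        return "separate regions"
    if a.zone != b.zone:
        # Isolated from localized failures (disk, power supply) within a region.
        return "separate zones"
    # Lowest latency, but exposed to the same localized failures.
    return "same zone"


primary = Placement("us-east-1", "us-east-1a")
standby = Placement("us-east-1", "us-east-1b")
print(isolation(primary, standby))  # separate zones
```

The region check comes first because zones only have meaning within a region: two components in different regions are separated by at least as much as two components in different zones.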
Archival storage is long-term, permanent, blob-level storage. It allows individual blobs to be stored and retrieved, but does not allow random I/O within a blob. Archival storage is not mounted to any virtual machine and can be accessed by multiple virtual machines at the same time.
Archival storage exists outside of any specific geographic region. It is considered to be completely durable, but not always available.
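The contrast with a virtual disk shows up when you try to change data in place. Here is a minimal Python sketch of the blob-level model as defined above; `ArchivalStore`, `append_to_blob`, and the key name are hypothetical, and real archival services may offer richer operations than this simplified definition.

```python
class ArchivalStore:
    """Toy model of blob-level archival storage (hypothetical, not a real API)."""

    def __init__(self) -> None:
        self._blobs = {}  # key -> bytes; shared by all machines, never mounted

    def put(self, key: str, data: bytes) -> None:
        # Blobs are written whole; there is no way to update part of a blob.
        self._blobs[key] = bytes(data)

    def get(self, key: str) -> bytes:
        # Blobs are read whole; there is no seeking within a blob.
        return self._blobs[key]


def append_to_blob(store: ArchivalStore, key: str, extra: bytes) -> None:
    """Changing a blob requires a read-modify-write of the entire blob."""
    store.put(key, store.get(key) + extra)


store = ArchivalStore()
# A non-persistent machine copies its data off before it terminates...
store.put("backups/demo.log", b"txn1;")
# ...and any other machine can later read or extend it.
append_to_blob(store, "backups/demo.log", b"txn2;")
print(store.get("backups/demo.log"))  # b'txn1;txn2;'
```

Because every change rewrites the whole blob, archival storage suits backups and archived logs rather than live database files, which need the random I/O of a virtual disk.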
Components as Found in Amazon Web Services
Although these components can be found (to varying degrees) in all public and private clouds, we will be using the Amazon Web Services’ components and pricing when comparing architectures.
The Amazon Web Services components that align with our definitions are:
- Virtual Machine: Amazon Elastic Compute Cloud (EC2)
  - Non-Persistent: Instance store-backed Amazon Machine Images (AMIs)
  - Persistent: Elastic Block Store-backed Amazon Machine Images (AMIs)
- Virtual Disk: Amazon Elastic Block Store (EBS)
- Geographic Regions: Regions
- Failure-insulated Zones: Availability Zones
- Archival Storage: Amazon Simple Storage Service (S3)
Over the next few posts, we will combine these components to build systems that exhibit different trade-offs in the areas of cost, durability, redundancy, and performance.