Introduction to Memcached

Former Member · ‎07-14-2010

Question: With respect to web architecture, what do these high-volume sites have in common: Facebook, Yahoo, LiveJournal, Amazon, Wikipedia, YouTube, Twitter, FarmVille (and a host of others)?

Answer: They all use Memcached.

What is Memcached?

Memcached is a high-performance, open source object caching system used on most high-performance, large-scale web sites. What sets Memcached apart is the simplicity of its design and operation. It is highly scalable (Facebook has more than 200 dedicated Memcached servers and handles about 5 million concurrent users).

Memcached Server

At its core, a Memcached server is an in-memory cache that stores key-value pairs (an LRU cache). The server protocol is pretty simple, and clients are available for most popular programming languages.
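To make the "LRU cache" idea concrete, here is a minimal sketch of least-recently-used eviction in Java. The class name LruCache is made up for this illustration, and this is not how the Memcached server is implemented internally; it only shows the eviction behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a tiny LRU map (hypothetical class, not Memcached's internals).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true moves an entry to the end on every get/put
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cache is over capacity
        return size() > capacity;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "b" is the entry that was touched least recently.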

The Memcached protocol handles two kinds of data: text lines (commands and responses) and unstructured data (values). The server stores and retrieves unstructured data as a byte stream.
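As a rough sketch of how the two kinds of data fit together, the helper below builds the bytes a client would send for a text-protocol set and get: a text line, followed (for set) by the raw value bytes. The class and method names are invented for this example; a real client library does this for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: frames commands in the Memcached text protocol.
class TextProtocol {
    // Builds: set <key> <flags> <exptime> <bytes>\r\n<data block>\r\n
    static byte[] buildSet(String key, int flags, int exptimeSeconds, byte[] value) {
        String header = "set " + key + " " + flags + " " + exptimeSeconds
                + " " + value.length + "\r\n";
        byte[] head = header.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);   // the text line
        out.write(value, 0, value.length); // the unstructured data, sent as-is
        out.write('\r');
        out.write('\n');
        return out.toByteArray();
    }

    // Builds: get <key>\r\n
    static byte[] buildGet(String key) {
        return ("get " + key + "\r\n").getBytes(StandardCharsets.US_ASCII);
    }
}
```

The server never interprets the data block; it only needs the byte count from the text line to know where the value ends.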

Data stored in Memcached is identified by a key (max 250 chars). Data is stored on or retrieved from the server(s) using storage or retrieval commands. Each stored item is limited to 1 MB.
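Since these limits are easy to trip over, an application might check them before talking to the cache. The sketch below assumes the limits quoted above (250 for the key, 1 MB for the value) and that keys may not contain whitespace or control characters; CacheLimits and its methods are hypothetical names, not part of any client API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical pre-flight checks based on the limits described in the text.
class CacheLimits {
    static final int MAX_KEY_BYTES = 250;
    static final int MAX_VALUE_BYTES = 1024 * 1024; // 1 MB

    static boolean isValidKey(String key) {
        if (key == null || key.isEmpty()) return false;
        // Measure the encoded length, since the server counts bytes
        if (key.getBytes(StandardCharsets.UTF_8).length > MAX_KEY_BYTES) return false;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            // Whitespace and control characters would break the text protocol
            if (Character.isWhitespace(c) || Character.isISOControl(c)) return false;
        }
        return true;
    }

    static boolean fitsInCache(byte[] value) {
        return value != null && value.length <= MAX_VALUE_BYTES;
    }
}
```

Treat the value check as approximate: the server also stores the key and some bookkeeping alongside each value, so the usable size is slightly smaller than the limit.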

Scaling

When multiple Memcached instances are used, the individual instances are not aware of each other, and there is no synchronization between them. Because of this, it is easy to add new instances and increase cache capacity almost instantly (the clients need to be made aware of the new instances). As the official documentation states, the smarts are half in the client and half in the server.
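Because the servers do not coordinate, it is the client that decides which instance holds a given key. A minimal sketch of that decision, assuming a simple "hash modulo number of servers" scheme with CRC32 (real clients may use different hash functions; ServerSelector is a hypothetical name):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical client-side routing: hash the key, pick a server by modulo.
class ServerSelector {
    private final List<String> servers;

    ServerSelector(List<String> servers) {
        this.servers = servers;
    }

    String serverFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the modulo is a valid index
        int index = (int) (crc.getValue() % servers.size());
        return servers.get(index);
    }
}
```

The same key always lands on the same server as long as the server list is unchanged. The weakness of plain modulo is that adding or removing a server changes the index for most keys, which is the problem consistent hashing addresses.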

Memcached Clients

It is a good idea to check out the capabilities of the client you are going to use. All clients usually implement a set of basic features and some extra features. By and large most of them support:

Hashing keys across multiple servers
Consistent Hashing
Storage of Strings or Binary Data
Storage of complex data structures (with some exceptions)
Data Compression
Standard fetch (Get)
Multi-Get

One thing to keep in mind, though, is that different clients might serialize data or hash keys differently, so two different clients pointed at the same servers may not be able to read each other's entries.
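To illustrate the consistent hashing feature from the list above, here is a minimal hash ring sketch. It is an assumption-laden toy: it uses CRC32 and a made-up HashRing class, whereas real clients typically ship their own scheme (and, as noted, not necessarily a compatible one).

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical consistent-hash ring: servers own many points on a circle,
// and a key belongs to the first server point at or after the key's hash.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRing(List<String> servers, int pointsPerServer) {
        if (servers.isEmpty() || pointsPerServer < 1) {
            throw new IllegalArgumentException("need at least one server and one point");
        }
        for (String server : servers) {
            for (int i = 0; i < pointsPerServer; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    String serverFor(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        // Wrap around to the start of the ring if we ran off the end
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

The useful property is that when a server is added, a key either stays where it was or moves to the new server; keys do not get reshuffled among the existing servers.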

Commands

The clients send commands to the Memcached instance. The command could be to store, retrieve or delete a key. Data serialization is up to the client.
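Since the server only sees bytes, turning an object into bytes and back is the client's job. One way a Java client could do it is standard Java serialization, sketched below with a hypothetical JavaSerializer class; actual clients may use other formats, which is why entries written by one client are not always readable by another.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: object <-> byte[] using built-in Java serialization.
class JavaSerializer {
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

Anything stored this way must implement Serializable, and the class must be available on the classpath of whichever application reads the value back.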

There are three types of commands that a Memcached instance will accept: Storage, Retrieval and Statistics. Statistics commands return the settings and status of the various objects in the instance.

All these commands can be executed via a client or simply by opening a telnet session to the memcached port.

For example, opening a telnet session to my Memcached instance and executing the stats command outputs the following:

stats
STAT pid 1092
STAT uptime 7919
STAT time 1279045424
STAT version 1.2.1
STAT pointer_size 32
STAT curr_items 0
STAT total_items 0
STAT bytes 0
STAT curr_connections 1
STAT total_connections 2
STAT connection_structures 2
STAT cmd_get 0
STAT cmd_set 0
STAT get_hits 0
STAT get_misses 0
STAT bytes_read 13
STAT bytes_written 7
STAT limit_maxbytes 67108864
END
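The stats response is easy to consume programmatically because every line has the form "STAT name value" and the response ends with END. A small sketch that parses such a response and derives a hit ratio from get_hits and get_misses (StatsParser is a hypothetical helper, not part of any client library):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns a raw stats response into a name -> value map.
class StatsParser {
    static Map<String, String> parse(String response) {
        Map<String, String> stats = new LinkedHashMap<>();
        for (String line : response.split("\\r?\\n")) {
            String[] parts = line.trim().split(" ", 3);
            // Keep only "STAT <name> <value>" lines; this skips END and blanks
            if (parts.length == 3 && parts[0].equals("STAT")) {
                stats.put(parts[1], parts[2]);
            }
        }
        return stats;
    }

    // Fraction of get requests served from the cache; 0.0 if there were none.
    // Assumes get_hits and get_misses are present in the map.
    static double hitRatio(Map<String, String> stats) {
        long hits = Long.parseLong(stats.get("get_hits"));
        long misses = Long.parseLong(stats.get("get_misses"));
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

For the freshly started instance shown above, both counters are 0, so the ratio comes out as 0.0 rather than a division error.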

Supported Operating Systems

The primary OS for Memcached is Linux, though you can get it for Mac OS X, Solaris and Windows. A link to the Windows binary is supplied at the end of the article.

Java Clients

Currently there are 2 main Java Memcached clients.

The APIs for both are pretty clean and easy to use. Read through the API and general documentation and decide which one to use.

Here are a few snippets from the spyMemcached examples.

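import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

// "hostname", portNum and someObject are placeholders; 11211 is the default Memcached port
MemcachedClient c = new MemcachedClient(
    new InetSocketAddress("hostname", portNum));
// Store a value (async) for one hour
c.set("someKey", 3600, someObject);
// Retrieve a value (synchronously).
Object myObject = c.get("someKey");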

Here is an example of an asynchronous get against multiple instances of Memcached.

// Get a memcached client connected to several servers
MemcachedClient c=new MemcachedClient(
        AddrUtil.getAddresses("server1:11211 server2:11211"));

// Try to get a value, for up to 5 seconds, and cancel if it doesn't return
Object myObj = null;
Future<Object> f = c.asyncGet("someKey");
try {
    myObj = f.get(5, TimeUnit.SECONDS);
} catch (TimeoutException e) {
    // Since we don't need this, go ahead and cancel the operation.  This
    // is not strictly necessary, but it'll save some work on the server.
    f.cancel(false);
    // Do other timeout related stuff
} catch (InterruptedException | ExecutionException e) {
    // f.get can also fail if the thread is interrupted or the operation errors out
}

For a list of clients for other programming languages check this page.

WebDynpro Java and Memcached

I intended this article as a general web development article, not as a guide to getting Memcached and WebDynpro working together. I have used spyMemcached from WD in a non-production environment (mainly test programs).

In WD, results from RFCs usually end up in nodes. I have not experimented with serializing/de-serializing nodes to and from Memcached. I do feel that most applications will benefit from caching results from backend systems and transitional data, if implemented correctly. If you have any thoughts on this, I would love to hear them and will update this article accordingly.

Credits

I have to credit three very good resources that helped me understand Memcached:

Google Code : Memcache Wiki
Tangent.Org : Memcache Study (PDF Presentation)
Sudarshan Archarys's Blog : Using Memcached with Java