From the Archives: The Myth of Linear Scalability
This post was originally written by Glenn Paulley and published to sybase.com in May of 2008.
One of the pitfalls Ivan Bowman and I outline in our white paper, Capacity Planning with SQL Anywhere, is the myth of perfect scalability: that simply improving the hardware platform by a factor of x will correspondingly improve application performance. It is helpful to remember that very few algorithms (or applications) scale linearly; one should always keep in mind the axiom that all CPUs wait at the same speed.
As an example, consider a database design in which several of the most important business objects are identified using surrogate keys. The application is designed to use a special table in the database, which we simply call the surrogate table, where each row in the table represents a type of business object. Contained within each row of the surrogate table is the value of the next key to be used when inserting a new business object of that type – we assume that value is stored in the column next_key. Within the application, the logic for inserting a new client into the database is roughly as follows:
DECLARE @next integer; UPDATE surrogate SET @next = next_key, next_key = next_key + 1 WHERE object-type = â€˜clientâ€™; INSERT INTO client VALUES(@next, ...); COMMIT;
Aside: notice the SQL Anywhere language feature that permits assignment of a variable within the SET clause of an UPDATE statement.
This code snippet illustrates a common application development practice for inserting new information to a database – I have seen this several times in customer applications. However, this code fragment will cause the serialization of every insertion of new client rows, as only one connection at a time will be able to acquire a write lock on the “client” row of the surrogate table. It may be the case that this logic will perform adequately with a small number of connections, but this approach will not scale: with significant numbers of users the formation of convoys is inevitable. With SQL Anywhere, due to its threading model, the usual outcome with this type of convoy is thread deadlock: M threads are waiting for the same write lock on the surrogate table, and the last available thread, doing the same INSERT, is about to block.
This is a great example to illustrate the myth of perfect scalability, because increasing the CPU speed with this type of application will yield virtually no improvement. It also can be a challenging exercise to diagnose these types of problems, because it is harder to diagnose a problem when the expected behaviour isn’t taking place at all: looking at Performance Monitor I/O or CPU counters isn’t going to help you find it.
The point I’m trying to make is that database application performance, while dependent to a degree on the specifics of the database system, is also largely dependent upon the logic within the application, and how well that logic scales with additional users or data volume. This is why there are rarely quick, easy answers to capacity planning questions. The alternative to hardware capacity planning is over-specification: pessimistically purchasing greater hardware performance than truly necessary, so that even peak workloads will not suffer.