From the Archives: The Second Deadly Sin

JasonHinsperger · ‎03-25-2015

In this post, originally written by Glenn Paulley and posted to sybase.com in May of 2012, Glenn talks about concurrency control and the consequences of using the various options available with SQL Anywhere.

Back in 2011 I wrote an article entitled "The seven deadly sins of database application performance" and I followed that introductory article in April 2011 with one regarding the first "deadly sin" that illustrated some issues surrounding weak typing within the relational model.

In this article I want to discuss the implications of concurrency control and, in particular, the tradeoffs in deciding to use the weaker SQL standard isolation levels READ UNCOMMITTED and READ COMMITTED.

Contention through blocking

Most commercial database systems that support the SQL Standard isolation levels [3] of READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE use 2-phase locking (2PL), commonly at the row-level, to guard against update anomalies by concurrent transactions. The different isolation levels affect the behaviour of reads but not of writes: before modifying a row, a transaction must first acquire an exclusive lock on that row, which is retained until the transaction performs a COMMIT or ROLLBACK, thus preventing further modifications to that row by another transaction(s). Those are the semantics of 2PL.

Consequently, it is easy to design an application that intrinsically enforces serial execution. One that I have written about previously - Example 1 in that whitepaper - is a classic example of serial execution. In that example, the application increments a surrogate key with each new client to be inserted, yielding a set of SQL statements like:



UPDATE surrogate SET @x = next key, next key = next key + 1 WHERE object-type = 'client';

INSERT INTO client VALUES(@x, ...);

COMMIT;

Since the exclusive row lock on the 'client' row in the surrogate table is held until the end of the transaction, this logic in effect forces serialization of all client insertions. Note that testing this logic with one, or merely a few, transactions will likely fail to trigger a performance problem; it is only at scale that this serialization becomes an issue, a characteristic of most, if not all, concurrency control problems except for deadlock.

Hence lock contention, with serialization as one of its most severe forms, is difficult to test because the issues caused by lock contention are largely performance-related. They are also difficult to solve by increasing the application's degree of parallelism, since that typically yields only additional waiting threads, or by throwing additional compute power at the problem, for, as sometimes stated by my former mentor at Great-West Life, Gord Steindel: all CPUs wait at the same speed.

Why not lower isolation levels?

With 2PL, write transactions block read transactions executing at READ COMMITTED or higher. The number, and scope, of these read locks increase as one moves to the SERIALIZATION isolation level, which offers serializable semantics at the expense of concurrent execution in a mixed workload of readers and writers. Consequently it is logical to tradeoff the server's guarantee of serialized transaction schedules with better performance by reducing the number of read locks to be acquired, and hence reduce the amount of blocking - a strategy that makes sense for many applications with a typical 80-20 ratio of read transactions to write transactions.

But this tradeoff is not free; it is made at the expense of exposing the application to data anomalies that occur as the result of concurrent execution with update transactions. But this exposure is, again, very hard to quantify: how would one attempt to measure the risk of acting on stale data in the database, or overwriting a previously-modified row (often termed the "lost update" problem)? Once again, the problem is exacerbated at scale, which makes analysis and measurement of this risk difficult to determine during a typical application development cycle.

Some recent work [1] that explores these issues was on display at the 2012 ACM SIGMOD Conference held last week in Phoenix, Az. At the conference, graduate student Kamal Zellag and his supervisor, Bettina Kemme, of the School of Computer Science at McGill University in Montreal demonstrated ConsAD, a system that measures the number of serialization graph cycles that develop within the application at run time - where a cycle implies a situation involving either stale data, a lost update, or both. A full-length paper [2] presented at last year's IEEE Data Engineering Conference in Hannover, Germany provides the necessary background; here is the abstract:

While online transaction processing applications heavily rely on the transactional properties provided by the underlying infrastructure, they often choose to not use the highest isolation level, i.e., serializability, because of the potential performance implications of costly strict two-phase locking concurrency control. Instead, modern transaction systems, consisting of an application server tier and a database tier, offer several levels of isolation providing a trade-off between performance and consistency. While it is fairly well known how to identify the anomalies that are possible under a certain level of isolation, it is much more difficult to quantify the amount of anomalies that occur during run-time of a given application. In this paper, we address this issue and present a new approach to detect, in realtime, consistency anomalies for arbitrary multi-tier applications. As the application is running, our tool detect anomalies online indicating exactly the transactions and data items involved. Furthermore, we classify the detected anomalies into patterns showing the business methods involved as well as their occurrence frequency. We use the RUBiS benchmark to show how the introduction of a new transaction type can have a dramatic effect on the number of anomalies for certain isolation levels, and how our tool can quickly detect such problem transactions. Therefore, our system can help designers to either choose an isolation level where the anomalies do not occur or to change the transaction design to avoid the anomalies.

The Java application system described in the paper utilizes Hibernate, the object-relational mapping tooklit from JBoss. ConsAD is in two parts: a "shim", called ColAgent, that captures application traces and implemented by modifying the Hibernate library used by the application; and DetAgent, an analysis piece that analyzes the serialization graphs produced by ColAgent to look for anomalies. In their 2011 study, the authors found that the application under test, termed RuBis, suffered from anomalies when it used Hibernate's built-in optimistic concurrency control scheme (termed JOCC in the paper), 2PL using READ COMMITTED, or (even) PostgreSQL's implementation of snapshot isolation (SI). This graph, from the 2011 ICDE paper, illustrates the frequency of anomalies for the RUBiS "eBay simulation" with all three concurrency-control schemes. Note that in these experiments snapshot isolation consistently offered the fewest anomalies at all benchmark sizes, a characteristic that application architects should study. But SI is not equivalent to serializability, something other authors have written about [4-7] and still causes low-frequency anomalies during the test.

The graph is instructive in not only illustrating that anomalies occur with all three concurrency control schemes, but that the frequency of these anomalies increase dramatically with scale. Part of the issue lies with Hibernate's use of caching; straightforward row references will result in a cache hit, whereas a more complex query involving nested subqueries or joins would execute against the (up-to-date) copies of the row(s) in the database, leading to anomalies with stale data. As such, these results should serve as a warning to application developers using ORM toolkits since it is quite likely that they have little, if any, idea of the update and/or staleness anomalies that their application may encounter when under load.

It would be brilliant if Kamal and Bettina expanded this work to cover other application frameworks other than Hibernate, something I discussed with Kamal at length while in Phoenix last week. Hibernate's mapping model makes this sort of analysis easier than (say) unrestricted ODBC applications, but if it existed such a tool would be very useful in discovering these sorts of anomalies for other types of applications.

References

[1] K. Zellag and B. Kemme (May 2012). ConsAD: a real-time consistency anomalies detector. In Proceedings of the 2012 ACM SIGMOD Conference, Phoenix, Arizona, pp. 641-644.
[2] K. Zellag and B. Kemme (April 2011). Real-Time Quantification and Classification of Consistency Anomalies in Multi-tier Architectures. In Proceedings of the 27th IEEE Conference on Data Engineering, Hannover, Germany, pp. 613-624.
[3] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil (May 1995). A critique of ANSI SQL isolation levels. In Proceedings of the ACM SIGMOD Conference, San Jose, California, pp. 1-10.
[4] A. Fekete (January 1999). Serialisability and snapshot isolation. In Proceedings of the Australian Database Conference, Auckland, New Zealand, pp. 201-210.
[5] A. Fekete, D. Liarokapis, E. J. O'Neil, P. E. O'Neil, and D. Shasha (2005). Making snapshot isolation serializable. ACM Transactions on Database Systems 30(2), pp. 492-528.
[6] S. Jorwekar, A. Fekete, K. Ramamritham, and S. Sudarshan (September 2007). Automating the detection of snapshot isolation anomalies. Proceedings of the 33rd International Conference on Very Large Data Bases, Vienna, Austria, pp. 1263-1274.
[7] A. Fekete, E. O'Neil, and P. O'Neil (2004). A read-only transaction anomaly under snapshot isolation. ACM SIGMOD Record 33(3), pp. 12-14.