From the Archives: The Perils of Simplistic Benchmarks
This post was originally written by Glenn Paulley and posted to sybase.com in October of 2008.
I am often made aware of attempts by customers to “benchmark” SQL Anywhere. The vast majority of these attempts are simplistic, single-data-point performance tests that are completely unrepresentative of the customer’s actual workload, and so the “benchmark” yields little, if any, insight into the performance characteristics of the customer’s application. Moreover, industry-standard benchmarks, such as TPC-C and TPC-D, are also often unrepresentative of commercial workloads. In their 2001 study, Hsu, Smith and Young compared the characteristics of 14 captured workloads from ten large IBM mainframe DB2 customers with those of the TPC benchmarks:
Our analysis indicates that in some cases, the TPC benchmarks fall reasonably within the range of real workload behavior, and in other cases, they are not representative of real workloads. Some of our findings are (1) TPC-C tends to have longer transactions and fewer read-only transactions than the production workloads, whereas some of the transactions done by TPC-D are much longer but are read-only and are run serially, (2) the production workloads have I/O demands that are much more bursty than the TPC benchmarks, (3) unlike TPC-C, which has very regular transactions, and TPC-D, which has long queries that are run serially, the production workloads tend to have many concurrent and diverse transactions, and (4) TPC-C has no I/O activity involving temporary objects, whereas most of the references for TPC-D are directed at index objects.
The scale of the benchmark also has fundamental effects on a server’s performance characteristics. In a 2001 talk entitled “Impact of Database Scaling on DSS Workload Characteristics on SMP Systems” (available from the University of Illinois at Urbana-Champaign), Ramendra K. Sahoo, Krishnan Sugavanam, and Ashwini Nanda of the IBM T.J. Watson Research Center concluded that:
- Invalidation requests and modified interventions on the memory bus decreases moderately with increase in database size.
- DSS workload characteristics vary significantly with DB size.
- CPU Cache misses per instruction can vary three fold between a 10GB and 100GB database.
- Due to the large variations in workload characteristics, it is important that realistic DB sizes are used for system evaluations.
Memory bus invalidation requests and instruction- and data-cache misses are important CPU utilization effects because the latency they introduce is independent of CPU speed.
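The arithmetic behind that last point can be sketched in a few lines. This is only a back-of-the-envelope model, and every number in it is an illustrative assumption rather than a measurement from any of the cited studies: the function name, the base cycles-per-instruction figure, the miss rate, and the 100 ns miss latency are all hypothetical.

```python
def time_per_instruction_ns(clock_ghz, base_cpi, misses_per_instr, miss_latency_ns):
    """Average time per instruction under a simple two-term model.

    The compute term shrinks as the clock gets faster; the stall term is
    fixed in nanoseconds, so a faster clock does not reduce it.
    """
    compute_ns = base_cpi / clock_ghz              # cycles / (cycles per ns)
    stall_ns = misses_per_instr * miss_latency_ns  # independent of clock speed
    return compute_ns + stall_ns


# Illustrative inputs only (assumptions, not measured values).
slow = time_per_instruction_ns(clock_ghz=2.0, base_cpi=1.0,
                               misses_per_instr=0.01, miss_latency_ns=100.0)
fast = time_per_instruction_ns(clock_ghz=4.0, base_cpi=1.0,
                               misses_per_instr=0.01, miss_latency_ns=100.0)
# Same 2 GHz clock, but with three times as many misses per instruction.
tripled = time_per_instruction_ns(clock_ghz=2.0, base_cpi=1.0,
                                  misses_per_instr=0.03, miss_latency_ns=100.0)

print(f"2 GHz: {slow:.2f} ns/instr")
print(f"4 GHz: {fast:.2f} ns/instr (speedup {slow / fast:.2f}x)")
print(f"2 GHz, 3x misses: {tripled:.2f} ns/instr")
```

With these made-up inputs, doubling the clock rate buys only a 1.2x speedup, because the stall term dominates, while tripling the miss rate more than doubles the time per instruction. A test run at an unrealistically small database size can therefore understate the stall term considerably.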
There are a variety of studies that have analyzed CPU performance with respect to database benchmarks; one is by Hankins et al. I’d like to highlight another of these: In 2003, Natassa Ailamaki, Minglong Shao, and Babak Falsafi of Carnegie Mellon University (Natassa is now at EPFL in Lausanne, Switzerland) wrote a technical report describing DBmbench. DBmbench evaluates the performance of a simple workload consisting of scan and index nested-loop join operators on a small, in-memory copy of either the TPC-C or the TPC-H database. The authors claim that the results of the DBmbench benchmark at different scales mimic actual TPC-C or TPC-H performance reasonably well, and that the benchmark is significantly simpler to implement. At a minimum, what DBmbench does do is offer an idea of relative hardware performance (CPU, cache, and main memory) for database-like workloads when run on different systems.
Of course, what DBmbench does not do is offer any insight into the quality of the access plans generated by the system’s query optimizer, the (vitally important) analysis of I/O performance, or the performance characteristics of the actual database application. However, simple tools such as DBmbench do offer easy-to-implement performance evaluation tests that, in combination with other tests, can yield an understanding of hardware performance without necessarily resorting to the effort of creating an application-specific benchmark.
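To make the two operator types concrete, here is a minimal Python sketch of a sequential scan and an index nested-loop join over small in-memory tables, timed at several sizes with repeated runs rather than a single data point. It is not DBmbench and does not use its schema or queries: the table layout, the function names, and the use of a Python dictionary as a stand-in for an index are all assumptions made for illustration. Interpreter overhead dominates the timings, so they say little about CPU cache behaviour; the sketch shows only the shape of the operators and the habit of measuring across scales.

```python
import random
import statistics
import time


def build_tables(n_rows, seed=0):
    """Build two hypothetical in-memory tables and an index on the inner key."""
    rng = random.Random(seed)
    inner = [(key, rng.random()) for key in range(n_rows)]
    outer = [(rng.randrange(n_rows), rng.random()) for _ in range(n_rows)]
    # A dict stands in for the index; a real server would typically use a B-tree.
    index = {key: pos for pos, (key, _) in enumerate(inner)}
    return inner, outer, index


def scan_sum(table, threshold=0.5):
    """Sequential scan: apply a predicate to every row and aggregate."""
    return sum(value for _, value in table if value < threshold)


def index_nested_loop_join(outer, inner, index):
    """For each outer row, probe the index and fetch the matching inner row."""
    matches = 0
    for foreign_key, _ in outer:
        pos = index.get(foreign_key)
        if pos is not None:
            _ = inner[pos]  # fetch the joined row
            matches += 1
    return matches


def time_it(fn, *args, repeats=5):
    """Run fn several times; return (median, min, max) elapsed seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples), max(samples)


# Measure at several scales instead of reporting one number.
for n_rows in (1_000, 10_000, 100_000):
    inner, outer, index = build_tables(n_rows)
    scan_med, scan_min, scan_max = time_it(scan_sum, inner)
    join_med, join_min, join_max = time_it(index_nested_loop_join, outer, inner, index)
    print(f"{n_rows:>7} rows: scan median {scan_med * 1e3:.3f} ms "
          f"[{scan_min * 1e3:.3f}, {scan_max * 1e3:.3f}], "
          f"join median {join_med * 1e3:.3f} ms "
          f"[{join_min * 1e3:.3f}, {join_max * 1e3:.3f}]")
```

Reporting the median together with the minimum and maximum at each size makes run-to-run variation visible, and comparing sizes shows whether cost per row stays flat as the data grows. Neither is something a single timing can reveal.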
W. W. Hsu, A. J. Smith, and H. C. Young (2001). Characteristics of production database workloads and the TPC benchmarks. IBM Systems Journal 40(3), pages 781-802.
Richard A. Hankins, Trung A. Diep, Murali Annavaram, Brian Hirano, Harald Eri, Hubert Nueckel, and John P. Shen (December 2003). Scaling and characterizing database workloads: Bridging the gap between research and practice. In Proceedings of the 36th International Symposium on Microarchitecture. IEEE Computer Society Press.
Anastassia Ailamaki, Minglong Shao, and Babak Falsafi (June 2003). DBmbench: Fast and Accurate Database Workload Representation on Modern Microarchitecture. Carnegie Mellon University Technical Report CMU-CS-03-161, 23 pages.