Benchmarking and the single thread
If you work in tech long enough, you get to see a few cycles and turnovers. I’m not talking about cell phone X to cell phone Y, I’m talking hardware generations. I started running benchmarks long ago, and have written here on a few forays. One of the (relatively) recent things I wrote on SCN was Fast Is Not A Number. After nearly 2 years, I still haven’t seen any official SD benchmarks requested there.
Meanwhile, SD benchmarks continue to be certified and published under SAP’s purview, which I and others track. Hardware vendors have fun volleying charges back and forth about the meaning of the numbers and the value to users. I won’t link those here; let me know if simple searches don’t bring them to light. I’ll admit to my biases (like UNIX over Windows, Oracle over another DB), but try to remain nonpartisan when evaluating one vendor choice over another. While the test results being published here are about Oracle database software (hence this space versus an application software space versus a hardware space), they could be extrapolated to other databases and software on differing hardware. With all the tests I’m running, these benchmarks make sense to discuss on SCN.
More preamble – the results shown below are metrics of transactions per second. That is but one dimension one can use to decide which solution is acceptable. While I cannot and will not discuss license terms, I think it is well known that CPU-based counts are often used by vendors. Sometimes it’s based on hardware class, so larger systems cost more per user, but in some cases you need to control costs by restricting where certain code can run. Put another way, any hardware performance increase or tweak that doesn’t tick over the software license meter is worth looking into.
Intel, IBM, and other chip designers have gone to higher parallelism, in varying ways, not to mention the virtualization and grid techniques that expand problem solving and number crunching to new dimensions. Here, I’m only looking at IBM Power CPUs (I have benchmarked other CPUs as they were made available to me). I’ve used some variety of these since Power4. I’ve also used the software package called “orabm” for about 10 years.
What is “orabm”? This cool application was produced, and is still available, from linxcel, which has also released a more comprehensive transaction throughput monitoring suite they call “orastress!!” (I always forget if it has one or 2 exclamation marks). To prepare this suite, run SQL statements to create and load the sample tables, then run the compiled benchmark controller. I have not built the module in a while, but I don’t think any compiler changes will affect the results.
No doubt I’ve left out details some might find pertinent, as once you get started on 20 or 30 separate test modules there may be assumptions left unstated. I’ll respond in the comments as best I can, and will try to explain any omissions or unanswered threads.
Back to the title, “Single threaded” – does threading help? Not just in benchmarks, but in actual production code? I have to ask, since the CPU specs these days throw around a lot of terms, from hyper-threading to simultaneous multi-threading (SMT) to other parallel turns of phrase. I’ll explain the implications after I review the numbers. Naturally, there is a spreadsheet behind this, and no, you can’t have it.
First stipulation – I ran several iterations with varying numbers of parallel sessions being created. The performance results can vary depending on adapter settings, network stack design (not just the OS, but the hypervisor too), etc. The suite reports transactions per second for each parallel run. There’s a shell script that converts the parallel results into transactions per second overall. That’s because we want to know the big picture.
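For anyone who wants to do the same roll-up without my shell script, here is a minimal Python sketch of the idea. It is not the script I use, and it does not parse real orabm output: it assumes a made-up input format with one per-session TPS value per line, and it assumes the overall figure is simply the sum of the per-session rates. The function name overall_tps and the sample values are hypothetical.

```python
def overall_tps(report_text):
    """Add up per-session TPS values into one overall figure.

    Assumes a hypothetical input format: one per-session TPS value
    per non-blank line. Real benchmark output will need its own parsing.
    """
    rates = [float(line) for line in report_text.splitlines() if line.strip()]
    return len(rates), sum(rates)


# Made-up values for illustration only; these are not measured results.
sample = "250.0\n245.5\n252.5\n252.0\n"

sessions, total = overall_tps(sample)
print(f"{sessions} parallel sessions, {total:.1f} TPS overall")
```

Summing is the natural choice when every session runs for the same wall-clock interval; if sessions start or stop at different times, total transactions divided by elapsed time would be the safer calculation.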
Earlier results are for Power5 and Power6 CPUs. The most recent tests are from the same system starting on a Power7 frame (salmon-colored dots), which was then moved to running on Power8 (light blue). No reboot happened (which is great for, um, non-disruption). The TPS went from roughly 1,500 to over 2,000. As it turned out, that partition was still in Power6 mode, where it had been running only last year.
The planned system reboot activated more CPU features, though not everything. The larger blue diamonds show another bump up, without an OS change/patch, and without changing anything except the SMT flag (from “2/Active” to “4”). Here, the TPS are over 3,000, with one blip close to 3,500 (that’s the kind of outlier I’d throw away once I had more repetitions).
The top 2 sets of results are after an operating system upgrade (from AIX 6 to 7). Another perceptible performance bump, to the 4,000 TPS range. One set was with SMT-4, the other with SMT-8. Though there may be a minor performance benefit to the latter visible in this test, I don’t see this as significant. Whether other code architected for heavier threading will benefit remains to be seen. No predictions from this groundhog on that front.
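To put the steps in proportion, here is a small Python sketch that turns the round numbers quoted above into percentage gains. The inputs are the approximate levels from the text (roughly 1,500, over 2,000, over 3,000, the 4,000 range), rounded down to the nearest thousand or half-thousand, so the percentages are ballpark only. The stage labels and the function name step_gains are mine, for illustration.

```python
# Approximate TPS levels quoted in the text, rounded; not exact measurements.
stages = [
    ("Power7 frame", 1500),
    ("Power8, before reboot", 2000),
    ("Power8, after reboot with SMT-4", 3000),
    ("after AIX 7 upgrade", 4000),
]


def step_gains(stages):
    """Return (label, percent gain over the previous stage) for each step."""
    gains = []
    for (_, prev), (label, cur) in zip(stages, stages[1:]):
        gains.append((label, round((cur - prev) / prev * 100, 1)))
    return gains


for label, pct in step_gains(stages):
    print(f"{label}: about {pct}% over the previous stage")

# Ratio of the last stage to the first, using the same rounded figures.
overall_ratio = stages[-1][1] / stages[0][1]
print(f"overall: about {overall_ratio:.1f}x")
```

With these rounded inputs, each step works out to somewhere between a third and a half more throughput than the one before it, which is why I treat the SMT-4 versus SMT-8 difference as minor by comparison.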