It isn’t easy evaluating hardware. When you shop for a new or used car, you read the published specs, maybe in those glossy handouts, or maybe on a Consumer Reports or EPA type site. What you really want to do is get your hands on the machine, drive it at fast speeds and around corners, and see what it can do.
With computers, much the same thing happens, with published benchmarks like SAP’s SAPS ratings and third-party testing protocols such as those of the Transaction Processing Performance Council (TPC for short). As with cars, I try to get hands-on as well. I can work with hardware delivered to my site, but more often I get the chance to work on remote systems. For this week and next, I am running tests on a really big machine, at an undisclosed site, from a vendor I won’t name. The savvy among you should be able to guess.
I run as many pieces of code as I can in a short time, plus a few overnight tests, to maximize the number of results. This helps me gain confidence in overall performance, as well as, where possible, isolate peaks or troughs. I’ll talk about a few Java tests I am running, and maybe you can suggest others.
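My actual harness is covered by the NDA, but the general shape of a short-turnaround timing run can be sketched like this. The class name and the workload below are stand-ins of my own invention, not any of the real tests; the point is simply to repeat a job enough times that the spread between the fastest and slowest runs (the peaks and troughs) becomes visible:

```java
import java.util.Arrays;

public class RepeatTimer {
    // Stand-in workload: partial sum of 1/i^2 (converges toward pi^2/6).
    // Any CPU-bound job you want to characterize would go here instead.
    static double work() {
        double s = 0;
        for (int i = 1; i <= 5_000_000; i++) {
            s += 1.0 / ((double) i * i);
        }
        return s;
    }

    public static void main(String[] args) {
        int reps = 10;
        long[] ns = new long[reps];
        for (int i = 0; i < reps; i++) {
            long t0 = System.nanoTime();
            double r = work();
            ns[i] = System.nanoTime() - t0;
            if (r < 0) System.out.println(r); // never true; keeps the JIT from eliding work()
        }
        Arrays.sort(ns);
        System.out.printf("min %.1f ms, median %.1f ms, max %.1f ms%n",
                ns[0] / 1e6, ns[reps / 2] / 1e6, ns[reps - 1] / 1e6);
    }
}
```

Reporting min/median/max rather than a single number is what lets you see whether a machine is steady or jittery across repeats.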
I can’t publish my results due to NDA, but I have no problem describing where I got the tests, why I picked them, and how I run them. If you have results you can publish, perhaps on older hardware, I should be able to find equivalent results in my archives, going back almost 10 years.
I found this code here several years ago:
It might be old, but that’s okay with me. Many of us have old code around, if not from SAP, then from third-party vendors that don’t generate a new release you need to install every few months.
The tests have a huge number of results online here:
It’s quite easy to run (this is from the source web site):
You can see this meets my criteria: free source code, easy to run, and a published set of results to compare against. Some may argue these tests don’t represent their workload, which I understand, but I believe more data in these decisions is better than less.
This is a suite of tests, including basic and advanced math algorithms. Source should be available here:
For those unfamiliar with U.S. government agency acronyms, this is the National Institute of Standards and Technology. Just the kind of place to go looking for rigorous, repeatable test protocols.
For the last couple of years, I’ve added the “large matrix” test to my suite, per the site’s suggestion: “You can also find out how to run our special LARGE version of SciMark 2.0 for out-of-cache problems.”
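To illustrate why an out-of-cache variant matters, here is a toy sketch of my own (not part of SciMark) that sums one large array twice: once sequentially, once with a wide stride. Both passes touch exactly the same elements, but once the array no longer fits in cache, the strided pass tends to run noticeably slower because nearly every access misses:

```java
import java.util.Arrays;

public class CacheProbe {
    // Sum every element of a, visiting them in stride-sized jumps.
    // The result is the same for any stride; only the access pattern changes.
    static double sum(double[] a, int stride) {
        double s = 0;
        for (int k = 0; k < stride; k++) {
            for (int i = k; i < a.length; i += stride) {
                s += a[i];
            }
        }
        return s;
    }

    public static void main(String[] args) {
        int n = 1 << 23; // 8M doubles = 64 MB, well past typical cache sizes
        double[] a = new double[n];
        Arrays.fill(a, 1.0);
        for (int stride : new int[] {1, 4096}) {
            long t0 = System.nanoTime();
            double s = sum(a, stride);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println("stride " + stride + ": " + ms + " ms (sum " + s + ")");
        }
    }
}
```

A small, cache-resident benchmark can hide exactly this effect, which is the whole argument for running the LARGE version alongside the standard one.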
It’s been challenging keeping up with cache technologies, not to mention parallelism, threading, and multi-core designs. The scope of those tests is way beyond this blog. Maybe after I get some results I can sanitize…
Two other lesser, but still fun, tests I like to run are:
Run these, e.g.:
time java -classpath pidigits.java_run pidigits 1000 >/dev/null
The Debian.org site where these are hosted says the tests are now deprecated. I’ll keep running the old tests just to see what happens, while I download and evaluate the new ones. The problem with changing test cases is that one can’t go back in time to run things on prior generations, not to mention that with virtualization, all of the machines in our data center are considered production.
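For readers who want a feel for what pidigits computes: the shootout versions are tuned (often backed by GMP), but a minimal, self-contained sketch using Gibbons’ unbounded spigot algorithm with `BigInteger` looks roughly like this. The class and method names here are mine, not the benchmark’s:

```java
import java.math.BigInteger;

public class PiSpigot {
    // Gibbons' unbounded spigot: streams decimal digits of pi one at a time.
    static String piDigits(int count) {
        final BigInteger TWO = BigInteger.valueOf(2), THREE = BigInteger.valueOf(3),
                FOUR = BigInteger.valueOf(4), SEVEN = BigInteger.valueOf(7),
                TEN = BigInteger.TEN;
        BigInteger q = BigInteger.ONE, r = BigInteger.ZERO, t = BigInteger.ONE;
        BigInteger k = BigInteger.ONE, n = THREE, l = THREE;
        StringBuilder sb = new StringBuilder();
        while (sb.length() < count) {
            if (FOUR.multiply(q).add(r).subtract(t).compareTo(n.multiply(t)) < 0) {
                // The next digit n is now certain; emit it and rescale the state.
                sb.append(n);
                BigInteger nn = TEN.multiply(THREE.multiply(q).add(r)).divide(t)
                        .subtract(TEN.multiply(n));
                r = TEN.multiply(r.subtract(n.multiply(t)));
                q = TEN.multiply(q);
                n = nn;
            } else {
                // Not certain yet; fold in the next term of the series.
                BigInteger nn = q.multiply(SEVEN.multiply(k).add(TWO))
                        .add(r.multiply(l)).divide(t.multiply(l));
                r = TWO.multiply(q).add(r).multiply(l);
                t = t.multiply(l);
                q = q.multiply(k);
                k = k.add(BigInteger.ONE);
                n = nn;
                l = l.add(TWO);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int digits = args.length > 0 ? Integer.parseInt(args[0]) : 30;
        System.out.println(piDigits(digits)); // 3141592653...
    }
}
```

As a benchmark, this kind of code exercises big-integer multiply and divide rather than floating point, which is exactly why it complements the SciMark-style numeric kernels.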
New test link:
And what would a blog about benchmarks be without a graph? True, you can’t see which hardware, CPU, GHz or anything else. But isn’t it colorful?
I will say that I tested with 32 bit and 64 bit Java stacks.
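One quick way to confirm which stack a remote JVM actually gave you is to ask it directly. Note that `sun.arch.data.model` is a HotSpot-specific property and may be absent on other vendors’ VMs, hence the fallback:

```java
public class JvmBits {
    public static void main(String[] args) {
        // "sun.arch.data.model" reports "32" or "64" on HotSpot; other VMs may omit it.
        System.out.println("data model: " + System.getProperty("sun.arch.data.model", "unknown"));
        System.out.println("os.arch:    " + System.getProperty("os.arch"));
        System.out.println("vm:         " + System.getProperty("java.vm.name"));
    }
}
```

Worth running before each test batch: on a shared remote box it is easy to pick up a different Java from the PATH than the one you meant to measure.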
Hope to see you this Friday (June 25, 2010) at the SAP Newtown Square offices for the full day of inside track, including two other ASUG volunteer speakers (new SAP Mentor Tammy Powlas, and Greg Myers), Craig Cmehil, Peter McNulty, Marilyn Pratt, Jon Reed, and many others, organized by Rich Heilman.