Oracle compression, a week out of port
After a week of minor fire-fights and other business as usual, I took a look at several systems that had gone through Oracle database compression and storage improvements. Some performance metrics would be influenced by one change or the other or both; the space consumption changes come only from the compression work.
The immediate measurements I’d look at are performance visible in SM66 (which needs to be reviewed regularly so that “normal” is known), in ST04 SQL cache statements currently executing at different parts of the day, or in the highest SQL (as covered in the earlier post, 72 Hours After Oracle Compression). After any surprises (not the good ones) are dealt with, I’d examine trends in various metrics to see if anything is going in the wrong direction. We’re still looking for anything that ended up with a new SQL plan that was not as good as the old one, but I have the baseline data to comb more finely. On a gross level, I summed the time recorded in the ST03 workload statistics for background jobs. Image 1 below shows the grand totals over several weeks for dialog steps, total seconds, CPU time, and database time. DB time is down 20–25% for the week, which is a good return for the work absent much specific tuning or rework.
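The gross-level check above boils down to simple arithmetic on weekly ST03 totals. A minimal sketch, with illustrative numbers standing in for exported workload data (the values and the `weekly_db_time` structure are assumptions, not measurements from these systems):

```python
# Hypothetical weekly ST03 background-workload totals (seconds of DB time),
# as they might be copied out of the workload monitor. Values are made up
# to illustrate the calculation, not real measurements.
weekly_db_time = {
    "pre_week": 1_000_000,   # total DB time the week before compression
    "post_week": 770_000,    # total DB time the week after
}

def pct_change(before: float, after: float) -> float:
    """Percent change from before to after (negative means improvement)."""
    return (after - before) / before * 100.0

delta = pct_change(weekly_db_time["pre_week"], weekly_db_time["post_week"])
print(f"DB time change: {delta:.1f}%")  # a 23% drop with these sample figures
```

The same two-number comparison works for CPU time or total seconds; the point is to compare like-for-like weeks, since batch mix varies day to day.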
Average redo time has more to do with the storage layer (and I/O stack) than with compression, although smaller log volumes could perhaps avoid contention that might otherwise occur when parallel workloads peak. Verifying that would probably require much more data than the stock SAP GUI or EarlyWatch-type reports provide. The gap in Image 2 is the compression outage (downtime was less than a day, but this chart combines data from two different databases); I’d ignore the first spike, or explain it as related to the data migration.
The performance improvement is welcome, so the next question is how we can improve it even more. We’ll need to look at the I/O layer, the storage channels, and the specs for the redo device(s).
Each wait event near the top of the list is worth examining. Different combinations of storage, server hardware, memory caching, and software drivers affect the read time for index access, which has been one of our weightier metrics. The ST04 system event history had shown it between 3 and 4 milliseconds recently; the day-to-day variation was also of interest to me. No simple explanation jumps out, but if I had to guess I’d say the caches aren’t big enough.
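Day-to-day variation is easy to quantify once the daily averages are written down. A small sketch, assuming a hand-collected list of daily index-read averages (the numbers below are invented for illustration, not readings from ST04):

```python
import statistics

# Hypothetical daily average index-read wait times in milliseconds, as one
# might jot down from the ST04 system event history over a week.
daily_read_ms = [3.1, 3.8, 3.3, 3.9, 3.2, 3.6, 3.4]

mean_ms = statistics.mean(daily_read_ms)
stdev_ms = statistics.stdev(daily_read_ms)   # sample standard deviation
cv = stdev_ms / mean_ms                      # coefficient of variation

# A falling CV after a storage change is one way to show the "smoothing"
# of daily variation, independent of the drop in the average itself.
print(f"mean {mean_ms:.2f} ms, stdev {stdev_ms:.2f} ms, CV {cv:.2f}")
```

Tracking the coefficient of variation alongside the mean separates “faster” from “more consistent,” which are distinct wins.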
After the migration to new storage, we see both a decrease in the index access time and a smoothing of the daily variation. The chart was slightly trimmed to fit into the SCN blog space; red is non-production and blue is production.
Of the two production systems, this one has slightly better numbers for index reads. The other has some block contention, showing that once one slow area is fixed, another takes its place.
The last view for now is database space consumption. These numbers are your-mileage-may-vary, depending on the last time a system-level reorg was done (about 6 years for us), the nature of your data (and the extent of archiving), and perhaps some random parameters or storage settings (PCTFREE, for example).
Within a year, or less, we should see what the revised growth rate is. Archiving could alter the angle of growth as much as anything.
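One way to put a number on the revised growth rate is a least-squares slope through monthly size snapshots. A sketch with invented post-compression sizes (the figures are placeholders, not our databases):

```python
# Hypothetical monthly database sizes (GB) after compression. A simple
# least-squares slope gives a crude GB-per-month growth rate that can be
# compared against the pre-compression trend line.
months  = [0, 1, 2, 3, 4, 5]
size_gb = [4200, 4230, 4265, 4290, 4320, 4350]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(size_gb) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, size_gb))
    / sum((x - mean_x) ** 2 for x in months)
)
print(f"growth rate: ~{slope:.0f} GB/month")
```

A heavy archiving run would show up here as a break in the trend, so it is worth fitting the slope only over stretches without such events.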