Browsing through closed support messages while looking for a possible approach to a certain Oracle CBO problem I came across a quite problematic issue.
Some support colleagues sometimes seem to focus too much on removing the symptoms of a software defect rather than analysing it correctly and forwarding it to the development team so that they create a fix for it.
Of course there is reason behind this kind of message handling: the way the performance of support consultants is evaluated motivates them to focus on message closings.
Unfortunately a closed support message does not equal a solved problem, as we all know.
In the second best case (say, solving the real problem is the best possible message outcome), the support consultant will be able to present a fix for the issue.
Something that makes the effects of the problem disappear or make them less critical.
For quite a number of Oracle performance problems, where the CBO decided for a wrong execution plan, such a fix would be to manipulate the optimizer statistics.
SAP note #724545 explains how this can technically be achieved and it’s pretty simple once you’ve done it a few times.
Provide a fix – not that easy…
However, the critical point here is not to know how to perform this manipulation but to know when to use this tool and to know which of the optimizer statistic values needs exactly what tweak to lead to the desired result.
One pretty common kind of manipulation is to change the number of distinct values for a certain table column or to reduce/increase the clustering factor of an index.
To be able to figure out what exact value is causing the optimizer to choose the wrong plan is on the other hand pretty difficult.
Of course there is some documentation available about the calculations the optimizer performs (roughly) and one may also employ the quite readable optimizer trace (event 10053).
Still, to understand the reasons for the optimizer decisions keeps being complex, especially once it is about a multi-table-view-join construct.
Provide a fix that is actually usable – pretty difficult
Now we’ve this toolset to change the optimizers input (the statistics) and we’ve a not so good understanding of how the optimizer works with this input.
Of course the support consultant will likely try a out a bit until the execution plan for the statement in question looks fine.
The message is then usually returned to the customer with some notes on what was done and that now it should be checked whether the response time of the report has improved.
The big mistake.
Let’s assume that the modifications actually did what they were supposed to.
The next thing that happens is: the customer confirms the message.
What’s wrong with this?
The problem is: now a symptom of a specific complex problem, maybe a product error (a bug) in Oracle, has been removed. Note: just the symptom.
The error is still in the system and nobody is looking after it to think about a solution.
Even worse, statistics manipulation are very difficult to maintain in the operations of multiple systems or even service providers.
Until lately there wasn’t a standard approach to keep and document these statistic manipulation.
Recently the DBSTATC table usage has been ‘enhanced’ to handle such settings as well (check note #892296 for details on that).
Anyhow, for the user it will always bring much additional effort to keep manipulated statistics.
The changes need to be documented, eventually re-applied and re-evaluated with the next patches that get installed in the system. And all the time unexpected side-effects could occur (e.g. other statements that used to work OK now run too slow).
For the support it means that a problem that already happened to occur some times wasn’t analysed correctly yet. That means more work and less time to focus on new issues.
Don’t go for quick fixes
Therefore, as a customer you shouldn’t confirm the problem just because the symptom is gone.
Ask for a solution instead.
As a support consultant you should remember that a fix just isn’t a solution and shouldn’t be presented as such.
If there is a good reason for the statistics manipulation (e.g. extremely volatile data volumes so that it’s hard to gather representative statistics), then SAP should and will provide a note with these manipulations.
If the reason for the manipulation is not so clear, then it’s very likely that you’re facing a bug here that needs to be identified and solved.
In many cases it may be sufficient to increase the sample size for the statistic gathering or to install the latest Oracle patches.
Anyway, it has to be clear which bug you hit and how the fix works and of course how the bug will eventually be really fixed.
For those interested in the technical side of these topics, you may want to check the following notes:
892296 Enhancements in update statistics in BRCONNECT 7.00/7.10
106047 DB21: Customizing the DBSTATC
724545 Adjusting the CBO statistics manually using DBMS_STATS
1020260 Delivering Oracle statistics
176754 Problems with CBO and RBO
766349 FAQ: Oracle SQL optimization
588668 FAQ: Database statistics
756335 Statistics in tables w/ heavily fluctuating volumes of data
122718 CBO: Tables with special treatment
Book and blog/website of Jonathan Lewis: http://jonathanlewis.wordpress.com/