I have collected 10 points to check as the very first step of your analysis. I am not focusing on a single issue here but providing an overview instead.
Automatic Workload Repository (AWR), a licensed feature of Oracle, is one of the best tools for monitoring performance issues. The report can be generated as of Oracle 10.2 on any platform.
There are 4 important points to consider before generating an AWR report:
  1. before/meantime/after reports – create 3 separate reports: one covering the timeframe during which the problem was experienced, and one each for the timeframes before and after it, for comparison.
  2. duration of snapshot – choose a shorter timeframe to get a more precise report.
  3. retention time – set this to 42 days, but first check that the SYSAUX tablespace has enough free space. See also: Configure the retention period
  4. format – choose HTML to be able to use the links and bookmarks inside the report.
The following sections are the most important ones to check first to get an understanding of the performance problem.
1 Header Section – contains general information about the timescale and the system environment.
If the timescale is not chosen correctly (that is, it does not cover the time when the problem was encountered), analyzing the report will be meaningless.
AWR-header.PNG
2 Cache Sizes – contains information about the SGA (System Global Area) at the beginning and end of the snapshot.
AWR-2.png
3 Load Profile – contains information about the database workload during the snapshot.
AWR-3.PNG
4 Instance Efficiency Percentage – these figures need to be interpreted very carefully, since they are not really a good measure of database performance.
For example, processing-intensive SQL statements that are executed repeatedly and read blocks only from the buffer pool increase the buffer pool hit rate. After such statements are optimized, the hit ratio decreases even though performance improves.
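
To make this effect concrete, here is a minimal sketch in Python. It assumes the common definition hit ratio = 1 - physical reads / logical reads. The read counts and the function name `buffer_hit_ratio` are invented for illustration and are not taken from an AWR report.

```python
def buffer_hit_ratio(logical_reads: int, physical_reads: int) -> float:
    """Return the buffer hit ratio in percent: 100 * (1 - physical / logical)."""
    if logical_reads <= 0:
        raise ValueError("logical_reads must be positive")
    return 100.0 * (1.0 - physical_reads / logical_reads)


# Hypothetical numbers: an inefficient statement re-reads the same cached
# blocks over and over, which inflates the logical reads.
before = buffer_hit_ratio(logical_reads=10_000_000, physical_reads=50_000)

# After tuning, the statement touches far fewer blocks. The physical reads
# are unchanged, but the ratio drops because the denominator shrank.
after = buffer_hit_ratio(logical_reads=500_000, physical_reads=50_000)

print(f"hit ratio before tuning: {before:.1f}%")
print(f"hit ratio after tuning:  {after:.1f}%")
```

In this made-up case the tuned statement does 20 times less work, yet its hit ratio looks worse, which is why the ratio alone says little about performance.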

Buffer Nowait – shows how often the buffer cache was accessed without any wait time.

Buffer Hit – shows how often a requested block has been found in the buffer cache without requiring disk access.

Redo NoWait – shows whether the log_buffer size is set correctly. See also: Preemptive redolog switches in Oracle 11.2

Parse CPU to Parse Elapsd – the ratio of CPU time spent on parsing to the total elapsed parse time. A low value means that much of the parse time was spent waiting for resources.

Non-Parse CPU – in the following example the figure is close to 100%, meaning that only 0.15% of the overall CPU usage goes to statement parsing.

AWR-3-2.PNG

5 Shared Pool Statistics – shows whether there is an overhead on the system regarding the shared pool.
The values should not be very high (preferably less than 75%).

AWR-3-1.PNG

6 Top 5 Timed Foreground Events – shows the top 5 wait events that are taking the most time. The exact meaning of each and every event can be found here.

Based on what is found here, 2 other sections of the report must be checked:

  1. SQL statistics section – check whether there are a lot of reads and which SQL statements are involved.
  2. IO stats section – check whether there is an I/O bottleneck.

In the following example a large share of the time (70%) is spent waiting on I/O-related reads. There are 12,727 waits in 265 seconds (about 4 minutes), which is more significant than 3 million waits in 36,388 seconds (about 10 hours).
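
One way to read this comparison is through the average time per wait. The sketch below uses the figures quoted above; "3 million" is a rounded value, so the second result is approximate. The helper name `avg_wait_ms` is made up for illustration.

```python
def avg_wait_ms(total_wait_seconds: float, waits: int) -> float:
    """Average time per wait in milliseconds."""
    if waits <= 0:
        raise ValueError("waits must be positive")
    return 1000.0 * total_wait_seconds / waits


# 12,727 waits in 265 seconds
few_waits = avg_wait_ms(265, 12_727)

# roughly 3 million waits in 36,388 seconds (wait count rounded in the text)
many_waits = avg_wait_ms(36_388, 3_000_000)

print(f"12,727 waits:    {few_waits:.1f} ms per wait")
print(f"3 million waits: {many_waits:.1f} ms per wait")
```

Under this reading, the event with far fewer waits is the slower one per wait, which would make it the more interesting candidate for an I/O investigation.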

In addition, only 8% of the time has been spent on DB CPU. If this time is significant, check the following points:

  1. Is the CPU time (s) (5119 seconds in our example) significant compared to the total CPU time?
    Total CPU time = number of CPUs * snapshot time (in seconds). Find NUM_CPUS in the “Operating System Statistics” section.
  2. Is there a SQL statement that takes most of the time? If yes, check the “SQL ordered by CPU Time” section.
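
The comparison in point 1 can be sketched as follows. Only the 5119 seconds of DB CPU come from the example. The CPU count (8) and the snapshot length (1 hour) are assumptions chosen for illustration, so take the real values from your own report. The function name `db_cpu_share` is hypothetical.

```python
def db_cpu_share(db_cpu_seconds: float, num_cpus: int, snapshot_seconds: float) -> float:
    """DB CPU as a percentage of the total CPU time available in the snapshot."""
    total_cpu_seconds = num_cpus * snapshot_seconds  # total CPU time = CPUs * snapshot time
    if total_cpu_seconds <= 0:
        raise ValueError("num_cpus and snapshot_seconds must be positive")
    return 100.0 * db_cpu_seconds / total_cpu_seconds


# 5119 s of DB CPU is taken from the example above.
# NUM_CPUS = 8 and a 3600 s snapshot are assumed values, not from the report.
share = db_cpu_share(db_cpu_seconds=5119, num_cpus=8, snapshot_seconds=3600)

print(f"DB CPU used {share:.1f}% of the available CPU time")
```

With these assumed values the database used under a fifth of the available CPU capacity. The same 5119 seconds on a 2-CPU host would be a much larger share.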

AWR-4.PNG

7 SQL ordered by Elapsed Time – shows which SQL statements run for the longest time. Focus on statements that have a low number of executions combined with a high Elapsed Time per Exec (s). Moreover, check whether the % Total is significant.

AWR-5.PNG

8 SQL ordered by CPU Time – contains information on which SQL statements take the most CPU time.
Total DB time 

AWR-6.PNG

9 & 10 IO Stats – check whether reads and writes of the data files/logs are taking long. Section 6 reveals some information about the waits for I/O; whether these numbers are good or not depends on the hardware/OS. If the I/O is slow, check “Tablespace IO Stats” and “File IO Stats”. The values in the “Rd(ms)” columns should not exceed 20; otherwise it is worth involving your OS team and hardware vendor to investigate the I/O bottleneck.
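
As a rough illustration of this check, the sketch below flags rows whose average read time exceeds the 20 ms rule of thumb. The tablespace names and all values are invented, and `IoRow` and `slow_reads` are hypothetical helpers rather than part of any Oracle tool. The average blocks per read is kept next to each row because multi-block reads naturally take longer than single-block reads.

```python
from typing import List, NamedTuple


class IoRow(NamedTuple):
    name: str                  # tablespace or file name (invented)
    av_rd_ms: float            # average read time in ms
    av_blocks_per_read: float  # average blocks per read


def slow_reads(rows: List[IoRow], threshold_ms: float = 20.0) -> List[IoRow]:
    """Return the rows whose average read time exceeds the threshold."""
    return [row for row in rows if row.av_rd_ms > threshold_ms]


# Invented sample values, not taken from a real report.
rows = [
    IoRow("TS_A", 6.0, 1.0),
    IoRow("TS_B", 25.0, 3.5),
    IoRow("TS_C", 12.0, 1.2),
]

for row in slow_reads(rows):
    print(f"{row.name}: {row.av_rd_ms} ms per read, "
          f"{row.av_blocks_per_read} blocks per read")
```

A flagged row is only a starting point. A high average combined with several blocks per read may simply reflect larger multi-block I/Os.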

AWR 8.PNG

 
 


4 Comments


  1. Kay Kanekowski

    Hi Salome,

    thanks for this helpful posting.
    I think there is a little typo in chapter 6: “There are 12 million waits in 265”. I think there are only 12 thousand waits.
    best regards

    Kay

  2. Stefan Koehler

    Hello,

    Ratio-based tuning like “the target must be close to 100%”? I thought that we overcame this around 20 years ago (especially in such a high-tech company as SAP).

    “Check the “Av Rd(ms)” columns it must not exceed 20 otherwise involve your OS team and hardware vendor to investigate the I/O bottleneck”


    In your case there is an average of 3.45 blocks per read, which means that you perform some multi-block I/Os that are probably much larger than that. Everybody would be happy to get that average response time with mixed multi-block I/Os (e.g. up to 1 MB for most OS platforms). Just look at your second highlighted example of 21.04 ms on average – the average single-block I/O performance is 2.16 ms, which is absolutely perfect for disk-based storage (and even the 8.09 ms from the previous one is OK).

    Regards

    Stefan

  3. Fidel Vales

    Hi,

    I wanted to wait one day before replying but I see that Stefan has done it better than I would have done.

    I recommend reading the following “recent” tweet to get a better understanding of “ratios”

    https://twitter.com/JLOracle/status/524506108166422528

    and the following video with Graham Wood for a little background about AWR.

    BTW, I do not understand the following:

    duration of snapshot – choose a shorter timeframe to get a more precise report. The duration must not exceed more than 2-3 hours.

    A snapshot, by default, is 1 hour, so 😕

