[Oracle] Insights into redo handling, preemptive r...

stefan_koehler · ‎10-22-2012

Introduction

Every now and then I receive similar questions from my clients, such as "Why are my archive logs smaller than my redo log files?" or "I get a lot of "log file switch (private strand flush incomplete)" messages - what is this about?", and you can find similar questions here on SCN as well. So I thought it was time to write a blog about this behavior and share some insights into the (internal) mechanism behind it.

SAP has already provided SAP note #1627481 on the topic of "preemptive redolog switches", but it does not explain the reason for this behavior. I will not cover every subtopic in depth, because there are so many details and special cases, but hopefully it will be enough to see the connection.

(Public) redo

First of all, it is important to know that Oracle uses a "write-ahead logging" approach. This means that a modified data block (for example modified by INSERT, UPDATE or DELETE) cannot be written to a data file until the description of how the data was changed has been written to the redo log files. There are a few exceptions to this rule, but let's disregard these for now. The general rule applies to "in-memory data changes" as well (for recovery reasons).

So here is a simple sequence for a transaction that changes some data in a table:

Create redo change vector for undo (= Description of how to insert an undo record into the undo tablespace or better said undo block)
Create redo change vector for the data (block) itself
Merge both change vectors into one redo record and copy this record to the log buffer (=memory)
Insert the undo record into the undo block
Change the data block itself

So let's multiply this simple sequence by several concurrent sessions. Each session changes a lot of data and tries to copy each (single) redo record into the log buffer. Since the log buffer is just a piece of memory, it needs to be protected by latches (called "redo allocation latch" in this case), and so we have a concurrency issue (= hot spot) right here. This was not an issue in the past with lower transactional data volumes, but Oracle took notice of it in larger environments and developed two features (in 10g) called "private redo" and "in-memory undo" to avoid that bottleneck. There are several situations in which Oracle falls back to the old (public) mechanism described above, but let's disregard these for now as well.
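To make the hot spot more tangible, here is a minimal toy model of the public sequence. It is only a sketch: the class PublicLogBuffer, the function names and the lock standing in for the "redo allocation latch" are assumptions made for illustration and have nothing to do with Oracle's real data structures. It only shows that every single change has to get the latch once.

```python
import threading


class PublicLogBuffer:
    """Toy stand-in for the public log buffer (hypothetical, not Oracle code)."""

    def __init__(self):
        # The lock plays the role of the "redo allocation latch".
        self._latch = threading.Lock()
        self.records = []
        self.latch_gets = 0

    def copy_record(self, record):
        with self._latch:
            self.latch_gets += 1
            self.records.append(record)


def change_row(buffer, undo_block, data_block, key, new_value):
    """One data change, following the five steps of the sequence."""
    old_value = data_block.get(key)
    undo_vector = ("undo", key, old_value)           # 1. change vector for undo
    data_vector = ("data", key, new_value)           # 2. change vector for the data block
    buffer.copy_record((undo_vector, data_vector))   # 3. one redo record into the log buffer
    undo_block.append((key, old_value))              # 4. insert the undo record
    data_block[key] = new_value                      # 5. change the data block


def run_sessions(sessions=4, changes_per_session=100):
    """Run several concurrent sessions against one shared log buffer."""
    buffer = PublicLogBuffer()

    def session(session_id):
        undo_block, data_block = [], {}
        for i in range(changes_per_session):
            change_row(buffer, undo_block, data_block, (session_id, i), i)

    threads = [threading.Thread(target=session, args=(n,)) for n in range(sessions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffer


if __name__ == "__main__":
    result = run_sessions()
    # One latch get per change: 4 sessions * 100 changes.
    print(result.latch_gets)
```

In this model the number of latch gets grows with the number of changes, not with the number of transactions, which is exactly the pressure point described above.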

Private redo

Private redo is an enhancement that avoids the bottleneck of copying each redo record to the public redo log buffer immediately. Basically it works like the simple transaction sequence above, but instead of copying its redo to the public redo log buffer right away, the session keeps the redo in its private redo log buffer and copies all of that privately stored redo to the public redo log buffer in one go when the transaction completes. So here is the sequence with private redo:

Allocate memory from the two private areas (in-memory undo and private redo)
Mark the corresponding blocks as "uses private redo", but do not change the block
Create redo change vector for undo and copy it to the in-memory undo area
Create redo change vector for the data block itself and copy it to the private redo buffer (thread)
Merge both private memory areas into one redo record (at the end of the transaction)
Copy the redo record to the public redo log buffer and change the data block itself
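The six steps above can be sketched in a few lines as well. Again this is just a toy model under stated assumptions: PrivateRedoTransaction, its attributes and the plain list used as public log buffer are hypothetical names for illustration, not Oracle internals. For simplicity each key is changed only once per transaction.

```python
class PrivateRedoTransaction:
    """Toy model of a transaction using private redo and in-memory undo."""

    def __init__(self, public_buffer, data_block):
        self.public_buffer = public_buffer   # shared list standing in for the log buffer
        self.data_block = data_block         # dict standing in for a data block
        # 1. allocate the two private areas
        self.in_memory_undo = []
        self.private_redo = []
        self.marked = set()

    def change(self, key, new_value):
        # 2. mark the block as "uses private redo", but do not change it
        self.marked.add(key)
        # 3. change vector for undo goes to the in-memory undo area
        self.in_memory_undo.append(("undo", key, self.data_block.get(key)))
        # 4. change vector for the data block goes to the private redo buffer
        self.private_redo.append(("data", key, new_value))

    def commit(self):
        # 5. merge both private areas into one redo record
        record = (tuple(self.in_memory_undo), tuple(self.private_redo))
        # 6. copy the record once and only now change the data block
        self.public_buffer.append(record)
        for _, key, new_value in self.private_redo:
            self.data_block[key] = new_value
        self.in_memory_undo.clear()
        self.private_redo.clear()
        self.marked.clear()


if __name__ == "__main__":
    public_buffer = []
    block = {"a": 1, "b": 2}
    txn = PrivateRedoTransaction(public_buffer, block)
    txn.change("a", 10)
    txn.change("b", 20)
    print(block, len(public_buffer))   # block untouched, nothing public yet
    txn.commit()
    print(block, len(public_buffer))   # block changed, one redo record copied
```

Compared to the public sequence, the shared buffer is touched once per transaction instead of once per change.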

There are several x$ and v$ views that can be used to demonstrate this behavior, but this should be enough to understand the basic principle. If you are interested in the details, you can examine the views v$latch, v$sesstat, x$kcrfstrand (= private redo) and x$ktifp (= in-memory undo) on your own.

Preemptive redolog switches

We have seen that the redo handling changed in Oracle 10g, but what does this have to do with "preemptive redo log switching"? Well, first we need to know a little detail about how many private and public redo threads are created or used. You will have at least 2 public log buffer (redo) threads if you have more than one CPU. The maximum number of private redo threads is defined as transactions / 10. All of these values are based on the currently used algorithms, which can of course change with every patch or version. In any case, Oracle dynamically adjusts the number of active public and private redo threads.

So let's assume the following scenario with the new redo handling approach. You have a session <X> that deletes a couple of data sets (at the fictitious SCN 20) but does not finish immediately. So the redo information is currently located in the private redo area. The session idles for some time and a log switch occurs at the fictitious SCN 30. A few minutes after the log switch, session <X> finishes and wants to copy its information from the private area to the public area and write it to the redo log files, but the current log file has a greater starting SCN.

Oracle is aware of this issue (which would become very nasty in case of recovery - think about the "write-ahead logging" approach) and assumes the worst case. For a redo log switch, Oracle calculates the maximum space needed for all active private and public redo threads and starts the log file switch at the point where all of this data still fits at the end of the current log file. If the redo threads do not contain any content, the free space is not used and you will see smaller archived redo log files after a log switch. Just a simple fictitious calculation:

2 active public redo threads with 3 MB each
20 active private redo threads with 64 KB each

So in the worst case your archive log files are roughly 7.25 MB smaller (6144 KB + 1280 KB = 7424 KB) than the redo log files.
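The fictitious calculation can be reproduced with a few lines. This is only a sketch of the arithmetic: the function name reserved_space_kb and its parameters are made up for illustration, the 64 KB threads are taken to be the private ones, and the real thread counts and sizes depend on your system and Oracle version.

```python
KB_PER_MB = 1024


def reserved_space_kb(public_threads, public_kb, private_threads, private_kb):
    """Worst-case space (in KB) kept free at the end of the current redo log."""
    return public_threads * public_kb + private_threads * private_kb


if __name__ == "__main__":
    reserved = reserved_space_kb(
        public_threads=2, public_kb=3 * KB_PER_MB,   # 2 * 3072 KB = 6144 KB
        private_threads=20, private_kb=64,           # 20 * 64 KB = 1280 KB
    )
    print(reserved)               # 7424 (KB)
    print(reserved / KB_PER_MB)   # the same value in MB
```

Plugging in the thread counts and sizes of your own system gives an upper bound for how much smaller an archive log can be than its redo log file because of this mechanism alone.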

Message "log file switch (private strand flush incomplete)"

The "preemptive redolog switches" approach describes how Oracle ensures enough free space in the redo log file, but what does this have to do with the message "log file switch (private strand flush incomplete)"?

Once again: Oracle uses the "write-ahead logging" approach. This means that all of the corresponding redo data has to be written to the redo log files before the database writer (DBWR) can write its data to the data files (which is triggered by a log file switch for checkpointing).

What happens if the DBWR wants to write a data block that is currently associated with a private redo thread? The DBWR is aware of this: it copies the associated private redo data to the public redo area and applies it to the data block before writing it. If the foreground session (which initially allocated the private redo thread for its data change) tries to generate more redo data while the DBWR is doing this necessary work, it is suspended and an alert log file entry "log file switch (private strand flush incomplete)" is written. This is the story behind that particular alert log entry. The redo log behavior changes in this situation, but let's disregard these internals for now.

Summary

I have omitted many details, special cases and interesting internals, but I think this theoretical blog provides a good starting point from the top level. Please let me know if you are interested in any particular details. For your own research, I suggest comparing Oracle 9i with Oracle 10g or 11g to see the differences clearly (especially at the statistics level).