[Oracle] Database cache internals and why one data block can use multiple cache buffers
Have you ever heard of the myth “Increase the database cache to the database size and it can be cached completely” or have you ever wondered about how the database cache is organized internally in principle? .. then this blog should be worth to read.
You usually hear a lot of rule of thumbs and known expertise about a topic like database cache size or handling of such caches. Sometimes the cache topic is also used as a “silver-bullet” without a deeper investigation of the real root cause and some customers just follow the recommendations of their consultants without questioning. But you as customer should always question the recommendations and get the idea behind them (maybe there is none).
Cary Millsap, a well known Oracle performance expert already expressed it this way (and i really like the idea of it):
“People think that consultants get paid for having the right answers, but we don’t.
We get paid for convincing people that we have the right answers.
The way you do that, is by showing them the exact process that led to your conclusion.”
Basics of the database buffer cache organization
I must admit, that this section is going to disregard a lot of details about the SGA, its structure, working data sets, different linked lists and queues, but otherwise this blog would be come very large and complex, if i include a lot of these details here as well. In this blog we just want to explore how one (physical) data block can allocate multiple cache buffers. Please let me know, if you are interested into some of the other details as well – so i maybe write another blog post about it.
“A picture tells us more than a thousand words” – let’s start with a tiny illustration of the buffer cache organization.
The database block buffer headers (which point to the real cached data blocks) are hashed and attached to a corresponding hash bucket in a short double linked list. These hash buckets are protected by latches.
Here is a short summary of the most important facts:
- The hashing algorithm for choosing the correct hash bucket for a data block is basically something like this: Hashing of file number and block number
- Oracle creates a lot of these hash buckets (hidden parameter _db_block_hash_buckets or sga variable kcbnhb) to keep the double linked buffer header lists as short as possible (for performance reasons like searching for already cached data blocks or checking the content and so on)
- In newer Oracle database releases one latch covers 32 hash buckets (formula = hidden parameter _db_block_hash_buckets / _db_block_hash_latches)
Let’s illustrate the basic procedure for reading a data block:
- Calculate the correct hash bucket by using the file and block number
- Grab the relevant (cache buffers chain) latch
- Follow the pointers from the hash bucket (jumping from buffer header to buffer header) to find the corresponding buffer
- Do something with the buffer header and data content (if Oracle can already find it in the linked list)
- Drop the relevant (cache buffers chains) latch
That’s it – i already stop here with explaining. This should be enough “basic knowledge” to understand the following demo case and answer the question “why one data block can use multiple cache buffers”.
*** Side note: Maybe you have already noticed a wait event called “latch: cache buffers chains” by troubleshooting performance issues, if several processes try to read a lot of data from the database cache (due to bad SQL execution plans or insufficient database structures). This is typically caused by a contention at step 2 from the procedure above.
Test case – One data block can use multiple cache buffers
The following demo was run with an Oracle database (22.214.171.124.2) on OEL 6.2.
Create the test case
SQL> create table BUFFCACHETEST (num number); SQL> begin for i in 1..10 loop insert into BUFFCACHETEST values(i); end loop; commit; end; / SQL> exec DBMS_STATS.GATHER_TABLE_STATS('SYS', 'BUFFCACHETEST');
The created database object (table BUFFCACHETEST) has the corresponding object id 83552 and allocated 8 contiguous database blocks from block id 93528 upwards.
Cross-check the data
SQL> select NUM, DBMS_ROWID.ROWID_OBJECT(ROWID) as OBJECT_ID, DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) as RELATIVE_FNO, DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) as BLOCK_NUM, DBMS_ROWID.ROWID_ROW_NUMBER(ROWID) AS ROW_NUM from BUFFCACHETEST;
All of the ten rows are stored in the same (physical) data block (93529) in data file 1. Now let’s research what happens with the database cache (and the buffer headers), if we update these rows in several ways.
Remember: All of these rows are all placed in the same physical 8 kb block and so they only need 8 kb of “physical space”.
Initialize test case
*** Restart Oracle Instance to initialize cache *** SQL> startup force *** Fill buffer cache with that one block again SQL> select * from BUFFCACHETEST; *** The following query will be used furthermore for getting the information about the buffer headers SQL> select HLADDR, decode(STATE,0,'free',1,'xcur',2,'scur',3,'cr', 4,'read',5,'mrec',6,'irec',7,'write',8,'pi', 9, 'memory',10,'mwrite',11,'donated', 12,'protected', 13,'securefile', 14,'siop',15,'recckpt', 16, 'flashfree', 17, 'flashcur', 18, 'flashna') as STATE, PRV_HASH, NXT_HASH, BA, DBARFIL, DBABLK from X$BH where OBJ = 83552 and DBABLK = 93529;
One buffer header (for physical block in file 1 block 93529) is attached to the linked list as we just read the data by a full table scan before. We should clarify the selected columns of X$BH, before we go on with the DML statements.
HLADDR = Address of the latch, that protects the corresponding hash bucket
STATE = State of the block like xcur (current version), cr (consistent version, which contains an older version of the block and is available for consistent reads)
PRV_HASH = Address of the previously attached buffer header in the double linked list
NXT_HASH = Address of the following attached buffer header in the double linked list
BA = Address of data block buffer
The other 2 columns should be self-explanatory. Let’s verify the latch by its address to be absolutely consistent in our assumptions.
SQL> select NAME from V$LATCH_CHILDREN where ADDR = '00000000847E4830'; NAME ---------------------------------------------------------------- cache buffers chains SQL> select count(*) from V$LATCH_CHILDREN where NAME = 'cache buffers chains'; COUNT(*) ---------- 1024
As assumed the latch address corresponds to the “cache buffer chain latches” and my test instance got 1024 of these latches. This covers the previous hidden parameter value (check “Initial settings” section) of “_db_block_hash_latches” as well.
Finally let’s do some arithmetic for cross-checking the “latches per hash bucket” assumption: _db_block_hash_buckets / _db_block_hash_latches = 32768 / 1024 = 32
DML test case
In the following section i will update one row after another and query the buffer cache header array afterwards. Please look carefully at the columns “HLADDR”, “STATE” and “BA”.
SQL> update BUFFCACHETEST set NUM=11 where NUM=1;
SQL> update BUFFCACHETEST set NUM=12 where NUM=2;
SQL> update BUFFCACHETEST set NUM=13 where NUM=3;
SQL> update BUFFCACHETEST set NUM=NUM+10 where NUM <= 10;
Only one additional block is needed, even if you update several rows by one DML statement (changes are done to the same block only).
SQL> update BUFFCACHETEST set NUM=21 where NUM = 11;
SQL> update BUFFCACHETEST set NUM=23 where NUM = 12;
We have seen, that every DML statement allocated and used a new block in the buffer cache (different values in column BA, but the same physical block) and attached the corresponding buffer header into the double linked list. This works until the limit of “_db_block_max_cr_dba” is reached. This limit is implemented to keep the linked lists as short as possible. Notice the state of the blocks – you have one current version of the block and several “older versions” of the block that can be used for consistent reads (like long running SQLs).
*** Side note: I already observed, that the limit of parameter “_db_block_max_cr_dba” does not count in any case (maybe due to bugs). I was not able to reproduce a corresponding test case on my Oracle environment with database version 126.96.36.199.2.
Finally we have only one assumption left: “All of these blocks are attached to the same hash bucket and linked together”. Previously we just queried the HLADDR (latch address) and it was always the same, but one latch covers 32 hash buckets – so we can not be absolutely sure about it at this point.
The easiest way (for me) to cross-check this is dumping the information about the buffers.
SQL> oradebug setmypid SQL> oradebug dump buffers 4;
So this particular hash bucket / double linked lists consists of 8 buffer headers. You can reconstruct the “linked chain”, if you follow the blue marked addresses. You will also find the corresponding data buffer addresses as published through X$BH. The current version of the table block is always the first one (of the corresponding database object) as you walk down the linked list (“performance” optimization).
*** Side note: This particular demo case works with DMLs and full table scans only (Switch current to new buffer technique), but it is the easiest way to demonstrate such a behavior in the buffer cache. If you look closely at the BA column, you will see that the newly created “xcur” buffers are copies from the previously current ones.
Statements like “database cache size = database size = fully cached” are not true at all as we have demonstrated, that one physical data block (8 kb) can be stored multiple times in the database cache (in our case 8 kb * 6 = 48 kb).
You maybe wonder now, why do we care about this as we usually have indexes defined (in a SAP environment) and a DML statement does not use a full table scan at all. This is true (for most of the well written application/business logics), but a similar technique is used, if you need a read consistent block for long running SQLs. Oracle cross-checks the SCN of the current block in buffer cache (we assume that is already there now) when reading a data block, but Oracle needs to create a read-consistent copy of the block, if the block SCN is higher than our query SCN.
If Oracle notices such a SCN mismatch, it walks down the linked list furthermore, because it is possible that an appropriate read-consistent copy (column “STATE” = cr) already exists. Oracle reconstructs the block (by cloning the XCUR block and applying the undo) and attaches it, If there is no suitable version available.
If you have any further questions – please feel free to ask or get in contact directly, if you need assistance by troubleshooting Oracle database (performance) issues.
Very interesting blog! This will for sure eliminate a lot of assumptions in this area..
Thanks for sharing!
Thanks for sharing. Article helped me to eliminate the myth related to DB cache,