There is almost no customer system that was not affected at some point in time by a deadlock situation, even if many users and administrators are not aware of this. This blog series will focus on the different types of deadlocks and what can be done in case a particular deadlock occurred.
This article will cover the simple case of deadlocks caused by row level locks, which are usually the result of application problems (wrong parallelization) or programming errors.
A deadlock can occur when two or more users are waiting for a data locked by each other. Oracle automatically detects deadlock situations and resolves them by rolling back one of the statements involved in the deadlock, thereby releasing one set of the conflicting row locks.
This can be demonstrated pretty easy via 2 sqlplus sessions:
SESSION 69:
SQL> update sapnw2.sflight set price='100' where carrid = 'LH';
10 rows updated.
SESSION 67:
SQL> select * from sapnw2.nriv where object = 'RESB' for update;
We can have a look at v$lock in order to have a more detailed overview about the locks:
SQL> select * from v$lock where type = 'TX';
ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
000000006A875E50 000000006A875FD8 69 TX 524293 1805 6 0 1017 0
000000006A8986A8 000000006A898830 67 TX 393219 1867 6 0 624 0
The locks of the two sessions are hold with LMOD 6, which means 'exclusive'.
SESSION 69:
SQL> select * from sapnw2.nriv where object = 'RESB' for update;
Of course, this session now appears to 'hang', since session 67 sill holds a lock on the requested rows:
ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
000000006BF56B88 000000006BF56BA8 69 TX 393219 1867 0 6 57 0
000000006A875E50 000000006A875FD8 69 TX 524293 1805 6 0 1422 0
000000006A8986A8 000000006A898830 67 TX 393219 1867 6 0 1029 1
We now see a new entry for Session 69, this time with REQUEST = 6, this means that the session now requests an exclusive lock.
SESSION 67:
SQL> update sapnw2.sflight set price = '150' where carrid = 'LH';
This will cause the following error in session 69:
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
Although the example appears to be rather 'artificial' (sflight is not the typical business relevant table), deadlock and wait situations that involve the NRIV table are very frequent in SAP systems. There will be a separate article dealing exclusively with NRIV at a later point in time.
Usually, you are given the task of analyzing this kind of problem post mortem: The issue has already occurred and should not occur again, especially, when for example a business critical job has terminated. Our first entry point for problem analysis is the Oracle alert log:
Sun Nov 25 08:54:16 2007
ORA-00060: Deadlock detected. More info in file /oracle/NW2/saptrace/usertrace/nw2_ora_31341.trc.
The mentioned trace file contains the deadlock graph, which looks like this:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00080005-0000070d 13 69 X 16 67 X
TX-00060003-0000074b 16 67 X 13 69 X
session 69: DID 0001-000D-00000210 session 67: DID 0001-0010-000001B5
session 67: DID 0001-0010-000001B5 session 69: DID 0001-000D-00000210
We can see the following here:
SID 69 holds the lock TX-00080005-0000070d in X mode
SID 69 waits for TX-00060003-0000074b in X mode
SID 67 holds the lock TX-00060003-0000074b in X mode
SID 67 waits for TX-00080005-0000070d in X mode
You will also notice that the waits are always in 'X' mode. This means that we are dealing with row level locks. The other possibilities will be shown in the next blog. Please note that in case of values not equal 'X', totally different analysis methods are necessary.
Rows waited on:
Session 67: obj - rowid = 0000B8EA - AAALjqAAGAAAWN9AAC
(dictionary objn - 47338, file - 6, block - 91005, slot - 2)
Session 69: obj - rowid = 00006367 - AAAGNnAAGAAAqpGAAl
(dictionary objn - 25447, file - 6, block - 174662, slot - 37)
This is very important information. First of all, we can determine which objects are involved:
SQL> select object_name, object_type from dba_objects where object_id = '47338';
SFLIGHT TABLE
SQL> select object_name, object_type from dba_objects where object_id = '25447';
NRIV TABLE
From an application point of view, it is perhaps also help full to investigate the affected rows:
select * from sapnw2.sflight where rowid = 'AAALjqAAGAAAWN9AAC';
select * from sapnw2.nriv where rowid = 'AAAGNnAAGAAAqpGAAl'
Information on the OTHER waiting sessions:
Session 67:
pid=16 serial=459 audsid=-1 user: 0/SYS
O/S info: user: nw2adm, term: pts/1, ospid: 31488, machine: nwtest
program: sqlplus@nwtest (TNS V1-V3)
application name: sqlplus@nwtest (TNS V1-V3), hash value=0
Current SQL Statement:
update sapnw2.sflight set price = '150' where carrid = 'LH'
End of information on OTHER waiting sessions.
Only when knowing both sides of the problem, you can draw the complete picture. When it comes to SAP environments, this means the following: