DB2: Primary and Secondary logs and Troubleshooting
I thought of writing this blog for a long time but then did not knew lot of things on this. And as they say half knowledge is dangerous so after lot of research and study I could sum up as below.
My intention to write this blog is to concentrate more on log full situation that occurs in SAP running on DB2 database system (mostly Production) and troubleshooting it.
So, lets begin…
Something about Log files in DB2:
Similar to archive log in Oracle we have logging written in DB2 as primary and secondary log files.
- Primary log file:
These files are allocated once the Database is active. Or you can say the the file space gets allocated as a part of database start/active process.
The number of files of given size are allocated at once.
LOGPRIMARY: Is the parameter that tells the number of primary log files that are created when database is started.
- Secondary log file:
These files are similar to primary log files but database does note allocate them at once, instead on as and when required basis.
For me, the concept of secondary log is to provide a way out when you have log full situation in sap system, as you can increase/decrease the count on the fly (like RZ11 parameter).
LOGSECOND: is the parameter that tells the total number of secondary log files database can allocate when in need.
Other important Parameters:
- LOGFILSIZ: Log file size of both primary and secondary files * 4 i.e if the parameter is set to 100 KB than the primary and secondary log file size would be of 100*4 = 400 KB
- LOGPATH: The path where the logs will reside at file system. The ideal location usually is /db2/SID/log_dir/NODE0000/ location.
You can find these parameters values by executing below command by db2<sid> user:
$ db2 get db cfg | grep -i log
View these parameter from SAP level via ST04 –> Configuration (from the right hand pane) –> Database
Something on Troubleshooting:
You can monitor the current log usage in SAP from ST04 –> Performance –> Logging tab –> In Secondary log header, field name Logs Currently Allocated
The number will tell you how much secondary log at present is allocated.
If you find any number in that case note down the Application with Oldest Transaction value in the same screen of tcode (ST04 as above). This field will give us the database agent id that is consuming the secondary log.
Now How to find the work process that is consuming these logs?
Copy the value you found for Application with Oldest Transaction (ST04), then from the right hand pane double click on Applications and search for the value in the column Appl. Handle. Make sure you keep the format of the value as it is as it may be comma separated, double click on it and detailed information will be displayed
Where in the information
With Application header:
PID: is of SAP work process
Status: current status of work process
With Client Information header:
User ID: Work process running under user ID
Workstation: SAP application/instance name
Application: Type of work process
You can also find the values from OS/DB2 level by executing the below sets of command with db2<sid> user as mentioned below
- To get the total secondary logs allocated
$ db2 get db cfg for <DBSID> | grep -i Secondary
- To get the current secondary log allocation
$ db2 get snapshot for database on <DBSID> | grep -i ‘allocated currently’
- To get the details on oldest transaction consuming secondary log
$ db2 get snapshot for database on <DBSID> | grep -i oldest
- To get the process ID of SAP consuming secondary log
$ db2 get snapshot for application agentid xxxx | grep -i ‘Process ID’ ; xxxx is the value that you obtained from the above command
Now you can terminate the process to bring back the sec log count to 0 (in case the sec log count should reach the allocated log file count). Usually we do this when it reaches 80% of the sec log file value LOGSECOND.
As killing the process will try to rollback the activity performed and this will again consume secondary logs. This action is to avoid your system from being stand still.
Other way out:
You can set NUM_LOG_SPAN parameter to auto terminate the process consuming the sec log.
You can set the value like 80% of (LOGPRIMARY + LOGSECOND).
E.g.: LOGPRIMARY: 50 LOGSECOND: 50 than NUM_LOG_SPAN if set to 80% would have value 80
But this parameter will terminate the agent only if it is the only agent consuming the threshold value. (Number of active log files for 1 active UOW)
If there are many agents involved in sec log consumption than you need to find all the agents and terminate them one by one to avoid log full/system hung situation, to do this use below query
- To find all the processes using secondary log space
$ db2 “SELECT A.SNAPSHOT_TIMESTAMP TIMESTAMP, SUBSTR(CHAR(A.AGENT_ID),1,5) APPL_HDL, SUBSTR(CHAR(B.COORD_AGENT_PID),1,8) AGENT_PID, SUBSTR(CHAR(B.CLIENT_PID),1,8) APPL_PID, SUBSTR(CHAR(A.UOW_LOG_SPACE_USED),1,13) LOGSPACE_Bytes, SUBSTR(CHAR(A.LOCKS_HELD),1,8) LOCKS, A.UOW_COMP_STATUS, B.APPL_STATUS, INT(A.APPL_IDLE_TIME/60) IDEL_TIME_MIN, A.DBPARTITIONNUM, SUBSTR((RTRIM(LTRIM(C.STMT_TEXT))),1,200) STMT_TEXT FROM SYSIBMADM.SNAPAPPL A, SYSIBMADM.SNAPAPPL_INFO B,SYSIBMADM.SNAPSTMT C WHERE A.AGENT_ID=B.AGENT_ID and A.AGENT_ID=C.AGENT_ID AND A.DBPARTITIONNUM=B.DBPARTITIONNUM AND B.DBPARTITIONNUM=C.DBPARTITIONNUM AND A.UOW_LOG_SPACE_USED > 0 ORDER by A.UOW_LOG_SPACE_USED DESC”
Say you decide to increase/decrease the secondary log instead of every time terminating the process consuming the log than this can be done without restarting DB by..
- To change the secondary log
$ db2 connect to <DBSID>
$ db2 update db cfg using logsecond XX immediate ; XX is the number you wish to keep for sec log.
The number of primary and secondary log files must comply with the following:
- If logsecond has a value of -1, logprimary <= 256.
- If logsecond does not have a value of -1, (logprimary + logsecond) <= 256. i.e. the total count of primary and secondary log files cannot exceed beyond 256. In such case you can increase the LOGFILSIZ value, provided you can enough space in log_dir directory.
Few important sap notes:
2141933 – DB6: How to initially trouble shoot transaction log full issue
495297 – DB6: Monitoring transaction log
Hope this information is good enough for you to work on log full issues. Please comment/like the post and YES, do let me know if something is missing or needs to be corrected or added. Thank you.
That’s all folks!!!!!!
There is very good SAP Note and can be include in the blog,
2141933 - DB6: How to initially trouble shoot transaction log full issue
495297 - DB6: Monitoring transaction log
Added, thank you.
A couple of points for people who come across this;
the database configuration parameter NUM_LOG_SPAN is available from DB2 9.7 onwards.
Another useful tool is to use the DB2 Governor utility, set up a rule that forces off any connections that have been idle for, say, 4 hours. This is available from DB2 9.7 onwards. Full instructions on how to configure and run it are available at http://www.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.admin.perf.doc/doc/c0004985.html
However, the basic principle is that you define a configuration file, that tells the governor how often to perform it’s checks, and what checks to perform, then you start the governor (either manually or via cron or as an addition to the startsap scripts).
I’ve used both of these to manage log file saturation – this occurs when there is an active transaction (one that has not issued a
rollback), keeping an old transaction log active. DB2 can not archive active transaction logs (because they are required for crash recovery), so an old, uncommitted transaction can lead to a transaction log full error (
SQL0964C) as other transactions use transaction log space.
However, if you’re suffering from problems with Log Saturation you need to understand why the long-running transaction had not performed a COMMIT or ROLLBACK – reasons could range from ‘systemic’ issues like an RFC process waiting for network response from another SAP instance or system, to a badly written ABAP running in background (that updates a file at the start of processing and does no commits until the program ends 2 or 3 hours later).
Of course, you can increase the number of logs specified by LOGPRIMARY and LOGSECOND. However, this implies you either know the long running transaction and accept that it is running for a long time (and it’s impact on the rest of the system !!), OR have accepted that increasing the number of LOGs is a more cost-effective fix than performing an RCA of the issue (which is quite sensible if the error only occurs once or twice a year !!)