Today I wanted to share some information about the audit process and specifically what you need to know if you have “suspend audit when device full” set to 1.
First of all, let’s cover some high-level points of auditing.
The sybsecurity database contains the sysaudits_XX tables, and ASE rotates through them whenever sp_configure “current audit table”, 0, “with truncate” is run: the 0 tells ASE to make the next audit table the current one. When this happens, the new table is truncated and audit events begin inserting into it.
A threshold procedure can call “current audit table” to rotate the tables automatically, turning this into a self-maintaining audit trail archive. As long as the threshold fires successfully and neither your audit trail archive database nor sybsecurity has space or insert problems, this should all happen seamlessly.
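As a rough illustration of such a threshold procedure, here is a minimal sketch. It assumes an archive database named aud_db holding a table dbo.sysaudits with the same layout as the sysaudits_XX tables, a segment named aud_seg_01, and a free-space threshold of 250 pages. All of those names and numbers, as well as the procedure name audit_thresh, are placeholders for illustration and not something your server necessarily has.

```sql
use sybsecurity
go

-- Hypothetical threshold procedure; ASE passes these four parameters
-- to every threshold procedure when the threshold is crossed.
create procedure audit_thresh
    @dbname      varchar(30),
    @segmentname varchar(30),
    @space_left  int,
    @status      int
as
    declare @audit_table_number int

    -- Remember which audit table was current when the threshold fired
    select @audit_table_number = scc.value
    from master.dbo.syscurconfigs scc, master.dbo.sysconfigures sc
    where sc.config = scc.config
      and sc.name = "current audit table"

    -- 0 = move to the next audit table; "with truncate" empties it first
    exec sp_configure "current audit table", 0, "with truncate"

    -- Archive the table that just filled (aud_db.dbo.sysaudits is assumed)
    if @audit_table_number = 1
    begin
        insert aud_db.dbo.sysaudits select * from sysaudits_01
        truncate table sysaudits_01
    end
    else if @audit_table_number = 2
    begin
        insert aud_db.dbo.sysaudits select * from sysaudits_02
        truncate table sysaudits_02
    end
    -- ...repeat for each additional sysaudits_XX table you have installed

    return (0)
go

-- Attach the procedure to the segment of the first audit table
-- (segment name and the 250-page free-space value are assumptions)
sp_addthreshold sybsecurity, aud_seg_01, 250, audit_thresh
go
```

Each audit table's segment would need its own sp_addthreshold call. Note that if the insert into the archive table fails, for example because the archive database is full, the rotation has already happened but the old records have not been archived.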
Next some words about the internal memory structures of auditing.
All auditable events that have been set up will be queued to an “Auditing Queue Object”. These objects come from a pool sized by the configurable “audit queue size” parameter and contain the event data to be inserted into sybsecurity. The audit event requests an audit queue object and, upon success, the object is sent to the audit process for insert into sybsecurity. If you run out of audit queue objects, the spid will sleep until one becomes available while the audit process continues to work through the queue.
If you have a very large number of auditing options set and high concurrent activity or bursts of activity, the default of 100 events might not be enough to prevent spids from sleeping while waiting to grab an object from the memory pool. Increasing “audit queue size” may be necessary to handle the load and/or bursts of high concurrent activity.
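Checking and raising the queue size might look like the following. The value 500 is only an illustrative assumption; the right number depends on your audit options, workload, and available memory.

```sql
-- Show the current and default values for the audit queue
sp_configure "audit queue size"
go

-- Raise the queue size (500 is an example value, not a recommendation)
sp_configure "audit queue size", 500
go
```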
Putting it all together, here’s an example of the flow for an audit of a login.
- Incoming login request
- Check whether auditing is enabled
- Grab an audit queue object
- Queue event to the audit process
- At this point, the login is successful and proceeds with normal processing
- Audit process inserts the event to sybsecurity
- Audit queue object is freed after successful insert
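The first four steps (up to and including queuing the event) are done by the spid making the login request.
The last two steps (the insert and the freeing of the audit queue object) are done by the audit process.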
Note the branch after the event is queued, once the audit queue object has been successfully grabbed and handed to the audit process. The login continues as normal while the audit process picks up the queued event to insert into sybsecurity. If the audit queue is full at this point and an audit queue object cannot be grabbed, the spid will show in sp_who as sleeping in MAINTENANCE TOKEN until the queue has drained enough for an audit queue object to be grabbed and queued.
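If you want to list just the waiting spids rather than scanning the full sp_who output, a query along these lines may help. It assumes that the MAINTENANCE TOKEN text sp_who displays comes from the cmd column of master..sysprocesses; verify that on your own server before relying on it.

```sql
-- List spids that appear to be waiting for an audit queue object
-- (assumes "MAINTENANCE TOKEN" is reported in the cmd column)
select spid,
       status,
       cmd,
       login_name = suser_name(suid)
from master..sysprocesses
where cmd like "%MAINTENANCE TOKEN%"
go
```

A steadily growing result set from this query is a hint that the queue is not draining.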
Now, let’s talk about the option “suspend audit when device full”.
The documentation states:
- 0 – truncates the next audit table and starts using it as the current audit table whenever the current audit table becomes full. If you set the parameter to 0, the audit process is never suspended; however, older audit records are lost if they have not been archived.
- 1 (the default value) – suspends the audit process and all user processes that cause an auditable event. To resume normal operation, the system security officer must log in and set up an empty table as the current audit table. During this period, the system security officer is exempt from normal auditing. If the system security officer’s actions would generate audit records under normal operation, SAP ASE sends an error message and information about the event to the error log.
Essentially this means that if “suspend audit when device full” is 0 and an insert of an audit record into sybsecurity fails, ASE will automatically make the next table the current audit table, truncate it, and start using it for inserts. This prevents the audit process from being suspended, at the cost of possibly losing part of the audit trail if the records in that table had not yet been archived by a threshold procedure or some other archival method.
The sybsecurity database is most likely to fill due to a flaw or error in the threshold procedure. This can be anything from missing objects in the audit archive database to a change in tables that the threshold procedure depends on.
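A quick sanity check of the threshold setup could look like this. The procedure name audit_thresh is a hypothetical placeholder; substitute whatever your thresholds actually call.

```sql
use sybsecurity
go

-- List the thresholds defined in sybsecurity and the procedure each one calls
sp_helpthreshold
go

-- Confirm the threshold procedure still exists (audit_thresh is an assumed name)
sp_help audit_thresh
go

-- Review device allocation and free space for sybsecurity
sp_helpdb sybsecurity
go
```

This only confirms that the pieces exist. It does not prove the procedure will run cleanly, so it is still worth testing the archive insert against the archive database after any schema change.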
If “suspend audit when device full” is 1, the behavior when sybsecurity fills up is different. First and foremost, the audit process suspends and is marked internally, via a status bit, as suspended until manual intervention clears it.
The errorlog will have this message:
AUDIT PROCESS EXCEPTION: Can’t allocate space for either syslogs or the current sysaudit table in database ‘sybsecurity’ because the corresponding segment is full/has no free extents. Please refer to Security Administration Guide for details.
Now that the audit process is suspended, the following can be seen:
1. All pending auditable events will remain pending
2. New auditable requests will rapidly consume objects in “audit queue size”
3. Once the auditable requests exhaust the “audit queue size”, they will sleep waiting for objects to be freed from the pool
4. The audit process will not proceed until sp_configure “current audit table”, 0, “with truncate” has rotated to and truncated a new table for inserts while auditing is enabled
A very important point about #4 above: the rotation will NOT take effect if “auditing” is set to 0. Manually clearing or truncating space in sybsecurity is not sufficient to tell the audit process it is time to wake up. When “suspend audit when device full” is 1, the only mechanism that wakes the audit process from suspension is issuing sp_configure “current audit table”, 0, “with truncate” to rotate to a new table while auditing is enabled. This clears the suspended status bit, which wakes the process so it can resume pulling events off the audit queue.
The reason for this requirement is that under a normal cycle of space allocation and deallocation, “current audit table”,0,”with truncate” is THE method for space management in sybsecurity. By having “suspend audit when device full” enabled, we’ve set the expectation to not lose any audit events. Therefore the only natural course to wake up the suspended audit is to use the same method for archival steps and space management.
If you get caught in this audit space and queue bottleneck, here are some emergency steps to take.
1. sp_configure “suspend audit when device full”, 0
2. Cure all space issues (and archive as necessary) in sybsecurity, and consider increasing its size.
3. Drastically (and temporarily) increase the “audit queue size” to absorb the load
4. Re-enable auditing via sp_configure “auditing”,1 if it was disabled. If this is not responsive try sp_audit “restart”
5. sp_configure “current audit table”, 0, “with truncate”
6. The audit queue should begin to drain
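Put together as a script, the emergency steps above might look roughly like this. The queue size of 1000 is an arbitrary example value, and the space fix in step 2 is left as a comment because it depends entirely on your devices and archive setup.

```sql
-- Step 1: stop suspending when the audit device fills
sp_configure "suspend audit when device full", 0
go

-- Step 2: fix space in sybsecurity here (archive old audit tables,
-- extend the database, repair the threshold procedure). Site specific.

-- Step 3: temporarily enlarge the audit queue (1000 is an example value)
sp_configure "audit queue size", 1000
go

-- Step 4: make sure auditing is enabled
sp_configure "auditing", 1
go
-- If the audit process does not respond, try:
-- sp_audit "restart"
-- go

-- Step 5: rotate to the next audit table, truncating it first.
-- This is what clears the suspended state.
sp_configure "current audit table", 0, "with truncate"
go
```

Remember that “with truncate” discards whatever is in the next audit table, so archive that table in step 2 if its contents matter.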
Once things are back to normal, weigh the risk of losing part of the audit trail (a setting of 0) against the risk of a sudden and unexpected hang when sybsecurity fills or the threshold procedure stops working (a setting of 1), and set “suspend audit when device full” accordingly.
I hope this helps you prevent this hanging situation in your future auditing endeavors, and get out of it if it does happen.