Technical Articles
ASE Table Auditing with XML
I have been all over the place with auditing, going back and forth over many one-off designs within the same system. I recently had a chance to slow down and think about an implementation for our ASE installation, and came up with something very useful. The design handles changes needed to implement an auditing process where table Inserts, Updates, and Deletes are recorded:
- For Inserts, a record of “who” and “when” is maintained, for the inserted row.
- For Updates, a record of “who” and “when” as well as the contents of the previous row is maintained.
- For Deletes, a record “who” and “when” as well as the contents of the deleted row is maintained.
These operations may come from applications or from an isql session, or from an IDE such as DBArtisan, or from a stored procedure, or from a trigger, or from an RPC, or from anywhere else. It does not matter where the operation originates, it must be audited.
In order to audit inserts, updates, or deletes no matter where they originate, auditing uses the database transaction log as the means of catching the operation, which means that either triggers, or the Repserver can be used as the capturing mechanism. In our case, where the Repserver is not implemented, triggers are used.
Since the database transaction log is used as the capturing mechanism, database rollback operations will automatically “un-audit” transactions when they fail.
Some database operations are non-logged. Select-into operations and bcp operations on un-indexed tables, and truncate table operations are non-logged. In the first 2 cases, the only type of operation is an Insert. For this non-logged Insert, the requirement still exists to record the “who” and “when” for the inserted row.
In order to truncate a table, you must be the table owner or a user with truncate table permission. For our system, this permission is not granted to users. The table owner is dbo which is a restricted user, used only during deployment. Therefore, a truncate table by dbo is the only scenario in which the auditing is bypassed.
The glId, glUser, and glTime Attributes
In order to audit, each table is given a surrogate primary key. This GUID is a 16 byte hexadecimal data type that is globally unique across tables, databases, and servers. The term GUID stands for Globally Unique Identifier. a GUID is generated with the newid([flag]) SQL function. The flag argument specifies the result format. Option 1, which we used, specifies a result format of varchar(36).
Using a GUID to uniquely identify a row means that anything in the row can change, including the business primary key, and we can still locate the row. This is important for auditing because we can track changes to the row over time no matter what changed in the row. It is also important because we have a consistent way to locate a row across all tables, namely the GUID, rather than the business primary key which is different for each table and can also be a very complicated composite primary key.
In addition, we also add 2 more attributes to each table, a glUser and a glTime attribute. When a row is inserted, the value of these attributes are the user who inserted the row and the time of insert. Here is the schema for a table:
1> create table t_Country (
2> glId varchar(36) default newid(1) not null,
3> glUser varchar(30) default suser_name() not null,
4> glTime datetime default getutcdate() not null,
5> countryId int not null,
6> code varchar(2) not null,
7> description varchar(30) not null,
8> currencyId int not null)
9> go
The prefix gl stands for “global”. If the glId, glUser, and glTime values are not supplied, they are defaulted. The value for glId should be defaulted per the table DDL rather than coming up with a value yourself. You could get it wrong, since it is just a varchar(36), so it is best to let the default kick in.
You can let glUser and glTime default, or you could pass in your own values if you wish. Database defaults can only be SQL function results or constants, so the best we can do is suser_name() and getutcdate(). If you want something different, then pass them in.
Along with each table, there is a view. The view is the object through which selects, inserts, updates, and deletes are done. Here is the view corresponding to the table:
1> create view Country as select
2> select
3> glUser,
4> glTime,
5> countryId,
6> code,
7> description,
8> currencyId
9> from t_Country
10> go
Notice that glId is not included. Why? There is a good reason for this. If you did this:
1> insert Country
2> select * from Country
3> go
then you would get duplicate glId’s which is the wrong thing to do. If, however, glId is not included in the view, then the default will kick in and you will get a new glId for each new row.
In addition, glId should be generally thought of as an invisible column which is not useful to anything business-wise. Therefore, it is not important to see it in the view.
There is nothing to keep you from inserting, updating, or deleting directly to table t_Country rather than going through the Country view. Our convention is to use the view, but it is not enforced because some perfectly legal things can only be done on the table, like sp_spaceused() or truncate table for the dbo user. Therefore, to absolutely prevent duplicate glId’s, we also generate the following index on the table:
1> create unique index glId_idx on t_Country (glId)
2> go
The Universal t_Audit Table
Auditing is handled differently for inserts versus updates and deletes. The goal is to avoid over-engineering auditing for inserts.
When a row is inserted, we really don’t need to generate a record of the action in any type of audit record, because the row in the table IS the inserted row and it has the user and date that the row was inserted. This is sufficient for auditing.
Once the row has been inserted, it now becomes important to keep a record of changes resulting from an update, and it also becomes important to keep a record of the row before being deleted. This is the purpose of the audit trail.
There are two ways to implement the audit trail. First, we can have one audit table per base table, with a matching schema and an additional attribute for the audit table to specify whether the row is an U-pdated row or a D-eleted row. Historically, this is how we’ve done it. If you have 200 tables in the system, you really end up with 400 tables for complete coverage. Every time you alter the base table, you have to do the same alter to the audit table. This does not work out well.
The other way uses a novel approach. We create a single universal audit table which contains the audit history for all tables. Since each originating table has its own schema, the audit table must handle this. The best (and SQL-supported) way to do this is to store the attributes of the originating table as a JSON or XML string. In this way, any originating schema can be represented. The universal audit table therefore looks like this:
1> create table t_Audit (
2> relName varchar(50) not null,
3> glId varchar(36) not null,
4> glUser varchar(30) not null,
5> glTime datetime not null,
6> glAction char(1) not null, -- U or D
7> glXml text not null in row(7957))
8> go
9> create unique clustered index idx1 on t_Audit(relName,glId) with allow_dup_row
10> create index idx2 on t_Audit(glUser)
11> create index idx3 on t_Audit(glTime)
12> create index idx4 on t_Audit(glAction)
13> go
The immutable attributes which are common to all originating tables are stored as separate attributes in t_Audit. These attributes also serve as search arguments. The attributes of the originating tables which vary from table to table are stored in the glXml string.
Inserting into the Audit Trail
An audit trail is generated for update and delete operations. This is where triggers come in handy. For each table, there is an update and delete trigger. These triggers receive the row(s) affected, and produce insert(s) into t_Audit. Here is the trigger code:
1> create trigger t_CountryTrigger
2> on t_Country
3> for update, delete
4> as
5> begin
6> --
7> -- Deletes
8> --
9> insert Audit (relName,glId,glUser,glTime,glAction,glXml)
10> select
11> relName = 't_Country',
12> glId = x.glId,
13> glUser = suser_name(),
14> glTime = getutcdate(),
15> glAction = 'D',
16> glXml =
17> '<row>' +
18> '<a1>' + convert(varchar(255),x.countryId) + '</a1>' +
19> '<a2>' + dbo.fc_printable(x.code) + '</a2>' +
20> '<a3>' + dbo.fc_printable(x.description) + '</a3>' +
21> '<a4>' + convert(varchar(255),x.currencyId) + '</a4>' +
22> '</row>'
23> from deleted x
24> where x.glId not in (select glId from inserted)
25> --
26> -- Updates
27> --
28> insert Audit (relName,glId,glUser,glTime,glAction,glXml)
29> select
30> relName = 't_Country',
31> glId = x.glId,
32> glUser = suser_name(),
33> glTime = getutcdate(),
34> glAction = 'U',
35> glXml =
36> '<row>' +
37> '<a1>' + convert(varchar(255),x.countryId) + '</a1>' +
38> '<a2>' + dbo.fc_printable(x.code) + '</a2>' +
39> '<a3>' + dbo.fc_printable(x.description) + '</a3>' +
40> '<a4>' + convert(varchar(255),x.currencyId) + '</a4>' +
41> '</row>'
42> from deleted x, inserted i
43> where x.glId = i.glId
44> end
45> go
There are a couple things to note:
1. The XML string must be a text (character) string conforming to the XPath standard. Therefore, the text must not contain >, <, \, /, “, and ‘. Additionally, we do not want the text to contain “,” or “|” because these 2 characters would disrupt the format of csv and pipe files. There is nothing to prevent reporting of t_Audit into csv and pipe files. In addition, all non-ascii characters must be removed.
So, how to strip all of this out of char and varchar values from the originating tables? We use the home-made fc_printable() user-defined function. The use of the function is shown in the trigger, and can also be used elsewhere.
2. Why are the row attributes in glXml identified by a1,a2,a3,a4? Two reasons. First, if we used the attribute name, then this name could never be changed in the originating table without losing the audit history in t_Audit. If you changed code to countryCode, then you would also need to change the trigger to:
'<countryCode>' + dbo.fc_printable(x.countryCode) + '</countryCode>'
Mechanically this is not a problem, but, how do you find old audit entries which were posted under the code name? The answer is, you don’t. The second reason for using a1,a2,a3,a4 is to save glXml character string space.
3. The values for glUser and glTime are as follows:
glUser = suser_name(),
glTime = getutcdate(),
For glTime, getutcdate() is correct. This is the time of the delete or update. We don’t use glTime from the originating table. That was the time of the insert. For glUser, the database user name is probably not right in all cases, but for now, this is what we use.
4. Since we are using a glId surrogate primary key to identify a row, the glId search argument is the same for all triggers. In addition, as updates accumulate for a row, they accumulate in t_Audit under the same glId. When the row is deleted, the delete record also accumulates into t_Audit under the same glId. And, when the row is first inserted, the glId in the originating table identifies the inserted row. This glues everything together – the record in the originating table and the accumulated history of changes in t_Audit.
5. An update record contains the values of the originating row PRIOR to the update. This is important. It shows the old values. The new values after the update are in the originating table. For a delete record, the record likewise contains the value of the originating row PRIOR to the delete. There is no row in the originating table.
6. Triggers which access off-row datatypes are not allowed. This is not related to auditing, but does show up here. For example if you try to create a trigger on t_Messages which contains a text datatype, the following SQL error is given:
Msg 2116, Level 16, State 1: Server 'AAA', Procedure 't_MessagesTrigger, Line 40: CREATE TRIGGER failed because selecting from a TEXT, IMAGE, UNITEXT or off-row Java datatype column of the inserted or deleted table is not supported.
A reasonable workaround would be to audit all attributes except the non-supported one’s.
Selecting from the Audit Trail
We need the ability to select from t_Audit for a specific table. This is usually how audit data is accessed. In order to do this, we store the table name with the audit record. Beyond that, we create an audit view for each table as follows:
1> create view CountryAudit as 2> select 3> relName = relName, 4> glUser = glUser, 5> glTime = glTime, 6> glAction = glAction, 7> countryId = xmlextract('//row/a1/text()', glXml returns int), 8> code = xmlextract('//row/a2/text()', glXml returns varchar(2)), 9> description = xmlextract('//row/a3/text()', glXml returns varchar(20)), 10> currencyId = xmlextract('//row/a4/text()', glXml returns int), 11> glId = glId 12> from t_Audit 13> where relName = 't_Country' 14> go
There are a couple things to note:
1. The SQL xmlextract() function is used to extract the value for the attribute. If finds the value in the glXml string using the XPath notation. The value is returned in the same datatype as the originating table.
2. The CountryAudit view returns the data in a normal row and column format. There are normal cases where a column is added to the originating table via the alter table command. When the is done, the update and delete trigger code for the originating table is also changed. The CountryAudit view is changed also. The rows in t_Audit which have already accumulated, are unaffected by the change. When data is extracted via xmlextract(), if the pattern is not found for a row then a NULL is returned in the result set. In this way, additions to the table schema to not break auditing.
3. The xmlextract() function does some transformation of the text. These characters are transformed:
& is transformed to “&”
< is transformed to “<”
> is transformed to “>”
“ is transformed to “"e;”
‘ is transformed to “'”
This normally would not be a problem other than the fact that you have weird stuff in your char and varchar strings. However, this can also blowout the size of a char or varchar. For example, if location is varchar(30) and contains this string:
Richmond 97 Boundary King’s Road Center
then xmlextract() will try to return this:
Richmond 97 Boundary King's Road Center
and selection will generate an SQL error.
Therefore, we strip all predefined XPath entities from char and varchar attributes prior to storing them in t_Audit. The fc_printable() function in the update and delete trigger does this.
4. So why do we store data in an XML format? Why not JSON? Or something homemade? The reason is that ASE has native support for XML and supplies the xmlextract(), xmltest(), xmlparse(), and xmlrepresentation() built-in SQL functions to handle XML.
An Auditing Example
In this example, we insert a country, change its currency, change its country ID, and then delete it. The Audit trail shows the history.
Insert a Country:
1> insert Country(countryId,code,description,currencyId) values (1,"US","United States",22)
2> go
(1 row affected)
1> sp_autoformat Country -- This is the view
2> go
glUser glTime countryId code description currencyId
------ ------------------- --------- ---- ------------- ----------
ui1 Dec 31 2019 8:22PM 1 US United States 22
(1 row affected)
1> sp_autoformat t_Country -- This is the base table. We would not normally select from this.
2> go
glId glUser glTime countryId code description currencyId
------------------------------------ ------ ------------------- --------- ---- ------------- ----------
00b11046-1495-4708-99fb-820586ec5647 ui1 Dec 31 2019 8:22PM 1 US United States 22
Update the Country:
1> update Country set currencyId=10 where countryId=1
2> go
(1 row affected)
1> update Country set countryId=5 where countryId=1
2> go
(1 row affected)
Delete the Country:
1> delete Country where countryId=5
2> go
(1 row affected)
Select the Audit Trail
1> sp_autoformat CountryAudit
2> go
relName glUser glTime glAction countryId code description currencyId glId
--------- ------ ------------------- -------- --------- ---- ------------- ---------- ------------------------------------
t_Country ui1 Dec 31 2019 8:25PM U 1 US United States 22 00b11046-1495-4708-99fb-820586ec5647
t_Country ui1 Dec 31 2019 8:25PM U 1 US United States 10 00b11046-1495-4708-99fb-820586ec5647
t_Country ui1 Dec 31 2019 8:26PM D 5 US United States 10 00b11046-1495-4708-99fb-820586ec5647
(3 rows affected)
And so there you have it, a nice uniform implementation of table auditing.
Great article.
My only suggestion will be to use bigdatetime in place of datetime.
Storage requirements are same but you get microsecond accuracy.
This may be useful if transactions happen rapidly and you want chronologically ordered sequence.
Avinash
Ah yes, thanks Avinash. I had almost forgotten about that data type. I have not used it much.