On queue processing, what (just) happened? Part #3
So it seems that finding out what happened to an entry/assignment and why is also an issue. Here are some hints on how.
Please note that these are not official table definitions. Views that start with idmv and mxpv are public and stable, but the tables themselves can and have occasionally changed.
2014-05-27: Listing task execution in time intervals added
All provisioning tasks are audited, the combination mskey, actionid, auditid is unique in the system. Any task that is started using the IdM UI, Rest interface, attribute events, privilege events etc. will get a unique AuditId. If the workflow processes triggers addd
The main audit table is mxp_audit. This contains a single row entry for each task/event that has been triggered for your entries. Some notable columns:
|taskid||Taskid of root task of this audit (such as change identity, provision, deprovision, modify in provision framework)|
|Entry for which this audit was started|
|auditroot||root audit or itself. Root audit points back beyond the parent audit (refaudit) to the very first audit in case of deep nested events|
|posteddate||Datetime when the task was initiated and put into the provisioning queue|
|statusdate||Datetime when the task or any subtask of this audit was last updated or evaluated|
|provstatus||See mxp_provstatus, Key values: 0=Initiated OK, 1000=Task OK, 1001=Task Failed, 1100= OK, 1101=FAILED|
|LastAction||Id of last subtask that has been executed or attempted executed. Will be the task that failed in case of provstatus 1001 or 1101|
|refaudit||null or the parent audit|
|MSG||Error message or just informational message depending on provStatus|
Somewhat fuzzy logic datablob, but there is a structure to it
If its simply a number then it should be the mskey of an entry, and the task was started by a user in the IdM UI.
When value starts with # this column indicates an privilege attribute event that caused the task to start. The Format is #<attrid>:<operation>:<checksum>:<oldvalueid>
395 is the attribute ID (MXREF_MX_PRIVILEGE), 3285570 is the checksum of the value that triggered the task (mxi_values.bCheckSum=<checksum>), oldvalueid = 0 means this is a new value, not a replacement/modify otherwise the old value can be found in mxi_old_values.old_id=<oldvalueid>
When starting with + this value indicates that this is an On event such as On (chain) OK or On (chain) Fail. Format is +<taskid>:<eventname>
Example: +1002083:On Chain OK task
When starting with * this value indicate that it was started by an entry event (defined on the entry type). Format *<entryid>:<operation> where operation should be Insert, Modify or Delete.
|startedBy||Only valid when task is started from Workflow UI/REST, contains the mskey of the logged in user that ran the task|
A fun/useful thing to do with the audit includes checking the average execution time of a task from start to end over time. Change taskid=X to taskid in (x,y,z) to get more tasks or extend this sequel with a link to mxp_tasks to use tasknames and be creative. I suggest keeping it limited to a few tasks or the results might become difficult to interpret.
select taskid,convert(varchar(10),posteddate,20) Date,count(auditid) as numExecs,avg(datediff(ss,A.posteddate,A.statusdate)) AvgTimeToComplete from mxp_audit A with(nolock) where taskid = 1 and posteddate > '2014-02-01' and ProvStatus >999 group by taskid,convert(varchar(10),posteddate,20) order by taskid,convert(varchar(10),posteddate,20)
select taskid,to_char(posteddate,'YYYY-MM-DD') "date",count(auditid) "numExecs",round(avg(statusdate-posteddate)*24*60*60,2) "avgTimeToComplete" from mxp_audit where taskid = 20 and postedDate > to_date('2014-02-01','YYYY-MM-DD') and provstatus > 999 group by taskid,to_char(posteddate,'YYYY-MM-DD') order by taskid,to_char(posteddate,'YYYY-MM-DD')
This calculates the average time between start-time and end-time of the task with id=1 (SQL) and 20 (Oracle), and I suggest using the taskid for Provision or Modify to test this. ProvStatus >= 1000 are completed, those still running will have no statusdate worth using in this case.
On SQL Server changing the length of the convert statement to 7 characters you can group it by month, 4 to get per year. On Oracle you can change to to_char conversions to YYYY-MM, or just YYYY.
You can also query for posteddate between two dates and much more. This is also useful to spot negative trends over time, but you must consider the overall load of the system. This is also useful during testing and tuning to verify if improvements you do have any impact on a full workflow execution.
List all tasks that have been executed on a user (SQL Server and Oracle):
select A.auditid, A.AuditRoot, A.RefAudit auditParent, A.userid, A.StartedBy, A.taskid, T.taskname, A.mskey, A.PostedDate, A.StatusDate, A.provstatus, A.LastAction, A.msg from MXP_AUDIT A, MXP_Tasks T where A.TaskId = T.TaskID and A.msKey = (select mcmskey from idmv_entry_simple where mcMskeyValue = 'ADMINISTRATOR') order by auditroot,RefAudit
mxp_audit.taskid can be linked to mxp_tasks.taskid to get the taskname when accessing the mxp_table instead of the view (which has an unfortunate top 1000 limit).
The extended audit is stored in mxp_ext_audit. This contains a single row entry for each task/action executed within an audit and is enabled by checking the “Enable trace” checkbox.
|Aud_datetime||Datetime when the ext_audit record was created|
|Aud_Approver||Mskey of approver. You should use mxp_link_audit for this when getting link approval records|
Generic information. If the audited task is a switch or conditional this column will contain the result of the evaluation. In case of conditionals it will be TRUE or FALSE, for switches it will contain the value returned by the SQL Statement.
Reason for the task starting. Another fuzzy logic data blob. Some of the common value formats:
TASK:<taskid>:<task operation. 0=inittask, 1=OnError, 2=OnOk, 3=OnChainError, 4=OnChainOk>
OTHER:<other info> : This is typical for tasks started using uProvision.
Operation values: 1=Modify, 2=Delete, 3=Insert
Entryoperation values: 0=Provision, 1=Deprovision, 2=Modify
The extended audit is useful for when you need to see what happened in subtasks, what conditional or switch statements returned or find out where a workflow stopped for a user. This query lists all tasks started by and including auditid 1307812, but can easily be modified to filter on aud_onEntry (mskey), and dates.
SQL Server and Oracle:
select t.taskname,A.aud_ref,aud_Datetime,aud_info,Aud_StartedBy from mxp_ext_audit A, mxp_tasks T where T.taskid = A.aud_task and aud_ref in (select auditid from mxp_audit where AuditRoot=1307812) order by aud_datetime
The red arrows shows a child audit being started, in this case by a uProvision call in a script, and the green arrows show where the child-audit is completed and allowing the parent audit to continue from its wait for events state.
Listing task execution in time intervals
(SQL Server only for the moment)
It is also possible to use the audit and extended audit to list task executions per X minutes between date A and date B:
-- Create helper table for intervals and declare vars declare @intervaltbl table(st datetime,et datetime) declare @it datetime declare @interval int set @interval = 5 -- interval length in minutes set @it = '2014-05-27 10:00:00' -- Interval Start time while @it <= '2014-05-27 12:30:00' -- Interval End time begin insert into @intervaltbl values (@it,dateadd(mi, @interval, @it)) set @it = dateadd(mi, @interval, @it) -- Increase to the next @interval minute interval end -- Do a count per interval in mxp_audit select it.st startinterval, it.et endinterval, count(*) troottaskstarted from @intervaltbl it, MXP_AUDIT A with(nolock) where A.PostedDate between it.st and it.et group by it.st,it.et
It’s also possible to list the number of task/actions the same way. This query can be modified to group on t.actiontype to list the number of ordered/conditional/switch/actions were performed per interval:
-- Create helper table for intervals and declare vars declare @intervaltbl table(st datetime,et datetime) declare @it datetime declare @interval int set @interval = 15 -- Interval length minutes set @it = '2014-02-04 10:00:00' -- Interval Start time while @it <= '2014-02-04 16:00:00' -- Interval End time begin insert into @intervaltbl values (@it,dateadd(mi, @interval, @it)) set @it = dateadd(mi, @interval, @it) -- Increase to the next @interval minute interval end --Do a count of tasks and actions started per interval in mxp_ext_audit select it.st startinterval, it.et endinterval, T.taskname, count(*) TimesStarted from @intervaltbl it, MXP_EXT_AUDIT A with(nolock), MXP_Tasks T with(nolock) where A.Aud_datetime between it.st and it.et and A.Aud_task = T.taskid group by it.st,it.et, T.taskname order by it.st, T.taskname
Link audits, mxi_link_audit
Audits related to evaluation and processing of reference values (role/privilege assignments, manager and other references) have information stored in mxi_link_audit (also see mxi_link_audit_operations). This has a lot of columns and I suggest you look at the views and see what is there. Some of the key columns are:
|mcLinkid/linkId||Reference to mxi_link.mcUniqueId|
|mcauditid/auditid||Reference to mxp_audit.auditid|
|mcDate/date||Date of entry|
|mcOperation/operation||Reference to mxi_link_audit_operations|
|mcMSKEYUser||Mskey of user|
|mcMSKEYAssignment||Mskey of the assigned entry (privilege, role, manager etc.)|
Example data from idmv_linkaudit_ext2 view for an audit in which a role was added to a person, which caused to two inherited privileges to be assigned. Later the role was removed.
SQL Server and Oracle:
select linkid,auditid,auditDate,userMSKEYVALUE,AssignmentMSKEYVALUE,OperationText,AdditionalInfo from idmv_linkaudit_ext2 where userMskey = 23 order by auditdate
Note that a new audit be created only when there’s an event task execution. The privilege in my example only had a del-member event and this event got a new audit (944072), the rest shared the add-audit of the role they were inherited from.
Useful variation with tasknames (SQL Server and Oracle):
select LA.userMSKEYVALUE, LA.auditDate, LA.AssignmentMSKEYVALUE, LA.operationtext, LA.auditid, A.taskid, T.taskname from idmv_linkaudit_ext LA left outer join mxp_audit A on A.AuditID = LA.auditid left outer join mxp_tasks T on T.taskid = A.TaskId where LA.userMSKEYVALUE = 'USER.TEST.3' order by LA.auditDate
There are additional logs in jobs and actions that are stored in base64 blobs in the database. From SP8 we’ve added a new log, the execution log, which now stores messages from the runtime logged with uInfo/uWarning/uError.
Job and action logs, mc_logs
This contains the logs of all jobs and actions as well as other useful values. Some columns I find useful are:
|JobId||Id of the job. An action is linked to a jobconfiguration on mxp_tasks.jobguid = mc_jobs.guid|
|TimeUsed||The number of seconds the action/job used to complete for this logentry|
|TotalEntries||The total number of entries processed in this logentry|
|Num_Adds||Number of add operations performed|
|Num_Mods||Number of modify operations performed|
|Num_Del||Number of delete operations performed|
|Num_Warnings||Number of warnings reported|
|Num_Errors||Number of errors reported|
Views: mcv_logall, mcv_logwarn, mcv_logerr
One of the things this can be used for is to calculate how many entries per second an action/job processes.
select jobname,JobId,sum(TotalEntries) totalEntries,sum(TimeUsed) totalTime, round(cast(sum(TotalEntries) as float)/cast(sum(TimeUsed) as float),2) entriesPerSecond from mcv_logall group by jobname,jobid order by round(cast(sum(TotalEntries) as float)/cast(sum(TimeUsed) as float),2) asc
select jobname,JobId,sum(TotalEntries) totalEntries,sum(TimeUsed) totalTime ,Round(sum(TotalEntries) /sum(TimeUsed),2) entriesPerSecond from mcv_logall group by jobname,jobid order by entriesPerSecond
This can give valuable hints about actions or jobs that are slow and will cause problems at some point in time. In this case my “Test something true: False” task is slow and needs a look. You can also reverse this by calculating totalTime/totalEntries to get time used per entry. This can be used in combination with the threshold log when running mass-update/performance tests in dev/qa cycles to detect potential issues before they cause downtime in production.
This is a new log that has been in hiding for a while as it needed a UI. It still doesn’t have one outside the MMC, but it is very useful. This log contains all the messages from the runtimes that would usually be locked inside the big blob of mc_logs or the dse.log file on the file system. So in short, this means that messages like this one:
Are now also logged individually and linkable to the user and auditids. My root audit was 1316231 so this query will find all related audits and list runtime messages reported during the processing of these audits:
select mcAuditId,mcTaskId,mcTaskName,mcMskey,mcMsg,mcLogLevel,mcLogTime From mcv_executionlog_list where mcAuditId in (select auditid from mxp_audit where AuditRoot = 1316231) order by mcUniqueId
This output would usually would be “hidden” inside the logfiles associated with each individual job:
There is a lot more to the execution-log though, so have a look at it when you get your hands on a version supporting it.
Pulling it all together
To summarize there is
- One audit per task executed on an entry (mxp_Audit)
- One extended audit entry per sub-task (mxp_ext_audit)
- 0 to many execution log entries per action (mcv_executionlog_list)
- One extended audit entry per sub-task (mxp_ext_audit)
And combining all this information can be done using joins and unions:
select AT.NAme,T.taskname taskname, EA.aud_ref auditid, ea.aud_datetime logtime, '' loglevel, ea.Aud_Info info, ea.Aud_StartedBy startedby from mxp_tasks T, mxp_actiontype AT, MXP_Ext_Audit EA where T.taskid=EA.Aud_task and T.actiontype = AT.actType and EA.Aud_ref in (select auditid from mxp_audit where AuditRoot = 1316231) union select 'Action' as type, mcTaskName taskname,mcAuditId auditid,mcLogTime logtime, case When mcLogLevel = 0 then 'Info' when mcLogLevel = 1 then 'Warning' when mcLogLevel = 2 then 'Error' else cast (mcLogLevel as varchar) end loglevel,mcMsg info ,'' startedby From mcv_executionlog_list where mcAuditId in (select auditid from mxp_audit where AuditRoot = 1316231) order by logtime
Providing a complete view like this:
And that I believe is all the detail one could look for on the processing of a specific task, “dispatcher test #1.0.0” through two child tasks and back, for a user all in a single view. I’m sure there’ll be an admin UI for this later, but for now I expect this to be most useful in the development and QA cycle.