IDM SQL Basics #2: Locating problem queries
Entries marked TBD (To Be Done) will be expanded later.
Update 25.07.2014: As Matt outlines the situation in The Future of SAP IDM, IdM is moving to Sofia, and our Labs team is moving on to other ventures. It is very unlikely that I will update these blog entries any further. If someone else here in SDN wants to take over and/or use parts of this in improved blog entries you are most welcome to do so. More might be available in raw form at Queries and useful(?) stuff, infodump
This is part 2 of the “IdM SQL Basics” blog posts.
#1: Overview of IDStore views, tables, queries & basic examples and some examples of performance impacts => Here
#2: How to locate problem queries in your implementation => You’re here!
#3: Testing and improving queries SQL Server, Oracle, DB2 => Here
For the purpose of this document a problem query is one that takes a long time to complete, not one that doesn't work. And depending on where the query is used, a long time can be measured in milliseconds, seconds, or minutes. There are a couple of built-in tools to help you locate problems and problem queries that take an excessive amount of time to finish and cause bottlenecks, as well as queries and tools that we (or I) use during development and support sessions. I'll also try to address which problem situations each method can be used to troubleshoot.
The tools I’ll try to cover are:
- Built-in execution threshold logging
- Database queries
- Database tools (Activity Monitor, SQL Profiler, Enterprise Manager, Solution Manager)
- JMX Developer Trace
1. Finding problems using execution threshold logging
I’m starting with the most underused gem, the Execution Threshold Log. This apparently little-known feature is built into the IdM product and logs executions of queries or procedures that exceed a specified amount of time. The threshold is set using the global constant MX_LOG_EXEC_THRESHOLD or in the admin UI. This constant should already exist on your 7.2 system with a fairly large number assigned to it, but if it's not there you can create it manually. An important thing to keep in mind is that the number entered here is milliseconds, so entering 1 will cause large amounts of useless logs to be generated.
Events logged by this function are written to the MC_EXEC_STAT table, but you should use the IDMV_EXEC_STAT view to look at the content as it gives some columns text values that improve readability. You can also see the content of this in the admin UI. The function will log most operations done by the DSE Runtime, dispatcher and UI, as well as most of the database procedures they use. This means that slow uSelect operations, access controls, dispatcher job allocation and a host of other problems can be found by this function.
This view will contain a row of information for each event that exceeded the set limit. It contains a lot of useful information and columns that you can look at. What I usually do is run:
select datetime, componentname, mcExecTime, jobname, taskname, mskeyvalue, mcStatement
from idmv_exec_stat
order by mcExecTime desc
There are more columns in the view so please check it out. Here’s a brief overview of the columns listed in my example:
- MCEXECTIME – how many milliseconds the operation took to perform
- COMPONENTNAME – where the threshold was exceeded (procedure, dispatcher, runtime …)
- JOBNAME/TASKNAME – If available they contain the name of the job or task
- MSKEYVALUE – the user that the task was running for if available (not in jobs for instance)
- MCSTATEMENT – Content depends on what type of event is logged. Can be a procedure name with parameters, a query from conditional or switch tasks, or something quite different
In any case, my simple example shows that I have a switch task statement that takes 6 to 12.5 seconds to execute. This task processes fewer than 5 entries a minute in the worst case and would be a bottleneck in any scenario. You can also see some additional messages about procedures taking time, but since I know that mcProcUserReconcile is a housekeeping task and that it processed 1000 entries in the reported time, I won't spend time on those. Unfortunately there are no built-in "this is normal" tags, but you can ask here or through support if something looks out of whack with any product procedures. Once you know the name of the task, job or attribute that's causing you problems it should be fairly simple to locate it and try to fix the problem (see the other parts of this blog series).
The Admin UI allows configuration of the threshold and contains a logviewer that shows the table contents in the “Statement Execution” tab:
In summary, the execution threshold log can and should be used during development and in mass-update/load tests in the QA environments to find problem areas early. Just keep in mind that this table is not emptied automatically (yet), so if you start using it please maintain it too. When the test is done, increase the threshold value to avoid unnecessary logging while in production mode.
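As an illustration of such maintenance, a cleanup could be as simple as deleting rows older than a cutoff date. This is only a sketch: the date column name may differ between releases, so verify it against your MC_EXEC_STAT table before running anything like this.

```sql
-- Hypothetical housekeeping sketch: remove threshold-log rows older than 14 days.
-- Check the actual date column name in MC_EXEC_STAT on your system first.
DELETE FROM MC_EXEC_STAT WHERE DateTime < DATEADD(day, -14, GETDATE()); -- SQL Server
-- On Oracle the equivalent would be something like:
-- DELETE FROM MC_EXEC_STAT WHERE DateTime < SYSDATE - 14;
```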
2. Finding problems using database queries
This section covers items that can also be found using the database tools covered in the next section. If you’ve had an IdM support ticket open regarding performance and ended up talking with the team in Norway, you might already know that we often dive into SQL Developer/SQL Server Management Studio and start running queries to get to the issue quickly. The queries we run depend highly on the problem we’re trying to solve, but I’ll try to give some use cases for them.
The biggest downside to these is that they usually require more access than the OPER account has, so you’ll have to find the system/sa password or ask a DBA to run them for you. In a development or QA environment I believe you should have access to this, as well as DB tools to analyze performance during and after performance tests, anyway.
These queries are useful when you have a hang-situation and suspect queries are blocked at the database level.
To list queries that are currently being executed on SQL Server you can use the following query:
SELECT database_id, st.text, r.session_id, r.status, r.command, r.cpu_time, r.total_elapsed_time
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS st
ORDER BY database_id
The SQL Server Activity Monitor will show the last completed query which will not help if you’re wondering why nothing is happening at the moment.
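If you suspect blocking on SQL Server you can also ask sys.dm_exec_requests directly, since it carries a blocking_session_id column. A small sketch along the lines of the query above:

```sql
-- List requests that are currently blocked, together with the session blocking them
SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_time, st.text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.blocking_session_id <> 0
```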
To get the name for a database_id or ID of a database name in SQL Server use:
SELECT DB_ID ('MXMC_db') -- This is case sensitive if the DB is installed with CP850_BIN2 collation
SELECT DB_NAME(10)       -- 10 being the database ID you want the name of...
To produce a listing of historical queries that have taken a long time to execute on average you can run the following (note that the time columns in sys.dm_exec_query_stats are reported in microseconds, so adjust the threshold to suit):
SELECT last_execution_time, total_physical_reads, total_logical_reads, total_logical_writes,
       execution_count, total_worker_time, total_elapsed_time,
       total_elapsed_time / execution_count AS avg_elapsed_time,
       SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
                 ((CASE statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
                   ELSE qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE total_elapsed_time / execution_count > 1000
-- and execution_count > 10
ORDER BY total_elapsed_time / execution_count DESC;
The substring function picks out the “problem” code of procedures, and by uncommenting the line “and execution_count > 10” you can also filter out statements run very seldom. Likewise you can find often-executed queries that are problematic by setting execution_count very high while lowering the total_elapsed_time/execution_count requirement. Also see Lambert's version of this statement in the comments section.
This can be very helpful to run after a performance test has completed. Before the test you can clear the statistics by running DBCC FREEPROCCACHE if you want to clear out potentially stale data, but make sure to have a warm-up period to repopulate the caches in the test.
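A pre-test reset could look like the sketch below. DBCC DROPCLEANBUFFERS is my addition here, not something the text above calls for: it additionally empties the data page cache, which makes the warm-up even more important. Both commands require elevated privileges and should never be run on a production system.

```sql
-- Reset cached plans (and their statistics) before a performance test
DBCC FREEPROCCACHE;
-- Optionally also drop cached data pages so the test starts from cold I/O
DBCC DROPCLEANBUFFERS;
```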
To list current active/inactive sessions for a schema (in my case the prefix is MVIEWS, so the screenshot does not match the text 100%), run:
SELECT sess.process, sess.status, sess.username, sess.schemaname, sess.wait_time,
       sess.sql_exec_start, sess.blocking_session, sql.sql_text
FROM v$session sess, v$sql sql
WHERE sql.sql_id(+) = sess.sql_id
  AND sess.type = 'USER'
  AND schemaname LIKE 'MXMC%'
ORDER BY status
This will give a list of sessions for the schema and show what query they’re currently struggling with if any. This shows my problematic switch task in action:
If you only want to list active sessions and get the full SQL Text and SQL Ids you can try this variation of the query against the same two views as before:
select sess.username, sess.sid, sqlt.sql_id, sqlt.sql_text
from v$sqltext_with_newlines sqlt, v$session sess
where sqlt.address = sess.sql_address
  and sqlt.hash_value = sess.sql_hash_value
  and sess.status = 'ACTIVE'
  and sess.username like 'MXMC%'
order by sess.sid, sqlt.piece
To find queries that on average are slow running you can use a number of different queries. I’ve settled on variations of this:
SELECT cast(ROUND(A.ELAPSED_TIME / A.EXECUTIONS / 1000000) as number(10)) SQL_AVG,
       cast(A.SQL_ID as varchar(20)) SQL_ID,
       cast(A.SQL_FULLTEXT as varchar(300)) SQL_TEXT,
       cast(A.EXECUTIONS as number(10)) EXECCOUNT,
       cast(NVL(S.PROGRAM, A.MODULE) as varchar(20)) PNAME,
       cast(NVL(S.USERNAME, A.PARSING_SCHEMA_NAME) as varchar(20)) USERID
FROM V$SQLAREA A, V$SESSION S
WHERE A.SQL_ID = S.SQL_ID(+)
  AND cast(NVL(S.USERNAME, A.PARSING_SCHEMA_NAME) as varchar(20)) like 'MXMC%' -- UserId / schema name, adapt if required
  AND A.EXECUTIONS > 0 -- Can also be useful to set this higher to find queries executed often and running slow
  AND ROUND(A.ELAPSED_TIME / A.EXECUTIONS / 1000000) > 10 -- Execution time > 10 seconds, increase/decrease as needed
This query lists sessions that are blocking other sessions
SELECT t.status, t.start_time, sess.username, sess.sid, sess.machine, sess.process,
       sess.blocking_session, sess.status, sess.schemaname, sess.wait_time,
       sess.sql_exec_start, sql.sql_text
FROM v$session sess, v$sql sql, v$transaction t
WHERE sql.sql_id(+) = sess.sql_id
  AND t.ses_addr = sess.saddr
  AND sess.sid IN (SELECT sess.blocking_session
                   FROM v$session sess
                   WHERE sess.blocking_session IS NOT NULL
                     AND sess.type = 'USER'
                     AND schemaname LIKE 'MXMC%')
There are a lot of columns in V$SQLAREA and V$SESSION for the curious. The V$SQLAREA query above tries to list those queries that on average have used more than 10 seconds per execution.
With the SQL ID you can do fun stuff like getting the execution plan like this (if you have DBMS_SQLTUNE available):
select DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         type         => 'HTML',
         report_level => 'ALL',
         sql_id       => '1zf0h9sw688bx') as report
from dual;
If you put the result of the query in a file with an .html extension you have a nicely formatted execution plan:
Exactly what to read from this is something we will get back to in part 3.
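As a side note, DBMS_SQLTUNE requires the Oracle Tuning Pack license; if you don't have it, DBMS_XPLAN can give a plain-text plan for the same SQL ID. The SQL ID below is just the one from my example; the 'ALLSTATS LAST' format shows actual row counts only when runtime statistics were gathered, otherwise fall back to 'TYPICAL'.

```sql
-- Plain-text execution plan for a cursor still in the shared pool
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('1zf0h9sw688bx', NULL, 'ALLSTATS LAST'));
```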
TBD See introduction section
3. Database tools
Both SQL Server and Oracle provide monitoring tools that can help troubleshoot a hang situation. In your development and QA environment it can be very beneficial to have access to these if you need to fine-tune or troubleshoot your implementation. They usually provide reports of the queries that run slowest, run most frequently or have the highest costs. This is by no means a full guide to using these tools, but rather a quick introduction.
One common issue that I’ve experienced with all these tools is that they do not always show the queries currently being executed, but rather the last completed query. This is often not very helpful, and is one of the reasons the queries listed in section 2 are preferable at times.
SQL Server Reports
SQL Server has built-in reports that extract much of the information found by the queries mentioned above. If you right-click on the database instance and select Reports you’ll find a good list to choose from. The “Object Execution Statistics” report, for example, will give you a huge list of the executable objects the server has processed for the instance. It can also be exported to Excel.
SQL Server Activity Monitor
The SQL Server Activity Monitor went through some drastic changes after the 2005 version, and I don't find it that helpful anymore. It will still list running sessions and allow you to see sessions/queries that are blocked or blocking each other.
I’ve constructed a scenario where the dispatchers are blocked by another transaction. When using the activity monitor you can sort using the Blocked By and Head Blocker to see what is currently (not) happening. First sorted by Blocked By:
And the Head Blocker:
At this time you can already start to look at the data available to get some understanding of the situation. The blocking session (75) is using the Global\I045111 login. This is not a login used by IdM but a domain login. If it was <prefix>_admin we’d know it's the Management Console; <prefix>_rt means it’s either the runtime engines or the dispatcher, and <prefix>_prov means it’s the UI.
It's also possible that you encounter nested sessions, where session 75 is blocked by another session, which itself is blocked by another, and so on. Sorting on the Head Blocker column will usually take you directly to the one causing the blockage.
You can right-click on the sessions and see what the last SQL statement was, or you can use the query listed earlier. This is the result when looking at session 75:
Replicating realistic scenarios is difficult, so I just made a transaction without bothering to commit it, which happens very often if you use a DB client that does not autocommit transactions for you (SQL Developer and others on Oracle).
While you’re in the Activity Monitor you should also note the Recent Expensive Queries tab which can be very useful during testing:
SQL Server Profiler
This application allows you to do full traces of the procedures and calls to the SQL Server. This can produce huge amounts of data and it will affect performance. It also requires a lot of privileges on the server (look it up if you need to, I just use SA).
TBD See introduction section
SQL Server Deadlock Trace
This is a very useful option if you’re struggling with deadlock errors. It does not affect performance, and when a deadlock occurs the server will log involved processes and the execution stack which makes it somewhat easier (there’s seldom anything easy in regards to deadlocks) to get to the source of the problem.
As SA run:
DBCC TRACEON (1222, -1)
This will write deadlock information to the SQL Server error log in your log directory (e.g. C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\LOG) when deadlocks occur.
To turn it off again, run
DBCC TRACEOFF (1222, -1)
It will also be disabled again when the SQL Server is restarted. The result is something like this example shows, where I’ve tried to illustrate how the two processes each own a lock that's blocking the other process, with a third involved just to confuse the picture. The clue lies in the “Waiter” and “Owner” ids:
More information can be found here: Deadlock trace (MS)
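If you're unsure whether the trace flag is still active (for instance after a server restart), you can check it with:

```sql
-- Shows the status of trace flag 1222 for all sessions (-1 = global)
DBCC TRACESTATUS (1222, -1)
```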
Oracle Enterprise Manager
TBD See introduction section
4. JMX Developer trace
The JMX Developer trace can be very helpful when troubleshooting performance problems from the WebUI/Rest interface. It logs the queries that are executed and their execution times.
TBD See introduction section
Disclaimers, excuses & comments
This was part two in a series of posts focused on database queries, troubleshooting and curiosities related to the SAP Identity Management Identity Stores and their contents. It focuses on tables and views as they are in the 7.2 release, but some tips apply to 7.1 implementations as well. Do note that this is not an official guide and that official docs such as help files, tickets, notes etc. are the no. 1 source. I’m writing this based on experiences from support calls, implementations and from working in the IdM development team. Some of this is also available in the training for IdM 7.2 and other places. I’m not a DBA, so if I’m doing something very backwards or completely wrong here, please let me know so I can improve the document.
Feel free to correct me, ask for additional examples and clarifications as I hope to keep this series updated with new information as it appears.