IDM SQL Basics #1: Queries against the Identity St...

Former Member · ‎06-12-2013

This is part one in a series of posts that are focused on database queries, troubleshooting and curiosities related to the SAP Identity Management Identity Stores and its contents. It will be focused on tables and views as they are in the 7.2 release, but some tips apply to 7.1 implementations as well. Most examples are using SQL Server but I'll try to throw in some Oracle in here as well. Do note that this is not an official guide and that official docs such as helpfiles, tickets, notes etc. are the no.1 source. I'm writing this based on experiences from support calls, implementations and from working in the IdM development team on the database schema.

Feel free to correct me, ask for additional examples and clarifications as I hope to keep this series updated with new information as it appears.

Planned entries:

Part #1: Overview of IDStore views, tables, queries & basic examples and some examples of performance impacts => this is it

Part #2: How to locate problem queries in your implementation => Here

Part #3: Testing and improving queries => Here

Part one starts with the very basic - Why Should You Care? Then focuses on some useful views and some queries that shows data from them, and some examples on how to use them.

Queries, if they work already why should you care?

There are queries everywhere in a typical IdM implementation, so what if a few of them are slow? Why should you spend additional time on custom scripts using a uSelect call? Typically once a query is written its ignored until something takes a long time to list in the UI, or times out, or it turns out that a go-live process will not finish in the allotted weekend because an initial load job isn't finishing in time, or the entries are not processed or provisioned in time.

If you have a conditional task, switch task or scripted uSelect call that takes 200ms to execute it will in best case be able to process 5 entries per second and effectively block whatever workflow events sit behind it. It gets even worse if this needs to be executed for many repositories. So when you're facing go-live or adding another application to the solution that brings in another 20.000 roles and hundreds of thousands of assignments this could quickly becomes a bottleneck you never saw during development and might not see during daily usage of the solution.

Views you should familiarize yourself with

Knowing where the data is available is crucial, and knowing a little of the data structure is good too. The frequently used views have variations around a base name (in bold) which indicate what kind of data the view has, and an extension to indicate if they contain active, inactive or active & inactive values and if they are basic/simple views that contain only references for link values or extended views that link in the referenced value mskeyvalues or displaynames. Extended information about the views are available in the help file and the training documentation.

idmv_entry_simple/idmv_entry_simple_all/idmv_entry_simple_inactive
- Contains one row per entry, useful when needing only MSKEY,MSKEYVALUE, DISPLAYNAME, ENTRY TYPE or ENTRY STATE or similar
idmv_value_basic
- Contains one row per attribute value per entry, only non-reference attribute values
idmv_vallink_basic/idmv_vallink_ext and other idmv_vallink_<variations>
- Similar to idmv_value views it contains one row per attribute value per entry + reference values (MXREF_MX_PRIVILEGE/ROLE, MX_MANAGER etc)
idmv_link_basic_active/idmv_link_ext/idmv_link_simple_active and other idmv_link_<variations>
- Contains only reference information, such as person to role/privilege/manager assignments.

These views do various amount of joining of data from the tables of the system, the most important ones being mxi_values, mxi_attributes, mxi_entry and mxi_link. There's rarely any reason to access these tables directly and they are usually not accessible for the runtime accounts anyway. If you're not interested in the underlying tables you can jump down to "A few basic IdM SQL Query guidelines" from here.

MXI_VALUES

contains the non-reference values (used by value & vallink views) and some of the more interesting columns are displayed in this picture:

From support calls show that the two following facts are either not known or quite frequently ignored:

AVALUE
- This is the value of the attribute as entered, case & all intact, and is used when displaying the values
- It has a maximum length of 2000 characters. Values that are larger than 2000 characters (such as pictures) are stored in the ALONG column
- This column is usually named AVALUE or MCVALUE in views
- THE AVALUE COLUMN IS NOT INDEXED
SEARCHVALUE
- This is an uppercased copy of the first 400 characters of the contents in AVALUE
- This column usually named SEARCHVALUE or MCSEARCHVALUE in views
- THE SEARCHVALUE COLUMN IS INDEXED

MXI_ENTRY

Contains a single row per entry and is available through the views starting with idmv_entry_, and it holds some key information such as entrytype, entrystate, name, displayname, idstore, changenumber for easy and quick access. There are quite a few other columns as well but these are most commonly used.

In the 7.1 schema you had to do multiple joins to get this basic information about an entry, in 7.2 this table is a sort of meta-table for entries which is really useful to be aware of.

MXI_ATTRIBUTES

This table contain the attribute definitions, and all views that shows an attribute name has a join with this table. This table usually doesn't cause any problems and you rarely have any need to access it. Some of the most used/interesting colums:

MXI_LINK

The MXI_LINK table is another new table in IdM 7.2. This contains all links in the system. A link is any reference between entries such as manager, role/privilege assignments, role/privilege hierarchy assignments etc. Some key columns are shown here:

It also has support tables & views that are interesting. MXI_LINK_AUDIT for instance contains the full history of any link, when it was initiated, its complete approval history etc. This information is available through the idmv_linkaudit_basic and idmv_linkaudit_ext views. As in IdM 7.1, references also show as attributes set on the user such as MX_MANAGER, MXREF_MX_PRIVILEGE and MXREF_MX_ROLE but this is done by views that join the MXI_VALUES and MXI_LINK tables. The MXI_VALUES table itself no longer contain these references.

A few basic IdM SQL Query guidelines

You should keep the following in mind when writing custom queries:

mcSEARCHVALUE and SEARCHVALUE is the same column. mcVALUE and AVALUE is the same column
The column named mcSEARCHVALUE is named as such because it is the column which you should search in
Don't use MCVALUE (or AVALUE) = ' <something>', or worse MCVALUE (or AVALUE)like '%<something>%' or MCVALUE (or AVALUE) in (...) as any part of your queries
See 1,2 & 3 a few more times
Try to use the simplest most basic view available that gives you the information you need, use idmv_entry views when possible
When using attributes with unique constraint there's no need for DISTINCT

Examples

Here are some common queries against the different views. You should try to use the simplest view possible when creating queries that are used in conditionals/switches or scripts.

idmv_entry_simple, listing all entries of a specific entrytype

select mcMSKEY,mcMskeyValue, mcDisplayName from idmv_entry_simple where mcEntryType='MX_PERSON'

idmv_vallink_basic, list all attribute values for a specific user with its mcmskey selected from idmv_entry_simple

select mcMSKEY,mcAttrName,mcValue from idmv_vallink_basic where mcMSKEY = (select mcmskey from idmv_entry_simple where mcMSKEYVALUE = 'USER.BLOG.5')

idmv_vallink_basic, list all attribute values for a group of users where mcmskeys are selected from idmv_entry_simple

select mcMSKEY,mcAttrName,mcValue from idmv_vallink_basic where mcMSKEY in (select mcmskey from idmv_entry_simple where mcMSKEYVALUE like 'USER.BLOG%')

idmv_vallink_basic, list all attribute values for a group of users where mcmskeys are selected from idmv_vallink_basic

select mcMSKEY,mcAttrName,mcValue from idmv_vallink_basic where mcMSKEY in (select mcmskey from idmv_vallink_basic where mcAttrName = 'MX_LASTNAME' and mcSearchValue like 'BLOG%')

Example performance impact from bad queries

With the data structure of IdM its possible to write queries in many ways and still produce the same/valid results. Part 3 will go into how I did these measurements. A common scenario we see in support is a query that lists all attribute & values for a subset of entries shown as #1, where #2 lists a less expensive version, the bold text highlights the difference:

select mcMSKEY,mcAttrName,mcValue from idmv_vallink_basic where mcMSKEY in (select mcmskey from idmv_vallink_basic where mcAttrName = 'MSKEYVALUE' and mcSearchValue like 'USER.BLOG%')

Uses the idmv_vallink_basic view to find entries attribute named MSKEYVALUE and value like USER_BLOG%
idmv_vallink_basic uses the MXI_VALUES table, has hundreds of thousands of rows in my test system

select mcMSKEY,mcAttrName,mcValue from idmv_vallink_basic where mcMSKEY in (select mcmskey from idmv_entry_simple where mcMSKEYVALUE like 'USER.BLOG%')

Uses idmv_entry_simple to look for entries with mcMSKEYVALUE like USER_BLOG%
Idmv_entry_simple has one row per entry, total 10.000 rows in my test system

Both queries return the exact same results, but the load on the database server is very different. Even in a small environment with only 10.000 entries entries the load on the server for query #2 is ~2% that of query #1, and that's just by switching to a view representing the MXI_ENTRY table which is simply much less data and indexes to look through.

Really bad example

Just to finish that off, here's a quick comparison of happens when you break point 1 to 3 of the basic IdM SQL query guidelines where query 2 is the bad version of query 1:

select mcMSKEY,mcAttrName,mcValue from idmv_vallink_basic where mcMSKEY in (select mcmskey from idmv_entry_simple where mcMSKEYVALUE like 'USER.BLOG%')

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'mxi_entry'. Scan count 2, logical reads 4, physical reads 0.
SQL Server Execution Times: CPU time = 15 ms, elapsed time = 109 ms.

select mcMSKEY,mcAttrName,mcValue from idmv_vallink_basic where mcMSKEY in (select mcmskey from idmv_vallink_basic where mcAttrName = 'MSKEYVALUE' and mcValue like 'USER.BLOG%')
-- VIOLATION OF THE BASIC IdM SQL GUIDELINES #1,#2 & #3, DO NOT DO THIS PLEASE

Table 'Worktable'. Scan count 0, logical reads 0, physical reads
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0Table 'mxi_link'. Scan count 2, logical reads 6, physical reads 0
Table 'MXI_Attributes'. Scan count 1, logical reads 2, physical reads 0
Table 'MXI_VALUES'. Scan count 1, logical reads 1170, physical reads 0
SQL Server Execution Times: CPU time = 16 ms, elapsed time = 299 ms.

As mentioned I will go into more detail about how to perform such comparisons and what these values are in part 3 of this blog. For now just assume that large values are bad, and you want as much as possible to be Scan, and as little as possible to be Logical reads.

The 299 millisecond elapsed time for statement 2 is not bad when run all by itself, but it is 2.7 times slower than the first query. We can also see that it needs much more logical read operations to complete and needs to access two tables. By comparing execution plans, SQL Server estimates query 2 be about 100 times more expensive to execute than query 1, and we can assume it would become a much bigger strain on the system under load than the alternative, especially with an increased number of entries.

Another advantages of using idmv_entry_simple vs. idmv_vallink or idmv_value is that one entry is guaranteed not to have more than one row, so there's no need to use distinct on the resultset.

Configuration examples and other useful hints

There are many common places where you write your own queries perhaps without considering the importance of them such as; source-tabs of jobs, conditional statements, switches and scripts. I'll try to provide some examples here.

Job Source Definition, Use Identity Store Option

This first example is a job that adds an additional approver to all privileges from a specific repository BQQ:

Destination:

In this example the "Use Identity Store" option is checked on the source tab. This means that the query only needs to return a list of MSKEY's and the system will automatically retrieve the attribute values that are used in the destination pass with the %<attributename>% syntax from the list of entries in the source pass. So for each mskey in the source, it will retrieve the MSKEYVALUE attribute in this example. Keep this in mind and leave the source statement as simple as possible.

You could also do this without the Use Identity Store option by using

select mcmskeyvalue as mskeyvalue from idmv_entry_simple where mcEntryType = 'MX_PRIVILEGE' and mcMSKEYVALUE like 'PRIV:ROLE:BQQ:%'

in the source statement if the only dynamic value you need to use in the destination tab is the %MSKEYVALUE% attribute. You can offcourse return all values yourself if you want to, but please dont check the "use identity store" option in addition.

Conditional task

Conditional tasks expect 0 or 1 to be returned by the sql statement where 0=false and 1=true. Count combined with Sign is useful here as it returns 1 for any positive value and 0 for... 0, and -1 for any negative value.

Good:

select sign(count(mcMSKEY)) from idmv_entry_simple where mcmskey = %MSKEY% and mcEntryType='MX_PERSON'

Not that good:

select sign(count(MSKEY)) from idmv_value_basic where mskey = %MSKEY% and AttrName = 'MX_ENTRYTYPE' and SearchValue = 'MX_PERSON'

Again, using idmv_entry in query 1 is 1/3 the cost of using the idmv_value view in query 2 in my simple environment. Sign is not really required here though as there can only be one entry for one mskey in MXI_ENTRY and count would always return 0 or 1 unless something was really broken.

Switch task

Switch tasks expects a single value to be returned from the query it executes. Example switch on entry type using idmv_entry_simple view.

SQL SERVER - WITH (NOLOCK)

This topic really deserves its own post so I'll keep it short:

If you look through the framework and procedures you will see that there's an extended use of WITH (NOLOCK) when the solution is running on SQL Server. For a complete description and understanding of table hint I recommend your favourite search-engine, but the highlight is that the query you write will not issue shared locks for the rows its accessing and this helps to avoid deadlock situations and increases performance. The penalty is that you risk reading data that are rolled back, changed or deleted by other transactions. Most of what you write can use this without risk, but it doesn't hurt to think about what could happen if you get bad data out of your query.

If you're configuring a solution that needs to be Oracle/DB2 compatible you can use %NOLOCK% in certain locations (switch tasks & conditional):

SELECT mcvalue FROM idmv_vallink_basic %NOLOCK% WHERE mcattrname='MX_LOCATION' AND mcidstore = 1 AND mcmskey==%MSKEY%

The system will then replace %NOLOCK% with an empty string if it detects that its running on Oracle/DB2, and with WITH(NOLOCK) if its running on SQL Server.

Most of the scripts in the framework use the databasetype system constant to determine if the hint is used or not similar to this:

var nolock = "";
if("%$ddm.databasetype%" == 1) { //MS-SQL
nolock = "WITH (NOLOCK)";
}
var sql = "SELECT mcMSKEYVALUE from idmv_entry_simple "+nolock+" where searchvalue like '"+filter+"'";

That's it for part 1. Comments & corrections (even on spelling) is appreciated!

(2013-09-05 - corrected a few sqls, improved(?) formatting and highlighting)