Policy Management with SAP Data Intelligence
Although the security setup and policy implementation may feel like a bookkeeper’s or caretaker’s job, it is necessary, even vital, for any application. If you recall the role the caretaker played at your school, you have to admit that she was the most important person in the organisation, even more important than the head of school. I have learnt that analysing the policy setup can be a lot of fun, and defining strategies needs some thinking and creativity. So let us delve into the fundamentals of SAP Data Intelligence Policy Management and user security first, before moving on to analysing the setup. By the end you will know how to implement user security and have some ideas for in-depth analysis. Finally, I will give you examples of how to set up a reasonable policy structure.
User security is based on the assigned policies. A user can have many policies, as you can see in the screenshot. There are no user roles or groups for setting up security; everything is handled via policies, which give you all the flexibility you might need.
The most fundamental items in the Policy Management are the resources as they are named in SAP Data Intelligence.
A resource consists of a resource type and its attributes, which depend on the resource type, e.g.
- application with activity and name of application
- connection with activity and identifier of connection
Currently we have 19 resource types, and this number will most probably grow with each release.
These resources are grouped under a policy ID, or in other words, a policy is defined by a number of resources. Some policies are very basic and contain only a single resource, e.g.
- Policy ID: sap.dh.modelerUI.start
and some policies are more complex like Policy ID: sap.dh.member with 26 resources.
For building a more elaborate policy structure, policies can inherit resources from other policies. There are even policies that have only inherited resources, like ‘sap.dh.admin’.
You have to tag a policy as “exposed” to assign it directly to a user. Non-exposed policies need to be nested in another policy.
It is that simple. Now you have to learn what kind of resources you have at hand to impose your ideas of user security on SAP Data Intelligence, then define your own in addition to the pre-defined ones, and finally put these into a kind of policy hierarchy that helps you keep everything under control.
Admittedly, the resulting policy framework can become rather complex, and the list structure of the System Management UI does not help much when searching for loopholes or redundancies. There are considerations about what kind of standard tools should be added, but this is still in its beginnings because security has recently gained a lot of additional features.
Fortunately, SAP Data Intelligence is quite open with regard to APIs and provides tools for exporting system information. In our case of system management, vctl (the System Management Command-Line Client) is our silver bullet.
Command-line Access to Policy Data
After its humble beginnings, vctl is now a terrific tool that helps system administrators automate their daily tasks by integrating them into shell scripts. It is definitely worthwhile to have a look at it after each new release. For a short introduction I recommend reading Community blogs like the one by Gianluca de Lorenzo: Zen and the Art of SAP Data Intelligence. Episode 3: vctl, the hidden pearl you must know.
For reading the policy details we need, in addition to the login call, two vctl commands:
- Get list of policies:
vctl policy list-policies --format json
- Get details for each policy:
vctl policy get <policy Id> --format json
I have wrapped these into a Python script that gives me all the policies with their resources and a list of the policies from which additional resources are inherited. Unfortunately, you do not get a ‘flat’ list of all resources of each policy including the inherited ones; these you have to compute in a second step.
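That second step can be sketched in a few lines of Python. The snippet below is a minimal sketch, assuming the policy details have already been fetched with `vctl policy get <policy id> --format json`; the field name `inheritedPolicies` and the sample policy data are illustrative assumptions, not the exact JSON schema that vctl returns.

```python
import json
import subprocess

def get_policy(policy_id):
    # Fetch one policy definition via vctl (requires a prior vctl login).
    out = subprocess.run(
        ["vctl", "policy", "get", policy_id, "--format", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

def flatten_resources(policy_id, policies, seen=None):
    # Collect the policy's own resources plus all inherited ones recursively.
    seen = seen if seen is not None else set()
    if policy_id in seen:  # guard against circular references
        return []
    seen.add(policy_id)
    policy = policies[policy_id]
    resources = list(policy.get("resources", []))
    for parent in policy.get("inheritedPolicies", []):  # field name is an assumption
        resources.extend(flatten_resources(parent, policies, seen))
    return resources

# Hand-crafted sample data mimicking the structure described above.
policies = {
    "mycompany.basic.member": {
        "resources": [("application", "start", "launchpad")],
        "inheritedPolicies": []},
    "mycompany.basic.metadata": {
        "resources": [("application", "read", "metadata")],
        "inheritedPolicies": ["mycompany.basic.member"]},
}
print(flatten_resources("mycompany.basic.metadata", policies))
```

In a real run you would first build the `policies` dictionary from `vctl policy list-policies` and one `get_policy` call per policy ID.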
The script “dipolicy” described at the end of the blog lets you do exactly that.
Analysing Policy Dependency
The first question is how all the policies depend on each other. Quite a few standard policies neither inherit resources from nor pass resources to other policies. For our demo system with only a couple of additional policies, 13 policies out of 181 have no dependency, e.g.
|Policy ID||Resource Type||Activity|
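The dependency check itself is simple once the inheritance relations are extracted, e.g. as a dictionary mapping each policy to the policies it inherits from. The sample data below is hand-crafted; in practice the relations come from the vctl policy details:

```python
def independent_policies(inherits):
    # Policies that neither inherit from nor pass resources to other policies.
    referenced = {parent for parents in inherits.values() for parent in parents}
    return sorted(pid for pid, parents in inherits.items()
                  if not parents and pid not in referenced)

# Hand-crafted sample inheritance relations.
inherits = {
    "sap.dh.admin": ["sap.dh.member"],   # inherits, hence dependent
    "sap.dh.member": [],                 # inherited by sap.dh.admin, dependent
    "mycompany.standalone": [],          # no dependency at all
}
print(independent_policies(inherits))  # ['mycompany.standalone']
```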
Most of the policies are building blocks for other policies. To visualise these I have adjusted the plotting functions of networkx, which uses matplotlib. The following picture only shows policies that depend on one another.
The numbering of the nodes refers to the policies listed in the table, which is saved separately as ‘chart_policy_ids.csv’ by the dipolicy script.
|Num ID||PolicyId||Num ID||PolicyId|
All policies that have no connection to the root node are the “non-exposed” policies that can only be used as part of other policies. There is no “root” policy defined in the SAP Data Intelligence Policy Management, but it helps when visualising the graph.
At first glance you see only 3 levels for the pre-defined policies. An SAP Data Intelligence system used productively will have at least 1 or 2 more levels once these policies are assigned to user groups.
Before we come to the best strategy for building a policy structure, I would like to introduce an additional classification that I found most helpful and that explains the colour coding of the graph.
Additional Resource and Policy Classification
With SAP Data Intelligence 2107 we have 18 resource types on the one hand, and on the other a few basic user roles and usage constraints, like administrator, business user, applications and data. Mapping the rather specific resource types to a few basic classes might help to get a better overview of the policies. My proposal is the following:
For the policy classification I use the resource classification: if all contained resources are of the same class, then the policy adopts that class; if a policy has resources from more than one class, it gets the classification ‘multiple’.
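This rule can be sketched as a small function. The mapping of resource types to basic classes below is an illustrative assumption following the proposal above, covering only three of the resource types:

```python
def classify_policy(resources, resource_class):
    # A policy adopts the class of its resources if they all agree,
    # otherwise it is classified as 'multiple'.
    classes = {resource_class[rtype] for rtype, _activity, _name in resources}
    return classes.pop() if len(classes) == 1 else "multiple"

# Illustrative mapping of resource types to the proposed basic classes.
resource_class = {
    "application": "application",
    "connection": "data",
    "connectionContent": "data",
}

print(classify_policy(
    [("connection", "read", "HANA_DEMO"),
     ("connectionContent", "manage", "HANA_DEMO")], resource_class))  # data
print(classify_policy(
    [("application", "start", "modeler"),
     ("connection", "read", "*")], resource_class))  # multiple
```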
This explains the colour coding of the graph, and you can see that the policies are quite uniform with respect to the chosen classification. Mostly it is the policies that define a role that have multiple resource classes, which absolutely makes sense. This suggests that the predefined policies have been carefully designed.
Best Practice of Policy Management
After the analysis of the pre-defined policies we can already derive a kind of best practice. Before doing that, let me emphasise that having no user group concept in place is not a shortcoming but a deliberate choice for more transparency. The SAP Data Intelligence Policy Management is the single source for defining the authorisations of its applications and managed data. There you can define roles and groups, check them for consistency and reasonability, and finally assign them to users. With a separate user-group hierarchy you would have two places where unwanted results can creep in.
With the policy management you can fulfil 3 targets by assigning
- Application Security
  - only eligible applications to users/groups
  - authorisations to users/groups for reading and changing the metadata of data
- Data Security
  - authorisation for reading and changing data
- Admin Security
  - authorisation for system administration
The last target I will not cover, because there is a pre-defined admin policy that is mostly used unchanged. If there is a need for adjustments, it is quite straightforward and needs no complex strategy behind it.
There are predefined policies that are intended to define a role, like sap.dh.member or sap.dh.developer. Admittedly, this is a very rough classification and is meant to enable customers to assign roles/policies quickly without too many exchanges with the system administrator about missing authorisations. Once SAP Data Intelligence is used productively with a more diverse set of users, you should spend some time thinking about what roles your company has and how to define the policies accordingly.
My proposed strategy is to use the nesting or inheritance feature of the policy management extensively. This means: first, start with a policy that every user needs. This corresponds to the predefined role sap.dh.member, but with far fewer policies. The minimum list of policies that lets a user start the launchpad of SAP Data Intelligence without any additional application is:
Additionally you need some more applications to ensure that the basic processes work:
These I group into my most basic policy “mycompany.basic.member”. I add no additional resources to the nested ones. In anticipation of what I will outline later in more detail: you can either embed this basic policy into every subsequent, next-level policy, add it only to policies corresponding to a user group, or assign it to each user directly.
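As a sketch, such a purely nested policy definition could look like the following JSON. The field names (`exposed`, `inheritedPolicies`) are illustrative assumptions rather than the exact schema the Policy Management or vctl expects, and the nested policy ID is just the example policy mentioned earlier in this blog:

```json
{
  "id": "mycompany.basic.member",
  "description": "Minimal policy that every user needs",
  "exposed": true,
  "resources": [],
  "inheritedPolicies": [
    "sap.dh.modelerUI.start"
  ]
}
```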
As a next step I would like to create a user that only has the right to start the Metadata Explorer. I am not adding the predefined “app.datahub-app-data.metadataUser” because it encompasses the policy “app.datahub-app-data.dependencies”, which has resources for starting applications like the Connection Management that I want to avoid at this level. Therefore I create my next policy “mycompany.basic.metadata” with the previously created policy “mycompany.basic.member” nested into it. The resources I added only have the basic activity attribute “read”.
With this we enter the next level of security.
It might seem a bit over-sophisticated to distinguish between policies for starting applications with basic reading authorisation and policies for modifying metadata, but for me it helps to sort things out at the first attempt.
As a next step I would like to define 3 additional roles with more responsibilities:
- Catalog manager who can publish and annotate datasets and manage tag hierarchies
- Glossary manager who can define new glossaries
- Catalog admin who holds the rights of the 2 previous roles
My recommendation is to create a user for each role in order to test whether the imposed security level actually matches the intended one.
The policy “mycompany.manage.catalog” inherits its resources from the previously defined policy “mycompany.basic.metadata”, and I add the resources:
For the glossary I define a similar policy:
If you would like to have groups that correspond to a policy, you can either choose a policy that you have already created or combine several policies into one policy without any further resources. E.g. if you would like to have a group whose members are allowed to manage the metadata of the catalog and the glossary, you can create an additional policy
that inherits its resources from mycompany.manage.catalog and mycompany.manage.glossary.
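A hedged JSON sketch of such a combined group policy (the policy ID is a hypothetical name of my own, and the field names are illustrative assumptions, not the exact schema):

```json
{
  "id": "mycompany.manage.catalog_glossary",
  "exposed": true,
  "resources": [],
  "inheritedPolicies": [
    "mycompany.manage.catalog",
    "mycompany.manage.glossary"
  ]
}
```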
Strategies for Setting up Application and Metadata Policies
Now that we have seen how to set up policies, we have to think about the general strategy of how the policies should be structured.
I found 3 different strategies that might help you to define your own:
- User Tailored
- Flat Group
- Hierarchy Group
In the “user-tailored” strategy you define basic policies and assign them to users.
This is a very flexible approach because you can, if necessary, define ad-hoc policies without any major impact on the whole policy framework. What you finally get is rather unstructured and therefore mostly useful for smaller user groups. Furthermore, you need to analyse both the policy management and the user assignments to get the full picture of your security settings.
To overcome the latter challenge you can collect the rather flat policies into a “role”-policy that you then assign to users.
This provides you with a single source of truth for your policy and user assignments and helps you to design very specific user profiles or user groups while retaining the aforementioned flexibility. The downside is the lack of an intuitive structure. Therefore this strategy is most appropriate for a homogeneous and limited number of user profiles.
For a more structured approach you can impose a hierarchy on the policies by extensively using the inheritance feature.
This is a rather intuitive approach where you have all the definitions in one place. The downside is that there might be redundant resources in the final policies, which actually do no harm but could obscure unwanted assignments when inheriting over multiple levels.
My preferred approach is a combination of the “flat-group” and the “hierarchy-group” approach, e.g. I would assign the “basic.member” policy directly to the final policy and keep the “metadata” hierarchy. This results in limited redundancy confined to a specific part.
For testing purposes I always start by assigning the policies to a test user directly and checking whether the security works as intended, before creating a new policy that combines all these policies. By the way, I keep this test user as a template so that I am always able to check again and adjust the security of a user/policy group.
Although I am advocating reflecting user groups in the policies, I propose having separate policies for data and assigning the “application” policies and “data” policies separately to users. This is an additional step where you can check whether the right users really get the right authorisations. Leaks in ‘data’ security can cause much more harm than leaks in application security.
Data Policy Management
Before we go into the details of the data-related policies, some preliminary words about the authorisation principle regarding data sources. SAP Data Intelligence uses the authorisations of the data sources and the credentials provided to establish a connection. E.g. the HANA database user given as credentials for the connection in the Connection Management defines the authorisations. Only the “read” and “write” activity attributes in the resource definition for the resource type “connection” add a restriction. The additional activity attributes “ownerRead”/“ownerWrite” refer to the connections that the user has created herself. These attributes only make sense when using a wildcard for the connection name.
There are plans to integrate SAP Data Intelligence more tightly with the Identity Management service. On the roadmap for release 2113 there is a planned mapping of SAML authorisations to policies. Documented in a backlog item, we would also like to propagate the security further down to the data sources. This means much more effort because the security API of each supported data source needs to be addressed; due to missing standardisation it implies a lot of development work.
To grant a user access to data sources you need to provide a “connection” policy where you either specify the connection ID of the data source or, as carte blanche, give access to all connections with the wildcard ‘*’. Of course, the latter should be avoided.
There is another resource type, “connectionContent”. The notion might be a bit misleading: it does not refer to the data as content, but to metadata management in the sense of, e.g., creating folders or uploading data in the Metadata Explorer. This resource type is only used by the Metadata Explorer and not by the Modeler. In particular, it is important when using the data preparation and the rulebooks.
Data Policy Examples
The structure of data policies is less complex than that of the application and metadata policies, for good reason, because the impact on security can be much more severe. You will most probably have different business areas whose data should be accessible to different groups of employees with different responsibilities. For the following, let’s assume you have the following data security levels with attached data sources:
- Users Own Data Sources
Users who can add “connections” to the Connection Management can access the data and the metadata without any restrictions
- Basic Shared Data Sources
Read and write access and metadata management for data sources of non-confidential data
- Info Data Sources
Read access only
- Manage Info Data Sources
Write and manage authorisation of Info data sources
These examples should provide the patterns that enable you to extend the data policies to the needs of your company.
- User’s Own – *.basic.connections
  - sap.dh.connectionContentOwnerManage – connectionContent – ownerManage – *
  - sap.dh.connectionsOwnerRead – connection – ownerRead – *
  - sap.dh.connectionsOwnerWrite – connection – ownerWrite – *
- Basic Shared – *.shared.connections
  - connection – read – HANA_DEMO
  - connection – write – DI_DATA_LAKE
  - connection – write – HANA_DEMO
  - connection – read – DI_DATA_LAKE
  - connectionContent – manage – DI_DATA_LAKE
  - connectionContent – manage – HANA_DEMO
- Info – *.info.connections
  - connection – read – HANA_QM
- Manage Info – *.info_manage.connections
  - connection – write – HANA_QM
  - connectionContent – manage – HANA_QM
  - inherited: mycompany.info.connections – connection – read – HANA_QM
For each data policy resource you need to specify a connection. The exception is the wildcard ‘*’, which refers to all connections and should, of course, be used very sparingly.
The inheritance feature is of limited use for data policies. For the same group of data you only have the three levels read, write and manage, which could be used as a vertical order.
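Because leaks in data security are the critical ones, it can be worthwhile to scan the downloaded data policies for wildcard connection grants. A minimal sketch, using a hand-crafted policy format (a dictionary of policies with (type, activity, name) resource tuples) rather than the exact vctl JSON:

```python
def wildcard_grants(policies):
    # Report (policy id, activity) pairs that grant access to all connections.
    findings = []
    for policy_id, policy in policies.items():
        for rtype, activity, name in policy.get("resources", []):
            # owner* activities are restricted to self-created connections,
            # so a wildcard is fine there; anything else deserves a review.
            if rtype == "connection" and name == "*" and not activity.startswith("owner"):
                findings.append((policy_id, activity))
    return findings

# Hand-crafted sample policies.
policies = {
    "mycompany.basic.connections": {
        "resources": [("connection", "ownerRead", "*"),
                      ("connection", "ownerWrite", "*")]},
    "mycompany.suspicious": {
        "resources": [("connection", "read", "*")]},
}
print(wildcard_grants(policies))  # [('mycompany.suspicious', 'read')]
```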
I hope this helps you when designing the user security of your SAP Data Intelligence system. Please watch out for what is new for policy management with each release. You can expect new features that support system administrators in their daily business, but also a tighter integration with identity provider systems.
For this blog I developed a script that enables you to download and upload policies and perform the analysis shown in this blog. You can install the package with
pip install diadmin
and run it with “dipolicy”:
dipolicy --help
usage: dipolicy [-h] [-c CONFIG] [-g] [-d DOWNLOAD] [-u UPLOAD] [-f FILE] [-a]

Policy utility script for SAP Data Intelligence. Pre-requiste: vctl.

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Specifies yaml-config file
  -g, --generate        Generates config.yaml file
  -d DOWNLOAD, --download DOWNLOAD
                        Download specified policy. If 'all' then all policies are download
  -u UPLOAD, --upload UPLOAD
                        Upload new policy.
  -f FILE, --file FILE  File to analyse policy structure. If not given all policies are newly downloaded.
  -a, --analyse         Analyses the policy structure. Resource list is saved as 'resources.csv'.
You can download the example policies from my personal GitHub policies folder.
So have fun, and do not be too strict with your users.
There is now a follow-up blog that outlines a blueprint policy structure.