Ulrich Anhalt

Monitoring SAP and HANA Instances with Prometheus and Grafana

Monitoring specific problems with SAP standard tools is not always fun and often impossible.

Prometheus is an open source monitoring solution, and Grafana is a tool for creating dashboards to visualize the data. The Cloud Native Computing Foundation accepted Prometheus as its second incubated project, after Kubernetes, and it is used by many well-known companies.

In combination with a few Prometheus exporters, it is possible to monitor and alert on a wide range of SAP-related problems with a uniform concept.

Installation

All programs involved can be run as binaries, as Docker containers or as part of a Kubernetes cluster. Installation is beyond the scope of this blog post, but there are many interesting articles and even some books covering this topic.

Standard monitoring

For server monitoring, the Prometheus node_exporter (Linux) and wmi_exporter (Windows) are available. The blackbox exporter, on the other hand, allows black-box probing of endpoints over HTTP and HTTPS and can be used, for example, to get alerted when an SSL certificate expires.

Besides these examples, many other exporters are available that can be integrated into the monitoring landscape. For alerting, Prometheus provides the Alertmanager, which offers a lot of configuration options.

SAP specific monitoring

For SAP-specific monitoring, the hana_sql_exporter and sapnwrfc_exporter come into play. Their installation and usage are described in the READMEs of the corresponding GitHub repositories.

As the name suggests, the hana_sql_exporter retrieves its data with a SQL SELECT statement. By definition, the first column must represent the value of the metric. The following columns are used as labels and must be string values. In this way, any table can be used to create the metrics needed for the problem at hand.
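To make the column convention concrete, here is a minimal Python sketch of how one result row could be turned into a Prometheus sample. It is not the exporter's actual code (the exporter is a separate program); the function names `row_to_sample` and `escape_label_value` and the sample row are hypothetical and only illustrate the rule "first column is the value, all following columns are labels".

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline for the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def row_to_sample(metric_name, column_names, row):
    """Render one SQL result row as a line in the Prometheus text exposition format.

    The first column is the metric value; all following columns become labels.
    """
    value = float(row[0])
    labels = []
    for name, cell in zip(column_names[1:], row[1:]):
        if not isinstance(cell, str):
            raise TypeError(f"label column {name!r} must be a string")
        labels.append(f'{name.lower()}="{escape_label_value(cell)}"')
    label_part = "{" + ",".join(labels) + "}" if labels else ""
    return f"{metric_name}{label_part} {value:g}"


# Hypothetical result row: (status value, entry type)
line = row_to_sample("hdb_backup_status", ["STATUS", "TYPE"], (0, "complete data backup"))
print(line)  # hdb_backup_status{type="complete data backup"} 0
```

A query that returns only one column simply yields a sample without additional labels.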

The sapnwrfc_exporter, on the other hand, covers problems that cannot be solved with the hana_sql_exporter alone. Examples are the current number of lock table entries or the current number of dialog processes.

Both exporters can be run as a binary, as a Docker container or as a pod in a Kubernetes cluster. They read the relevant system and metric information from a TOML config file, as described in the GitHub repositories. It is possible to run as many exporter instances as needed; for example, they can be structured by metric category or by system usage.

In the Prometheus config file, the exporter instances can be added in a separate job section:

- job_name: hana-short
  scrape_interval: 60s
  static_configs:
    - targets: ['172.45.111.105:9658']
      labels: {'instance': 'hana_exporter_tst'}
    - targets: ['hana-exporter-dev.sap.svc.cluster.local:9658']
      labels: {'instance': 'hana_exporter_dev'}
    ...

 

HANA backups

The first example shows how SAP HANA backups can be monitored. In this case, the hana_sql_exporter config entry for the metric looks something like this:

...
[[Metrics]]
  Name = "hdb_backup_status"
  Help = "Status of last hana backup."
  MetricType = "gauge"
  TagFilter = []
  SchemaFilter = ["sys"]
  SQL = "select (case when state_name = 'successful' then 0 when state_name = 'running' then 1 else -1 end),entry_type_name as type from <SCHEMA>.m_backup_catalog where entry_id in (select max(entry_id) from m_backup_catalog group by entry_type_name)"
...
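The logic of this SQL statement can be sketched in a few lines of Python: take the newest catalog entry per backup type and map its state to 0 (successful), 1 (running) or -1 (anything else). The function names and the catalog rows below are made up for illustration; only the mapping itself is taken from the query above.

```python
def backup_status(state_name: str) -> int:
    """Mirror the CASE expression: 0 = successful, 1 = running, -1 = anything else."""
    if state_name == "successful":
        return 0
    if state_name == "running":
        return 1
    return -1


def latest_status_per_type(catalog_rows):
    """Keep the row with the highest entry_id per entry type and map its state.

    catalog_rows: iterable of (entry_id, entry_type_name, state_name) tuples.
    Returns a dict {entry_type_name: status value}.
    """
    latest = {}
    for entry_id, entry_type, state in catalog_rows:
        if entry_type not in latest or entry_id > latest[entry_type][0]:
            latest[entry_type] = (entry_id, state)
    return {entry_type: backup_status(state) for entry_type, (_, state) in latest.items()}


# Made-up catalog rows, for illustration only.
rows = [
    (100, "complete data backup", "successful"),
    (101, "log backup", "successful"),
    (102, "complete data backup", "running"),
    (103, "log backup", "failed"),
]
print(latest_status_per_type(rows))
# {'complete data backup': 1, 'log backup': -1}
```

Each key of the resulting dictionary corresponds to one value of the type label, so a dashboard or alert can treat every backup type separately.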

 

A few minutes after starting the exporter, the metric results can be analyzed with the Prometheus expression browser. The labels instance and job are standard Prometheus labels, usage and tenant are standard hana_sql_exporter labels, and type is an additional label introduced by the SQL part of this metric.

 

With the Grafana dashboard, all backups can be displayed in one view. As shown in this example, every hanging backup is immediately obvious. This one was detected around 07:20, then cancelled and restarted, and it finished around 07:35. Additionally, it is possible to alert on such a situation with the Prometheus Alertmanager.

Here are some other examples for hana_sql_exporter metrics:

  • Oldest backup days
[[Metrics]]
  Name = "hdb_oldest_backup_days"
  Help = "Oldest Backup found in backup_catalog."
  MetricType = "gauge"
  TagFilter = []
  SchemaFilter = ["sys"]
  SQL = "SELECT DAYS_BETWEEN(MIN(SYS_START_TIME), CURRENT_TIMESTAMP) OLDEST_BACKUP_DAYS FROM <SCHEMA>.M_BACKUP_CATALOG"
  • Cancelled SAP Jobs
[[Metrics]]
  Name = "hdb_cancelled_jobs_total"
  Help = "Sap jobs with status cancelled/aborted (today)"
  MetricType = "counter"
  TagFilter = ["abap"]
  SchemaFilter = ["sapabap1", "sapabap", "sapewm"]
  SQL = "select count(*) from <SCHEMA>.tbtco where enddate=current_utcdate and status='A'"

 

DBVM problem

A more complicated problem that can also be monitored is the following. Changing some extensive material variants leads in some cases to hanging update processes (SM50) on the tables DBVM, MA61V, DBVL and MDUP. In rare cases this even leads to a rapidly increasing number of hanging update entries from many other users, which can bring the complete system to a halt.

The occurrence of specific tables in the process overview can be counted with the sapnwrfc_exporter. The config file entry for the metric in this case looks like this:

...
[[TableMetrics]]
  Name = "sap_processes"
  Help = "sap process info"
  MetricType = "gauge"
  TagFilter = []
  FunctionModule = "TH_WPINFO"
  Table = "WPLIST"
  AllServers = true
  [TableMetrics.Params]
    SRVNAME = ""
  [TableMetrics.RowCount]
    WP_TABLE = ["dbvm", "dbvl", "ma61v", "mdup"]
    WP_TYP = ["dia", "btc", "upd", "upd2"]
  [TableMetrics.RowFilter]
    WP_STATUS = ["on hold", "running"]
...
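One way to read this entry: RowFilter keeps only the work processes whose status is "on hold" or "running", and RowCount then counts how often each listed value occurs in the fields WP_TABLE and WP_TYP. The following Python sketch illustrates that reading with made-up work process rows. It is an assumption about the semantics, not the exporter's implementation; the key format such as wp_table_dbvm follows the count label used in the alert rule of this post, and the case-insensitive comparison and the zero values for configured but absent entries are assumptions as well.

```python
from collections import Counter


def count_rows(rows, row_filter, row_count):
    """Count field values in table rows after applying a row filter.

    rows:       list of dicts, one per work process (field name -> value)
    row_filter: {field: [allowed values]}; a row must match every listed field
    row_count:  {field: [values to count]}
    Returns a Counter keyed by "<field>_<value>" in lower case.
    """
    def norm(value):
        return str(value).strip().lower()

    counts = Counter()
    # Assumption: start every configured value at zero so the series always exists.
    for field, wanted in row_count.items():
        for value in wanted:
            counts[f"{norm(field)}_{norm(value)}"] = 0

    for row in rows:
        passes = all(
            norm(row.get(field, "")) in {norm(v) for v in allowed}
            for field, allowed in row_filter.items()
        )
        if not passes:
            continue
        for field, wanted in row_count.items():
            cell = norm(row.get(field, ""))
            if cell in {norm(v) for v in wanted}:
                counts[f"{norm(field)}_{cell}"] += 1
    return counts


# Made-up rows in the shape of a work process list.
wplist = [
    {"WP_TYP": "UPD", "WP_STATUS": "Running", "WP_TABLE": "DBVM"},
    {"WP_TYP": "UPD", "WP_STATUS": "On Hold", "WP_TABLE": "MA61V"},
    {"WP_TYP": "DIA", "WP_STATUS": "Waiting", "WP_TABLE": ""},
    {"WP_TYP": "DIA", "WP_STATUS": "Running", "WP_TABLE": "MARA"},
]
result = count_rows(
    wplist,
    row_filter={"WP_STATUS": ["on hold", "running"]},
    row_count={
        "WP_TABLE": ["dbvm", "dbvl", "ma61v", "mdup"],
        "WP_TYP": ["dia", "btc", "upd", "upd2"],
    },
)
print(result["wp_table_dbvm"], result["wp_typ_upd"], result["wp_typ_dia"])  # 1 2 1
```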

 

In addition, the number of entries in the update table can be counted with the hana_sql_exporter:

...
[[Metrics]]
  Name = "hdb_update_table"
  Help = "SAP update table entries"
  MetricType = "gauge"
  TagFilter = ["abap"]
  SchemaFilter = ["sapabap1", "sapabap", "sapewm"]
  SQL = "SELECT count(*) FROM <SCHEMA>.VBHDR WHERE VBDATE = current_date"
...

 

The result in Grafana when a problem occurs looks like this:

The following alert rule covers this situation. The alert can be sent through any of the receivers configured in the Alertmanager.

alert: sap_dbvm_high
expr: sum(sap_processes{system="p01", count=~"wp_table_dbvm|wp_table_dbvl|wp_table_ma61v|wp_table_mdup"}) > 0 and sum(hdb_update_table{tenant="p01"} > 5)
for: 2m
labels:
  severity: critical
annotations: 
  description: DBVM problem for more than 2 minutes.
  summary: DBVM, DBVL, MA61V or MDUP table entries in SM66 > 0 and SM13 entries > 5.
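The effect of the for clause can be illustrated with a small Python simulation: the condition has to hold at every evaluation for two minutes before the alert changes from pending to firing. This is a simplified sketch, not Prometheus code. It assumes one rule evaluation every 60 seconds, reduces the two sums of the expression to plain numbers, and uses the hypothetical function name `alert_states`.

```python
def alert_states(samples, interval_s=60, for_s=120):
    """Evaluate the rule condition over successive rule evaluations.

    samples: list of (hanging_processes, update_entries) pairs, one per evaluation.
    Returns one of "inactive", "pending" or "firing" per evaluation.
    """
    states = []
    active_for = None  # seconds the condition has held so far; None if it does not hold
    for hanging, update_entries in samples:
        condition = hanging > 0 and update_entries > 5
        if not condition:
            # Any evaluation where the condition is false resets the timer.
            active_for = None
            states.append("inactive")
            continue
        active_for = 0 if active_for is None else active_for + interval_s
        states.append("firing" if active_for >= for_s else "pending")
    return states


# Made-up measurements: the problem starts at the second evaluation and ends at the last.
samples = [(0, 2), (1, 8), (2, 12), (3, 20), (3, 25), (0, 1)]
print(alert_states(samples))
# ['inactive', 'pending', 'pending', 'firing', 'firing', 'inactive']
```

The pending phase keeps short spikes from being sent to the receivers, at the cost of a notification delay of at least two minutes.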

 

Thanks for reading this. I hope it was helpful.

      18 Comments
      Remi ASTIER

      Nice !

      Could it work with metrics such as network traffic, where values are cumulative and the difference between two measurements has to be computed?

      Something like: (Value2-Value1)/(Date2-Date1)

      Thanks.

      Ulrich Anhalt
      Blog Post Author

      Yes, a lot is possible. Here are some links:

      Sebastian Richter

      Hello Ulrich,

      thank you for sharing this information.

      Do you know whether it's possible to call also other function modules using the sapnwrfc_exporter?

      Thank you and best regards,

      Sebastian

      Ulrich Anhalt
      Blog Post Author

      Hi Sebastian,

      the only restrictions regarding the function modules are that they need to be remote-enabled and, of course, that they need an interesting output table or export field for a new metric. Please take a look at the examples directory of the sapnwrfc_exporter.

      Best regards

      Ulli

      Úlfar Markús Ellenarson

      Hi Ulrich,

      After I read your article on configuring prometheus and grafana for monitoring sap, I checked the git repo for sapnwrfc_exporter and saw that you are the author.  I checked to see if I could use this to monitor remote systems by using a saprouter string.  I could not find that option.  This option is called ashost in sapnwrfc for perl and also in node-sapnwrfc.  Is this option available or will it be available for sapnwrfc_exporter?  It would be brilliant if there was more information about monitoring SAP S/4 HANA application servers using prometheus and grafana.

      Thank you.

      Ulrich Anhalt
      Blog Post Author

      Hi Ulfar,

      I had no use case for a saprouter string so far but I just added it to the connection parameters in the master branch. You can test if you like.

      Úlfar Markús Ellenarson

      Hi Ulrich.

      Thank you for adding saprouter as a connection parameter.  I checked the git repo yesterday and saw that you had added it.

      I see that you have some reference TOML files in the repo and a couple in this article.  It would be lovely if there was a how to for creating TOML configurations for SAP.  I have not had the opportunity to try it out.   I need to figure out how to setup the docker image in sapnwrfc_exporter repo.  The information is a bit terse if one has not had much experience in docker.  Once I figure out how to test it from a virtual machine running in NAT mode so that the docker image is accessible to subnets accessible from the host machine, then I will be able to give more feedback.

      Would it be alright if I were to send a few questions as I plan on documenting the process..  Many thanks and much appreciated.

      Ulrich Anhalt
      Blog Post Author

      Hi Ulfar,

      no problem - please email me your questions.

      Juan Antonio Zavala Ortega

      Hello

      Excuse me, which SAP patch level is required to install the Prometheus application?

      thanks so much

      regards

      Ulrich Anhalt

      Hello Juan,

      the sapnwrfc_exporter should work with all SAP releases from 4.6c and the hana_sql_exporter with all SAP HANA 2.0 databases.

      Best regards

      Ulli

      Nicolas Prot

      Hello Ulrich,

      Some of your examples are not working. I don't know whether Python 3 or Go is responsible ...

      For instance TH_WPINFO. if I copy/paste the code, I have this issue :

      ERRO[0000] TableInfo: one or both entries missing        RowCount="map[]" Table=WPLIST
      Problems with config file:  getConfig(fillInternalMetrics): getConfig(checkTomlMetric): checkTomlMetric(sap_processes missing or wrong special info - field,structure or table)

       

      Could you please advise ? (I've already read your inputs about metrics, fields and so on, but I'm still confused with it)

      Thanks a lot
      BR

      Nicolas

      Ulrich Anhalt

      Hi Nicolas,

      this seems to be a problem with the section [metrics.tabledata] of the TOML configuration file. Please take a look at the corresponding readme section of the repository. It could also just be a problem with the indentation.

      Best regards
      Ulli

      Preet_ Sodhi

      Hi Nicolas,

       

      Were you able to resolve this issue? We are facing a similar problem with TH_WPINFO and TH_USER_LIST. If you were able to resolve these, would you please share the solution?

       

      Thanks,

      Preet

      Nicolas Prot

      Nope. I've chosen another tool to monitor my servers.

      Preet_ Sodhi

      Thanks Nicolas,

       

      If you don't mind, would you be able to share the name of the tool being used for monitoring?

       

      Regards,

      Preet

      Nicolas Prot

      PRTG + scansors 🙂

      Preet_ Sodhi

      Thank you Nicolas,

       

      Appreciate the details. We will take a look at this too.

       

      Cheers,

      Preet

      Marcelo Comitre

      Hello Ulrich,

      thank you for sharing this information.

      To avoid the error "ERROR max no of 200 conversations exceeded" in sapnwrfc_exporter follow procedure https://wiki.scn.sap.com/wiki/display/ABAPConn/ERROR++max+no+of+100+conversations+exceeded

       

      Thanks so much

      Sebastião