Extract System Logs from your Cloud Integration in...

jeremy_ma_usa · ‎09-20-2022

Why/What - To accelerate the learning cycle of Build, Observe and Learn, you need robust and reliable observation capabilities.

The technique below describes how to extract system logs from your Cloud Integration (CI) tenants into Splunk, and how to start turning that mass of data into observations.

Retention: CI system log data is retained in the database for 7 days (Neo) or 30 days (CF, with a storage limit*). Exporting the logs lets you look back at history or triage issues past xx days. See 2582913 - [SAP Cloud Integration] How long are logs persisted.

HTTP - all inbound requests, regardless of whether the iFlow/endpoint is available. Imagine you want to see why a sender is getting a 404 or 401.

Trace - ALL events, including polling adapter (timer/sFTP/Kafka) activities that are NOT captured by MPL, e.g. missing credentials. An iFlow that fails to deploy due to missing dependencies is also captured. If a message fails, you get the error along with the Elapsed (ms) of each step (below).

Power of search tools (Splunk/SolarWinds), showcased at the end, to help find anomalies that can be shared between teams: learning and self-discovery.

The building blocks are accessing the LogFiles API and pushing the files to Splunk, both of which are covered in detail in the sections below.

It is important to note that system logs are just one extra dataset. From practice we have learned that MPL is structured and contains a richer set of dimensions on how an iFlow actually performs; if thought through thoroughly, incorporating sender/receiver context, it lets you deliver end-to-end telemetry.

Just two words on the above: Santos Kumarv has a nice blog on how to build an iFlow that extracts MPL and sends it to Splunk, and I have written one for SolarWinds. SAP has a roadmap item for a native connector to push MPL to Splunk/others in an upcoming quarter. Please look forward to that, as it is much more robust and has higher throughput.

According to my sources, extracting system logs to Splunk is not on the current roadmap, hence I want to share this here.

Let's quickly review how you can access system logs (CF) today through WebUI below:

Under the LogFile attributes:

LogFiles API Response

A. LogFileType - HTTP vs. TRACE

B. Size - Number of entries, with a maximum of 150K before rolling over to the next file

C. Application - Tenant short name

LastModified - Field we will later use to track which file we are downloading and uploading

Name - File name: http/trace plus the timestamp of the first event entry

To access the OData API above, you need a service instance and the necessary permissions, as described in the MPL documentation. This requires BTP cockpit access by an administrator.
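As a rough illustration of the two calls involved (list the files, then download one), here is a minimal Python sketch that only builds the request URLs. The `/api/v1` root, the `LogFiles(Name=...,Application=...)/$value` key syntax and the host name are assumptions based on the usual Cloud Integration OData conventions, not something confirmed in this post, so please verify them against your tenant's API documentation.

```python
from urllib.parse import quote

# Assumption: the OData root path of the Cloud Integration API.
API_ROOT = "/api/v1"


def list_url(host: str) -> str:
    """URL that returns the list of LogFile entries."""
    return f"https://{host}{API_ROOT}/LogFiles"


def download_url(host: str, name: str, application: str) -> str:
    """URL that returns the content of one LogFile (assumed key syntax)."""
    key = f"Name='{quote(name)}',Application='{quote(application)}'"
    return f"https://{host}{API_ROOT}/LogFiles({key})/$value"
```

The Name and Application values come straight from the LogFile attributes listed above; authentication (the service instance credentials) is left out on purpose.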

Now that you have a basic understanding of where the logs are and what the payloads look like, we will go into how to build the iFlows that extract and forward the logs to Splunk.

How - Deploy the iFlows as described below and work with your in-house Splunk experts to incorporate the unique HTTP/Trace formatting. The logs arrive as compressed zip files, and ideally Splunk indexes them for faster search and quicker interpretation.

The extraction design is decoupled: a first iFlow tracks the list of LogFiles, and a second iFlow downloads each LogFile (zip) and pushes it to the target (Splunk). There are two reasons for splitting the iFlow. First, as of this writing (Sept 7, 22), the API call to download a LogFile can take upward of 80 seconds on CF, which may cause timeouts.

SAP is aware, and performance improvements are coming in future releases to drastically reduce the access time. Second, in case your tenant generates a large number of logs, decoupling allows concurrent and robust downloading and pushing of these files to Splunk, since each file contains up to 150K lines at ~3-5 MB zipped and 40 MB+ uncompressed.
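If you want to inspect a downloaded LogFile outside of Splunk, a minimal Python sketch like the following streams the lines out of the zip without writing the 40 MB+ uncompressed file to disk. It assumes the download is a standard zip archive with UTF-8 text members; the function name is made up for illustration.

```python
import io
import zipfile


def iter_log_lines(zip_bytes: bytes):
    """Yield every text line from every member of a zipped log file."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            with archive.open(member) as raw:
                # Decode lazily so the whole file is never held as one string.
                text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
                for line in text:
                    yield line.rstrip("\r\n")
```

Because it is a generator, you can stop after the first few lines when you only want to check the format.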

To facilitate the decoupling I have chosen asynchronous JMS, so you can monitor the queue on top of the system files. If you have other alternatives, please share your experience.
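The decoupling idea itself can be sketched in a few lines of Python. This is only an analogy: an in-process queue and threads stand in for the JMS queue and the second iFlow, and `forward` is a hypothetical callback that would download one file and push it to Splunk.

```python
import queue
import threading


def run_pipeline(file_names, forward, workers=2):
    """Queue one job per log file and process them with concurrent workers.

    Returns the names that failed so they can be retried or monitored.
    """
    jobs = queue.Queue()
    for name in file_names:  # tracker side: one message per log file
        jobs.put(name)

    failed = []
    lock = threading.Lock()

    def worker():  # forwarder side: slow download + push, done in parallel
        while True:
            try:
                name = jobs.get_nowait()
            except queue.Empty:
                return
            try:
                forward(name)
            except Exception:
                with lock:
                    failed.append(name)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failed
```

A slow or failing download only affects its own job, which is the same property the JMS queue gives the two iFlows.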

LogFileTracker

Timer: Set to recurring; ideally run once per day just after 00:00 GMT so that all files from the previous day are extracted. Running too frequently is taxing and increases the chance of duplicate logs.
Config A: ServerURL of the tenant and timestamps to track previous runs; 10 mins - optional, to avoid picking up active logs
Config B: Credentials for the OData API as described earlier (MPL)
C: Filter section, used to remove the most current LogFile so that a partial log is ignored
Filter LogFileType: /LogFiles/LogFile[ LogFileType = 'TRACE' ]
Filter out current file: /LogFiles/LogFile[ ( (preceding-sibling::LogFile | following-sibling::LogFile)/LastModified > LastModified ) ]
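To make the two XPath filters easier to follow, here is a minimal Python sketch of the same selection logic: keep one LogFileType, then drop the entry with the newest LastModified because it may still be written to. It assumes a simplified, namespace-free `<LogFiles><LogFile>...` document matching the XPath above and ISO-style timestamps that sort correctly as strings; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def select_log_files(xml_text: str, log_type: str):
    """Return completed log files of one type as a list of dicts."""
    root = ET.fromstring(xml_text)
    # Filter 1: keep only the requested LogFileType (HTTP or TRACE).
    files = [
        {child.tag: (child.text or "") for child in node}
        for node in root.findall("LogFile")
        if node.findtext("LogFileType") == log_type
    ]
    if not files:
        return []
    # Filter 2: keep a file only if another file is newer, which drops
    # the most recent (possibly partial) one.
    newest = max(f.get("LastModified", "") for f in files)
    return [f for f in files if f.get("LastModified", "") < newest]
```

Note that with only one file of a given type nothing is returned, which matches the XPath behavior: the single file is by definition the current one.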

Step 2: Forward individual LogFiles to Splunk (LogFile_Forwarder)

Config A: Define the Splunk HTTP forwarding host and extend the timeout
Splunk headers for the token and additional attributes. The Splunk source(s) are determined dynamically if you keep cpi_sys_http and cpi_sys_trace, where the name comes from part of the LogFile XML node Application.

JMS Adapter - by default 1 concurrent extraction against the API is defined; depending on the number of worker nodes in your assigned tenant, this will likely still run in multiple threads.
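For readers who want to test the Splunk side without the iFlow, the request the forwarder needs to produce can be sketched as below. This assumes the target is the Splunk HTTP Event Collector raw endpoint (`/services/collector/raw`) with the usual `Authorization: Splunk <token>` header, and that the source is picked from LogFileType and reused as the sourcetype; the post itself only says "HTTP forwarding host" and "headers for token", so treat all of these as assumptions. Host, token and index are placeholders.

```python
from urllib.parse import urlencode

# Assumption: one Splunk source per log file type, named as in this post.
SOURCE_BY_TYPE = {"HTTP": "cpi_sys_http", "TRACE": "cpi_sys_trace"}


def build_hec_request(splunk_host: str, token: str, log_file_type: str, index: str):
    """Return (url, headers) for posting one raw log file to Splunk HEC."""
    source = SOURCE_BY_TYPE[log_file_type.upper()]
    query = urlencode({"source": source, "sourcetype": source, "index": index})
    url = f"https://{splunk_host}/services/collector/raw?{query}"
    headers = {"Authorization": f"Splunk {token}"}
    return url, headers
```

The body of the POST would be the uncompressed log text; Splunk then splits it into events according to the props.conf stanza that matches the sourcetype.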

Prep Steps for Splunk

Below are the props.conf stanzas needed for HTTP and TRACE. They are formatted slightly differently: trace/stack entries should be represented as a single block per event, while HTTP events are represented one per line. Contributed by Splunk expert Jeff Edquid.

[cpi_sys_http]
LINE_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 31
TIME_FORMAT = %d/%m/%Y:%H:%M:%S %z
TIME_PREFIX = - - \[
SHOULD_LINEMERGE = false

[cpi_sys_trace]
LINE_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %Y-%m-%d %H:%M:%S#%z
TIME_PREFIX = ^
SHOULD_LINEMERGE = true
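To sanity-check the HTTP stanza against your own log lines before handing it to Splunk, you can mimic TIME_PREFIX and TIME_FORMAT in Python. This is only an approximation, since Splunk's timestamp parser is not identical to Python's `strptime`, and the sample line used with it is invented for illustration.

```python
import re
from datetime import datetime

# Same values as the [cpi_sys_http] stanza.
TIME_PREFIX = re.compile(r"- - \[")
TIME_FORMAT = "%d/%m/%Y:%H:%M:%S %z"


def http_event_time(line: str):
    """Return the event time of one HTTP log line, or None if it does not match."""
    match = TIME_PREFIX.search(line)
    if match is None:
        return None
    # The timestamp starts right after the prefix and ends at the closing bracket.
    stamp = line[match.end():].split("]", 1)[0]
    try:
        return datetime.strptime(stamp, TIME_FORMAT)
    except ValueError:
        return None
```

If this returns None for your real lines (for example because the month is written as a name rather than a number), the TIME_FORMAT in the stanza needs adjusting for your tenant.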

Trace file Showing in Splunk and Expanded below

Below is an example of a TRACE file in which you can search for where a message failed and the elapsed time of each step. The WebUI debug mode is more user-friendly.

Information > Insights

Now you have the logs pulled into Splunk. Use a simple wildcard search in Splunk to find exceptions and see which URL context paths they come from:

index=xyz source=cpi_sys_http "1.1\" 40*"

Finding HTTP Exceptions
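The same kind of pattern matching can be tried offline on an extracted HTTP log. The Python sketch below counts 40x responses per context path; the regular expression assumes an access-log style request section such as `"GET /http/demo HTTP/1.1" 404`, which is an assumption about the line layout rather than a documented format.

```python
import re
from collections import Counter

# Request line followed by a 40x status, mirroring the search for 1.1" 40*.
REQUEST = re.compile(r'"[A-Z]+ (\S+) HTTP/1\.1" (40\d)')


def count_40x(lines):
    """Count 40x responses grouped by (status, context path)."""
    hits = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]  # ignore query strings
            hits[(match.group(2), path)] += 1
    return hits
```

Calling `count_40x(lines).most_common(10)` gives a quick top-10 of failing endpoints to share with the sender teams.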

Find message runtime errors by looking for a Camel exception, for example:

index=xyz source=cpi_sys_trace "#ERROR#org.apache.camel.processor.DefaultErrorHandler#" NOT "Kafka"

You can create Splunk Alerts to help eliminate errors like the above and improve MTTR.

Credits

Thanks to the individuals below for their assistance!

Felix K. from SAP Product Engineering for LogFiles API guidance and performance analysis

Jeff Edquid, Splunk expert, for tips on handling and validating the TRACE files

sriprasadshivaramabhat for XSL/Groovy tips and feedback

To download the iFlows.

PS. I will update this blog when the improved API performance is released. Thanks for reading, and please share feedback on your experiences. Cheers