CPI Runtime Node Instrumentation: Dynamic Loading ...

vadimklimov · 09-06-2019

Disclaimer

The material in this blog post is provided for information and for demonstration of technical features only. Some steps of the described technique can severely destabilize a CPI tenant's runtime node or introduce security risks if they are applied carelessly, or if the side effects and drawbacks highlighted in this blog post are ignored.

Some steps of the presented demonstration, such as a forced restart of a CPI runtime node, shall be avoided in production environments or environments under load, as abnormal termination of a runtime node's operations can introduce inconsistency into message processing and cause temporary service unavailability.

Intro

The Java Virtual Machine (JVM) comes with a powerful instrumentation technology that lets tools run alongside an application, access the bytecode executed by the JVM and manipulate that bytecode at runtime. Such tools are packaged as Java agents, and the JVM provides a corresponding mechanism, Attach API, that is used to load agents into an already running JVM. This technique is widely used in both on-premise and cloud landscapes, where Java platforms and applications are instrumented with a variety of agents to collect metrics about the JVM runtime and running applications, and to enrich them with additional capabilities such as specialized tracing, collection of method invocation statistics, enhancement or even replacement of specific class implementations, and implementation of different hooks. Some vendors provide ready-made Java agents, and custom agents can be developed as well, provided that they comply with well-defined requirements: they implement certain methods and are assembled in a compliant way.
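To make the requirements on a custom agent more tangible, here is a minimal sketch of an agent class. The class name DemoAgent and the loadedVia field are hypothetical and serve illustration only; real agents are usually written in Java so that they do not depend on the Groovy runtime. The method names premain and agentmain and the manifest attributes Premain-Class and Agent-Class are the ones defined by the java.lang.instrument specification.

```groovy
import java.lang.instrument.Instrumentation

// Hypothetical agent class for illustration; it is not part of Jolokia or CPI.
class DemoAgent {

    // Records how the agent was started, so that the effect of loading is observable.
    static String loadedVia = null

    // Entry point for static loading (agent specified at JVM startup).
    // The agent JAR's manifest must name this class in the Premain-Class attribute.
    static void premain(String agentArgs, Instrumentation instrumentation) {
        loadedVia = "premain(${agentArgs})"
    }

    // Entry point for dynamic loading through Attach API.
    // The agent JAR's manifest must name this class in the Agent-Class attribute.
    static void agentmain(String agentArgs, Instrumentation instrumentation) {
        loadedVia = "agentmain(${agentArgs})"
    }
}
```

The Instrumentation argument is what gives a real agent access to loaded classes and bytecode transformation; this sketch deliberately ignores it.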

I have already written an earlier blog post about applying Java agents to on-premise JVMs, where I demonstrated how agents can be used to flexibly extend the capabilities of an instrumented Java application. The subject of JVM instrumentation is also described in sufficient detail in various hands-on materials and documentation. Can we apply this technique to SAP CPI? Undoubtedly, such technical capabilities would push the boundaries of what is possible in custom development and custom-built tooling for monitoring and management of CPI runtime nodes. This is challenging: when it comes to SAP CPI, customers are very limited in tuning and customizing the runtime, and they cannot tweak the JVM configuration of runtime nodes. It is technically feasible nonetheless, and in this blog post I would like to illustrate how we can instrument the running JVM of a CPI runtime node by loading a Java agent, and how features provided by the agent can be combined with the original capabilities of CPI.

Since customers cannot adjust the startup arguments of a runtime node's JVM, we will not be able to start an agent during runtime node startup (static loading of an agent, which relies on the -javaagent startup option). The good news is that we can employ dynamic loading of an agent instead and apply it to an already running runtime node.

Whilst the described technique is universal and can be used to load almost any agent, I will use the Jolokia agent for the purpose of demonstration. Jolokia is a JMX-HTTP bridge: it exposes JMX operations over HTTP and is extensively used in environments where JMX operations related to JVM and platform monitoring and administration cannot be executed over the TCP-based Java RMI protocol and have to be executed over alternative protocols, such as HTTP(S).

The following steps will be performed during the demonstration and described in this blog post:

Download a Jolokia JVM agent's binary file and copy it to a file system of a CPI runtime node to make it locally accessible for a runtime node's JVM,

Attach to a JVM of a running runtime node and load the agent using Attach API,

Send a command to the agent over HTTP from an external HTTP client (Postman) and make it invoke the corresponding JMX operation, to illustrate agent usage,

Restart a runtime node's JVM to "unload" the agent,

As a final step, perform cleanup activities and remove the previously downloaded agent from the runtime node's file system to leave it in its original state.

For one-time execution of the required Groovy scripts on a runtime node, I will use a technique that has been described in another blog post: it allows rapid execution of scripts and immediate receipt of their results. I will place the code snippet of the corresponding script in an HTTP message body and use Postman as an HTTP client to send requests to an iFlow in CPI, which triggers script execution in the way described in that blog post.

Download an agent and copy it to a runtime node's file system

To load an agent into a JVM, we first need to place its binary file in a location that the JVM can access by path. CPI runtime nodes run on the SUSE Linux operating system. SUSE Linux adheres to the Filesystem Hierarchy Standard, which defines the directory structure of Linux distributions and designates the directory /tmp as a location for temporary files. I will use this directory to store the agent's binary file temporarily and to load it into the JVM from there. In real-life scenarios, files such as agent binaries or configuration would normally be copied to a more permanent, dedicated location on the file system.

Before we proceed to any further steps, let's make sure that the agent's file can be written to the mentioned directory. This can be checked with the help of the class File.

Code snippet of a corresponding CPI Groovy script that checks whether the directory /tmp exists on a runtime node's file system and verifies whether a file can be written to it:

import com.sap.gateway.ip.core.customdev.util.Message

Message processData(Message message) {

    String agentDirectoryPath = '/tmp'
    File agentDirectory = new File(agentDirectoryPath)

    // Report whether the directory exists and, if so, whether it is writable
    String output = (agentDirectory.exists()) ? 'Directory exists' : 'Directory does not exist'

    if (agentDirectory.exists()) {
        output += (agentDirectory.canWrite()) ? '\nFile can be written to directory' : '\nFile cannot be written to directory'
    }

    message.body = output

    return message

}

At this point we have confirmation that the required directory exists and that a file can be written to it.

Next, it is necessary to locate and download the binary file of the Jolokia JVM agent. At the time of writing, the latest version of the Jolokia JVM agent is 1.6.2, and it can be downloaded from the Maven Central Repository using the download link.

Code snippet of a CPI Groovy script that downloads a Jolokia JVM agent and copies it to a directory /tmp on a runtime node:

import com.sap.gateway.ip.core.customdev.util.Message

Message processData(Message message) {

    String agentUrlPath = 'https://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/1.6.2/jolokia-jvm-1.6.2-agent.jar'
    String agentFilePath = '/tmp/jolokia-jvm-1.6.2-agent.jar'

    URL agentUrl = new URL(agentUrlPath)
    File agentFile = new File(agentFilePath)

    // Overwrite rather than append (<<), so that running the script again
    // does not corrupt a JAR file that is already present
    agentFile.bytes = agentUrl.bytes

    message.body = null

    return message

}

The required agent's binary file has been placed in the accessible directory, and we are ready to move forward and load it into a runtime node's JVM.
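Before loading the file, it may be worth checking that the download produced a usable agent JAR. The following is a minimal sketch, assuming that it is enough to open the file as a JAR and to read the Agent-Class manifest attribute, which dynamic loading relies on. The helper name readAgentClass is hypothetical.

```groovy
import java.util.jar.JarFile

// Hypothetical helper: returns the Agent-Class manifest attribute of a JAR file,
// or null if the JAR has no manifest or no such attribute.
// An exception is thrown if the file cannot be opened as a JAR at all.
String readAgentClass(String jarFilePath) {
    JarFile jarFile = new JarFile(jarFilePath)
    try {
        return jarFile.manifest?.mainAttributes?.getValue('Agent-Class')
    } finally {
        jarFile.close()
    }
}
```

In a CPI script, the result of readAgentClass('/tmp/jolokia-jvm-1.6.2-agent.jar') could be written to message.body; a null value or an exception would suggest that the file is incomplete or is not an agent JAR.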

Attach to a runtime node's JVM and load an agent

Generally speaking, it is possible to list the JVMs running on a host by obtaining their descriptors (a collection of VirtualMachineDescriptor instances) and then to attach to a specific JVM by its identifier using Attach API. Here, however, we can take advantage of the fact that the agent is loaded from a script that runs in the very JVM into which the agent is to be loaded. It is therefore not necessary to list all running JVMs: it is sufficient to find out the identifier of the JVM that executes the Groovy script, which is the runtime node's JVM.

Code snippet of a CPI Groovy script that determines the process ID of a runtime node's JVM:
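For completeness, the generic approach of listing JVMs could look like the sketch below. It assumes that the Attach API classes (com.sun.tools.attach) are available to the script and that the JVM is allowed to discover other JVMs on the host; the helper name listJvms is hypothetical.

```groovy
import com.sun.tools.attach.VirtualMachine
import com.sun.tools.attach.VirtualMachineDescriptor

// Hypothetical helper: returns one "<id> <display name>" line
// for each JVM that is visible to the current process.
List<String> listJvms() {
    VirtualMachine.list().collect { VirtualMachineDescriptor descriptor ->
        "${descriptor.id()} ${descriptor.displayName()}".toString()
    }
}
```

The identifier returned by id() is the value that VirtualMachine.attach() accepts; for HotSpot-based JVMs it is typically the process ID.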

import com.sap.gateway.ip.core.customdev.util.Message
import java.lang.management.ManagementFactory

Message processData(Message message) {

    // The runtime MXBean name has the form <pid>@<host name>
    String jvmName = ManagementFactory.runtimeMXBean.name

    // Keep only the part before '@', which is the process ID
    message.body = jvmName.take(jvmName.indexOf('@'))

    return message

}

Now that we know the process ID of the runtime node's JVM, the agent can be loaded into that JVM using Attach API, by executing the following sequence of calls:

VirtualMachine.attach() – to attach to a running JVM by its process ID and get a VirtualMachine instance that represents a JVM,

VirtualMachine.loadAgent() – to load the specified agent into the running JVM to which we attached earlier, using the location of its binary file and, optionally, agent startup arguments,

VirtualMachine.detach() – to detach from a running JVM, to which we attached earlier. This step is required to ensure that an earlier established session to a running JVM has been gracefully closed, since it will not be required anymore.

Remarks:

To make the demonstration more interesting, and also to make usage of the agent in the demonstration more secure, I will pass agent startup arguments that enable basic authentication, to prevent anonymous invocation of the agent's API by consumers.

For the sake of simplicity, I will not make any other sophisticated configuration of the agent. For example, I will not override the default port (8778), will not enable HTTPS and will not externalize agent configuration to a dedicated configuration file, but will provide agent configuration inline. The agent will not be exposed to external network interfaces and will only start a listener on the loopback network interface (localhost).

Code snippet of a CPI Groovy script that determines the process ID of a runtime node's JVM, attaches to it, loads the agent and detaches from the JVM:
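Inline agent configuration is passed as a single string of comma-separated key=value pairs. The sketch below shows how such a string could be assembled from a map; the helper name buildAgentArguments is hypothetical. The option names host and port in the usage note are, to my understanding, options of the Jolokia JVM agent, and the values are examples only.

```groovy
// Hypothetical helper: joins agent options into the "key=value,key=value"
// string that is passed as the second argument of VirtualMachine.loadAgent().
// Values that themselves contain commas are not handled by this sketch.
String buildAgentArguments(Map<String, Object> options) {
    options.collect { key, value -> "${key}=${value}" }.join(',')
}
```

For instance, buildAgentArguments([user: 'jolokia', password: 'secret', host: 'localhost', port: 8779]) would yield a string that, under the assumptions above, moves the listener from the default port to 8779 while keeping it on the loopback interface.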

import com.sap.gateway.ip.core.customdev.util.Message
import com.sun.tools.attach.VirtualMachine
import java.lang.management.ManagementFactory

Message processData(Message message) {

    String agentFilePath = '/tmp/jolokia-jvm-1.6.2-agent.jar'

    // Demo credentials that enable basic authentication in the agent
    def agentParametersDemo = [
            user    : 'jolokia',
            password: 'JavaAgent@CPI_Demo4SAPCommunity'
    ]

    // Agent arguments are passed as comma-separated key=value pairs
    String agentParameters = agentParametersDemo.collect { key, value -> "${key}=${value}" }.join(',')

    // Process ID of the JVM that executes this script
    String jvmName = ManagementFactory.runtimeMXBean.name
    String jvmPid = jvmName.take(jvmName.indexOf('@'))

    VirtualMachine jvm = VirtualMachine.attach(jvmPid)
    try {
        jvm.loadAgent(agentFilePath, agentParameters)
    } finally {
        // Close the attach session even if loading of the agent fails
        jvm.detach()
    }

    message.body = null

    return message

}
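A quick way to confirm that the agent has started is to call it once. The sketch below assumes that the agent listens at http://localhost:8778/jolokia/ with the credentials configured above, and that the version request of the Jolokia protocol is available as a GET on the path version under that base URL. The helper names basicAuthHeader and fetchAgentVersion are hypothetical.

```groovy
// Hypothetical helper: builds the value of an HTTP Basic Authorization header.
String basicAuthHeader(String user, String password) {
    'Basic ' + Base64.encoder.encodeToString((user + ':' + password).getBytes('UTF-8'))
}

// Hypothetical helper: sends GET <base URL>version and returns the raw JSON response.
// An IOException is thrown if the agent is not reachable or rejects the credentials.
String fetchAgentVersion(String agentBaseUrl, String user, String password) {
    HttpURLConnection connection = (HttpURLConnection) new URL(agentBaseUrl + 'version').openConnection()
    connection.requestMethod = 'GET'
    connection.connectTimeout = 5000
    connection.readTimeout = 5000
    connection.setRequestProperty('Authorization', basicAuthHeader(user, password))
    return connection.inputStream.text
}
```

In a CPI script, message.body could be set to the result of fetchAgentVersion('http://localhost:8778/jolokia/', user, password) to see the agent's reply.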

Access an agent and invoke its commands

Now that the agent has been loaded into the JVM and exposes certain functionality, we can demonstrate it in action. For illustrative purposes, I will invoke an operation that is provided by one of the standard MXBeans and that triggers and collects a thread dump. This is equivalent to invoking the corresponding JMX operation: ManagementFactory.threadMXBean.dumpAllThreads(true, true)

As a baseline scenario, let's run a corresponding JMX operation from within a Groovy script. The loaded agent is not utilized at this point.

Code snippet of a CPI Groovy script to invoke a JMX operation to trigger a thread dump:

import com.sap.gateway.ip.core.customdev.util.Message
import java.lang.management.ManagementFactory

Message processData(Message message) {

    // Both arguments are true: include locked monitors and locked ownable synchronizers
    message.body = ManagementFactory.threadMXBean.dumpAllThreads(true, true).toString()

    return message

}
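The string form of the returned ThreadInfo array is not the most convenient to read. As an optional refinement, the dump could be rendered explicitly, as in the following sketch; the helper name formatThreadDump and the chosen layout are assumptions made for illustration.

```groovy
import java.lang.management.ThreadInfo

// Hypothetical helper: renders a thread dump as text, with one header line
// per thread followed by all of its stack frames.
String formatThreadDump(ThreadInfo[] threads) {
    threads.collect { ThreadInfo thread ->
        String header = "\"${thread.threadName}\" id=${thread.threadId} state=${thread.threadState}"
        String frames = thread.stackTrace.collect { "    at ${it}" }.join('\n')
        frames ? "${header}\n${frames}" : header
    }.join('\n\n')
}
```

In the script above, message.body would then be set to formatThreadDump(ManagementFactory.threadMXBean.dumpAllThreads(true, true)).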

Next, let's make use of the agent and call its HTTP endpoint. To start with a simpler implementation, the corresponding HTTP request will be constructed and submitted from within a Groovy script.

Code snippet of a CPI Groovy script to invoke a sample Jolokia API to trigger a thread dump:

import com.sap.gateway.ip.core.customdev.util.Message

Message processData(Message message) {

    String agentBaseUrlPath = 'http://localhost:8778/jolokia/'
    String user = 'jolokia'
    String password = 'JavaAgent@CPI_Demo4SAPCommunity'

    // Jolokia request: execute dumpAllThreads(true, true) on the Threading MXBean
    String command = '''
                        {
                            "type": "EXEC",
                            "mbean": "java.lang:type=Threading",
                            "operation": "dumpAllThreads",
                            "arguments": [
                                true,
                                true
                            ]
                        }
                     '''

    // Submit the request as an HTTP POST with basic authentication
    HttpURLConnection connection = new URL(agentBaseUrlPath).openConnection()
    connection.requestMethod = 'POST'
    connection.setRequestProperty('Authorization', 'Basic ' + Base64.encoder.encodeToString([user, password].join(':').bytes))
    connection.setRequestProperty('Content-Type', 'application/json')
    connection.doOutput = true
    connection.outputStream << new ByteArrayInputStream(command.bytes)

    // Return the agent's JSON response to the caller
    message.body = connection.inputStream.text

    return message

}

That's the key part of this entire blog post. A moment ago we loaded an arbitrary Java agent (the Jolokia JVM agent) into the running JVM of a CPI runtime node transparently, without disturbing operations executed by the node, and we have now successfully used CPI functionality (an iFlow) to consume a feature provided and exposed by the agent (triggering a thread dump). The loaded agent is iFlow-agnostic and runs in the JVM independently of iFlows and their state. In other words, once loaded, the agent can on its own collect metrics, perform specific activities, interact with other components of the JVM, expose APIs and handle incoming requests to those APIs. Certainly, we could have used some other agent instead of Jolokia, or developed a custom-built one that implements and exposes other required functionality for consumption by CPI or external components.

Now, let's wrap the demonstrated functionality into something more flexible and comprehensible: for example, an iFlow that can be called by an HTTP client and that proxies submitted requests to the agent. This allows us to expose the desired API to an external consumer while protecting the agent from direct exposure, and to implement additional security controls in the iFlow, should this be required. Below is an overview of such an iFlow and details of several steps that are of specific interest in the context of this demonstration.
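The response that the agent returns is a JSON document, so it can be processed further in a script. The sketch below assumes the usual structure of a Jolokia response (a status field, a value field holding the result and, in case of failure, an error field) and assumes that each thread entry carries the threadName and threadState keys of a serialized ThreadInfo. The helper name summarizeThreads is hypothetical.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: extracts "<thread name> [<state>]" lines
// from the response to an EXEC dumpAllThreads request.
List<String> summarizeThreads(String jolokiaResponse) {
    def response = new JsonSlurper().parseText(jolokiaResponse)
    if (response.status != 200) {
        throw new IllegalStateException("Jolokia request failed with status ${response.status}: ${response.error}")
    }
    response.value.collect { thread -> "${thread.threadName} [${thread.threadState}]".toString() }
}
```

Applied to connection.inputStream.text from the script above, this would reduce the full dump to one line per thread.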

IFlow overview:

IFlow runtime configuration to allow an additional custom header that can hold the command that is to be passed to the agent, if that is required by Jolokia API for that command:

Sender connection configuration:

Receiver connection configuration:

The referenced credentials entry, which holds the credentials required to invoke the API of the loaded agent, has been deployed to the security material of the CPI tenant.
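The additional security controls mentioned earlier could, for instance, take the form of an allowlist check in a script step placed before the receiver call. The following is a minimal sketch under assumptions: the helper name isCommandAllowed and the shape of the allowlist (MBean name mapped to permitted operations) are hypothetical, and only single EXEC requests are accepted.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: returns true only if the submitted Jolokia request
// is a single EXEC request for an explicitly allowed MBean operation.
boolean isCommandAllowed(String command, Map<String, List<String>> allowedOperations) {
    def request
    try {
        request = new JsonSlurper().parseText(command)
    } catch (Exception ignored) {
        return false    // malformed or empty JSON is rejected
    }
    if (!(request instanceof Map) || request.type?.toString()?.toUpperCase() != 'EXEC') {
        return false    // bulk requests (JSON arrays) and other request types are rejected
    }
    return request.operation in (allowedOperations[request.mbean] ?: [])
}
```

With an allowlist such as ['java.lang:type=Threading': ['dumpAllThreads']], the thread dump request from this demonstration would pass, while a request for any other MBean or operation would be refused before it reaches the agent.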

After the iFlow has been developed, deployed and started, we can send a corresponding well-formed request to it, and the iFlow will forward the request to the agent's API, invoke it and return results of the command execution back to a caller (here, Postman):

Evidence of message processing for this request can be obtained from the CPI Message Monitor:

To verify the content of the received response and to ensure that it matches thread dumps collected with the methods demonstrated earlier, let me highlight a part of the response payload: a single entry of the response value that corresponds to one thread captured in the thread dump. As can be seen, we got the thread's details and its stack trace at the time the thread dump command was executed:

We are done with the major part of the demonstration by now: not only was the agent loaded and its operations tested, but a proxy wrapper was also created as a custom iFlow, and a more advanced test turned out to be successful.

Restart a runtime node's JVM and "unload" an agent

The described technique has a substantial limitation: there is no generic and consistent way to unload a previously loaded agent, unless corresponding functionality has been implemented in the specific agent. Consequently, it is commonly required to restart the JVM, as dynamically loaded agents do not "survive" a JVM restart. A JVM restart causes temporary service unavailability, unless appropriate redundancy and clustering are in place and enable a rolling restart of cluster nodes' JVMs.

Warning: a restart of a running JVM terminates all its threads and, consequently, all tasks executed by them. As a result, message processing and any other operations on a runtime node whose JVM is restarted will be severely impacted, and the runtime node will be out of service for the duration of its shutdown and startup (including startup of all required components that were deployed and need to be started on the runtime node). The approach that I will use below is not a soft restart (graceful termination), but a hard restart (immediate termination). Please proceed with caution, if and only if the side effects of the demonstrated operation are clear and its consequences are acceptable.

There are different ways to trigger a restart of a runtime node in CPI: for example, there are appropriate commands in the Operations API, and termination of the JVM can also be invoked programmatically from Java/Groovy code. I will use the latter option in this demonstration, as it is a quick way to achieve the goal, although it is not the most appropriate one from a consistency perspective: it does not take into account current message processing and any concurrently executed tasks, but initiates immediate termination of the JVM. The method is based on invoking System.exit() to terminate the currently running JVM from within a Groovy script.

Code snippet of a CPI Groovy script that shuts down a runtime node’s JVM and causes its restart:

import com.sap.gateway.ip.core.customdev.util.Message

Message processData(Message message) {

    // Terminates the JVM immediately; no statement after this call is reached
    System.exit(0)

}

Note that we do not bother to return a message from the Groovy script, as flow logic will simply not reach the point where the Groovy script completes and the message reaches the subsequent processing step: the JVM will terminate and the runtime node will crash earlier than that. This is also evidenced by the lack of a response in Postman: the runtime node did not return any kind of HTTP response to the caller, neither a successful response message nor any error details or exception message.

Upon execution of such a script, the runtime node's JVM will be shut down and automatically restarted. After the runtime node completes the startup procedure for all required components and reaches a running state, its JVM and loaded components will have reverted to the state they were in before this demonstration.
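One way to check that the agent is indeed gone after the restart is to test whether anything still listens on the agent's port. The sketch below assumes that the agent used the default port 8778 on localhost and that no other component of the runtime node listens there; the helper name isPortOpen is hypothetical.

```groovy
// Hypothetical helper: returns true if a TCP connection to host:port
// can be established within the given timeout, false otherwise.
boolean isPortOpen(String host, int port, int timeoutMillis = 2000) {
    Socket socket = new Socket()
    try {
        socket.connect(new InetSocketAddress(host, port), timeoutMillis)
        return true
    } catch (IOException ignored) {
        return false
    } finally {
        socket.close()
    }
}
```

Under these assumptions, isPortOpen('localhost', 8778) would be expected to return true while the agent is loaded and false after the runtime node has been restarted.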

I cannot stress enough how dangerous the above operation is when being executed in a multi-threaded JVM that handles concurrent tasks, so please make sure consequences are clearly understood. By no means shall this command be applied in a production environment.

Remove an agent from a runtime node's file system

To bring the entire runtime node back to its original state, it is also a good idea to clean up the runtime node's file system, in particular to remove the previously downloaded binary file of the Jolokia JVM agent from the directory /tmp. This can be achieved by invoking yet another method of the class File.

Code snippet of a CPI Groovy script that deletes the agent's file from a runtime node's file system, given the file's absolute path:

import com.sap.gateway.ip.core.customdev.util.Message

Message processData(Message message) {

    String agentFilePath = '/tmp/jolokia-jvm-1.6.2-agent.jar'
    File agentFile = new File(agentFilePath)

    // Report whether the file exists and, if so, whether it could be deleted
    String output = (agentFile.exists()) ? 'Agent file exists' : 'Agent file does not exist'

    if (agentFile.exists()) {
        output += (agentFile.delete()) ? '\nFile has been deleted' : '\nFile has not been deleted'
    }

    message.body = output

    return message

}

The runtime node has been brought back to its original state, from the perspective of both the JVM with its loaded components and the file system, which concludes the demonstration.

As has been shown, the technique is generic and can be used to load Java agents that add very different capabilities and features to a runtime node. However, considering the drawbacks already mentioned and the consequences of misuse, the described functionality shall be used responsibly and carefully.