In today’s “SAP Answer-Thon 2021” post, I will dip my toe into a performance tuning question. The topic I found is more of a debugging exercise than simple tuning:
First, while the question title is not bad, it could be improved. Knowing that the “system” could truncate the topic, think about the keywords that should be included so that anyone reading or searching will recognize the subject. Does “2 or 3 times” mean that this error does not happen every time (data dependent), or, worse, does it mean that the same data has been recorded multiple times due to unknown causes?
“Broken pipe” and “connection reset by peer” are the key error messages mentioned after the process failure or failures. I would include these specific symptoms in the title rather than the general concept of failed integration jobs. Maybe something like “XYZ Flows Fail With Broken Pipe Or Connection Reset”.
Not being conversant with this specific application, I had difficulty matching the prose with the supplied diagram. It is unclear to me where in the workflow the errors are being triggered, though it seems to be in the “http/receiver” area. This is more my lack of background knowledge than the poster not sharing enough details. However, I would suggest adding notes to the diagram, if possible, to point out which data source or sink is doing what.
HTTP REQUEST <packet details>
Seeing “multiple RFC” processes mentioned in the context of “we only started one” suggests that parallel child processes are starting, as a result of some (hidden) application configuration defaults.
multiple rfc calls are running in the background although only one rfc call have in our java api
The flow, simplified greatly, is shown as: SCPI→Java→S4H
This is unclear to me, as SCPI and S4H seem to be “platforms” or “systems” (cloud-based or not doesn’t matter to the question), whereas Java is a language. The error messages listed in the question do seem to be Java-based, given the common “java.io.IOException” string. Good luck doing a Stack Overflow search on that, or finding the true root cause behind the “you threw too hard and I dropped the ball” sentiment.
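Since a broken pipe surfaces on the Java side as a plain java.io.IOException, one generic mitigation (not specific to this poster’s flow, and the helper name here is my own invention) is to wrap the call in a small retry-with-backoff loop, so a transient connection reset does not fail the whole job:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper: retries a network call when the peer drops the
// connection ("broken pipe" / "connection reset" arrive as IOException).
public class RetryOnPipeBreak {
    static <T> T withRetry(Callable<T> call, int maxAttempts, long backoffMs)
            throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e;                       // remember the failure
                Thread.sleep(backoffMs * attempt); // linear backoff before retrying
            }
        }
        throw last; // every attempt failed: surface the last IOException
    }
}
```

A retry only papers over the symptom, of course; if the receiver is genuinely overwhelmed, the retries will fail too.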
What suggestions do I have for addressing the question? I’m posting here rather than as a comment since comment editing leaves a bit to be desired, and I don’t have confidence in solving this.
First, I’d dig into the documentation to figure out whether there is a way to send a single stream of packets instead of the multiple RFC threads that seem to have been triggered. That would be a way to “slow down” the throw so the catcher is not overwhelmed.
Second, I’d look at the S4H system to see where any bottlenecks might occur. While I would expect that the data integration flow has some type of handshake to reflect readiness to capture more incoming data, there could be undersized buffer caches, poorly architected data stores, or a variety of background process overhead that adds to the delay (permission checks, logic verification, etc.)
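The handshake I’d expect can be illustrated with a bounded queue: when the consumer (the receiving system) falls behind, a full buffer makes the producer wait instead of overrunning the connection. This is a generic backpressure sketch, not S4H internals, and the capacity stands in for whatever buffer the real flow has:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration of readiness-based handoff: put() blocks when the buffer
// is full, so a slow consumer naturally throttles the producer.
public class BoundedHandoff {
    final BlockingQueue<String> buffer;

    BoundedHandoff(int capacity) {
        buffer = new ArrayBlockingQueue<>(capacity);
    }

    void produce(String msg) throws InterruptedException {
        buffer.put(msg);   // waits for free space rather than dropping data
    }

    String consume() throws InterruptedException {
        return buffer.take(); // waits for data when the buffer is empty
    }
}
```

An undersized buffer here is the direct analog of the undersized caches mentioned above: the smaller the capacity, the sooner the producer stalls.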
Third, if this flow can be simulated in a test system, shrink the data volumes then increase until the problem is repeated.
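That third step can be mechanized as a ramp-up harness: start with a tiny batch, double the volume until the flow breaks, and record the first failing size. Everything below is hypothetical scaffolding (the Batch interface stands in for pushing N records through the test flow):

```java
// Sketch of a volume ramp for a test system: doubles the batch size until
// the simulated flow fails, reporting the first failing volume.
public class VolumeRamp {
    interface Batch { boolean send(int records) throws Exception; }

    // Returns the smallest tested volume at which the flow failed, or -1
    // if it survived every volume up to max.
    static int findBreakingPoint(Batch flow, int start, int max) throws Exception {
        for (int n = start; n <= max; n *= 2) {
            if (!flow.send(n)) return n;
        }
        return -1;
    }
}
```

Knowing the approximate breaking point turns “it fails 2 or 3 times” into a reproducible threshold that can be compared before and after any tuning change.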
Oh, and read the log tea leaves.
I’ve looked at other posts by the person who asked this question, and they are good, difficult questions. I found one that might be relevant, though:
This question is dated, but good for context (mentions “wireshark” and “trace” so definitely a deep dive):