Background job stuck in Status “On Hold” with Reason “RFC”
When delivery orders are generated in batches by a background job that runs RV50SBT1, the driver program of transaction VL04, each job (one per plant, scheduled hourly) usually finishes within seconds, or at most a few minutes. This time, however, the longest-running job had been waiting for an RFC response for several hours with the status ‘On Hold’. Several other jobs running the same program in parallel with different variants were stuck as well.
I haven’t found the root cause: the problem only occurs on the first day of a new quarter and can’t be reproduced on other systems. If we stop the first stuck job, the remaining jobs complete very quickly. This post collects the possible reasons and their solutions for future analysis.
It looks like the first RFC gets stuck and keeps some of the parallel jobs waiting, but not all of them. The jobs call the same program with different variants for different plants, and many of the later jobs still finish in seconds.
Summary of possible reasons
One answer from Shyam summarizes the possible reasons that lead to the ‘On Hold’ RFC status:
1. In general, a work process shows the On Hold status when, during processing, an RFC calls or waits for another RFC, or waits for some other process to complete. This is only one example; there can be many reasons for the On Hold status of a work process. More details can be found in the work process trace files.
2. On Hold work processes sometimes lead to performance problems. If many work processes are on hold, fewer are available, and the resulting bottleneck affects running transactions and newly logged-in users.
3. Check the trace files of the work process in SM50 and find out where these RFCs were initiated. If possible, check the dev_rfc<wp no> files in the working directory for more details.
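The bottleneck described in point 2 can be illustrated with a toy model. The sketch below is not how the SAP dispatcher works; it only assumes a fixed pool of workers that take jobs in order, where a job with no duration stands in for a work process that stays On Hold waiting for an RFC response. The function name and the plant names are made up for the illustration.

```python
import heapq


def finished_jobs(jobs, num_workers, horizon):
    """Return the names of jobs that finish within `horizon` seconds.

    jobs: list of (name, seconds) in start order. seconds=None marks a job
    that never returns, a stand-in for a work process stuck On Hold.
    """
    # Each heap entry is the time at which one worker becomes free again.
    free_at = [0.0] * num_workers
    heapq.heapify(free_at)
    finished = []
    for name, seconds in jobs:
        if not free_at:
            break  # every worker is blocked, so later jobs never start
        start = heapq.heappop(free_at)
        if seconds is None:
            continue  # the worker is never handed back to the pool
        end = start + seconds
        if end <= horizon:
            finished.append(name)
        heapq.heappush(free_at, end)
    return finished


jobs = [("plant_A", None), ("plant_B", 5), ("plant_C", 5), ("plant_D", 5)]
print(finished_jobs(jobs, num_workers=2, horizon=60))
print(finished_jobs(jobs, num_workers=1, horizon=60))
```

With two workers, the three later jobs still finish on the remaining worker, which is similar to what we observed. With a single worker, the stuck job blocks everything behind it.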
This link explains the situation well if the problem is related to a gateway memory bottleneck.
SAP Note 1900128, titled ‘Work process in “on Hold” RFC status in SM50/SM66’, seems to match this case perfectly.
- You have an external RFC client program that connects to SAP.
- When there are multiple calls in parallel by the RFC client program, you may face slow response or even timeout.
- At the same time, you found the work process is in “On Hold” RFC status, and in the gateway trace,
I don’t know why the operating system matters, but this note applies to Windows only, not to Unix/Linux.
SAP Note 2180934, ‘Analysis of Work process in “On Hold” RFC, or Stopped CPIC status’:
DIA or BGD work process remains in Status “On Hold” with Reason “RFC” in transaction SM50, or Status “Stopped” with Reason “CPIC” in transaction SM66.
The RFC-client process is waiting for the RFC-server during the function call. The RFC-client process cannot be rolled out. The possible reasons for this status are described in note 934109.
As of release 740
– Run the transaction SM51 in the system where the RFC server process is running. Select the RFC-server machine and click on Goto – Information – Communication – RFC connection
– Select the Connection Table entry where the ConvID is the Conversation ID from the SM50 from the RFC-client machine and the value of the Type field is SERVER
– Click on the Session Key to get to the transaction SM04 and open the User – Technical Information from the menu.
The name of the server function is in the variable “F=” of the “modeinfo.appl_info(stack)” fields.
This is the server function that has to be analyzed. If the server function cannot be found in SM50 by the work process number from “modeinfo.last_wp”, check the value in the “modeinfo.abap_state” field and verify whether the server process is in DP_ROLLED_OUT status. If it is, check “modeinfo.rollout_reason”. For example, the status SLEEP in this field means that the server function is running a WAIT statement.
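The decision steps from the note can be written down as a small helper, for example to keep as a checklist for the next occurrence. This is only a sketch: it assumes the field values are copied by hand from the SM04 technical information into a dictionary, the regular expression for the “F=” variable is a guess at the layout, and Z_DEMO_FUNC is a made-up function name.

```python
import re


def next_step(modeinfo, found_in_sm50):
    """Suggest the next analysis step from the modeinfo fields.

    modeinfo: dict with the keys appl_info, last_wp, abap_state and
    rollout_reason, typed in from the SM04 technical information.
    found_in_sm50: True if work process last_wp shows the server function.
    """
    # Assumption: the function name follows "F=" and uses only these characters.
    match = re.search(r"F=([A-Za-z0-9_/]+)", modeinfo.get("appl_info", ""))
    func = match.group(1) if match else "(unknown function)"
    if found_in_sm50:
        return f"analyze {func} in work process {modeinfo.get('last_wp')}"
    if modeinfo.get("abap_state") != "DP_ROLLED_OUT":
        return f"{func}: inspect abap_state {modeinfo.get('abap_state')}"
    reason = modeinfo.get("rollout_reason")
    if reason == "SLEEP":
        return f"{func} is rolled out in a WAIT statement"
    return f"{func} is rolled out, rollout_reason {reason}"


info = {
    "appl_info": "F=Z_DEMO_FUNC",  # hypothetical value
    "last_wp": "12",
    "abap_state": "DP_ROLLED_OUT",
    "rollout_reason": "SLEEP",
}
print(next_step(info, found_in_sm50=False))
```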
This should be the correct procedure for analyzing the stuck RFC. The RFC server process has to be running while it is analyzed with the above approach. Unfortunately, there was no chance to verify this specific pending RFC, because it had been stopped manually due to the emergency. I will definitely try this next time.
Hi Mr Zhang,
We are having similar issues. Would you share your findings? What was the root cause? Our case is with printing: one user's print job occupied a spool work process with the status On Hold, which left other users' print jobs waiting and unable to print. After I killed the On Hold process, all other printing resumed. Thanks. We are in the same position as you, waiting for the next occurrence to debug the session, but I wonder whether I will find anything. Thank you!!
This is a very rare case for us and we haven't encountered it a second time so far. There is no clue from the ABAP point of view, so it has to be checked from the Basis perspective. Just ask your Basis team to try the approach from the Notes mentioned above to verify.