Handling and preventing messages stuck in To Be Delivered status
Update 29 Oct 2014: Updated Resolution section – for ICO scenarios, processing is done on the worker threads on the sender side.
Since the early days of XI, messages getting stuck in the To Be Delivered status on the Adapter Engine has been a recurring issue. A search on SCN will list hundreds of such discussion threads.
OSS note 813993 has an FAQ with details of the different statuses in the Adapter Engine. It states that TBDL is a temporary status that cannot be changed manually. Therefore, it is not possible to resend a message in this status from the RWB.
To Be Delivered (German: “zu senden”, internal: TO_BE_DELIVERED, TBDL)
The message was transferred to the messaging system and was placed in the queue for the messages to be sent or received. The status is temporary and cannot be changed manually. Generally, this is not necessary, since the message is taken from the queue by a worker thread. For a large queue, you can start such a message manually in order to raise its priority.
Analysis and Troubleshooting
To troubleshoot further, first search for all messages in status “To Be Delivered” in the Adapter Engine. Extend the time/date criteria far enough back to identify when the blockage first occurred. Then search for messages in status “Delivering” around the timeframe when the block first occurred.
Once you find these, refresh the display a few times; if the messages do not change status, they are most likely the ones causing the blockage.
You can then check the audit log for further details of why the messages are still in “Delivering” status. If there are no details in the audit log, it could be because they have already been flushed from the cache (refer to OSS note 1314974).
You can also further verify that the AE queues are blocked by checking the engine status.
RWB -> Component Monitoring -> Adapter Engine -> Engine Status -> Additional Data
Here, a large value in “Number of Entries in Queue” indicates the blockage. You will also notice that the Assigned/Working threads are equal to the Max Threads. If you click into the queue, you can see the messages in it.
When you have found the cause of the blockage, you need to resolve the issue before resending the blocked messages.
On newer versions or SP levels of XI/PI, it is possible to release the threads held up by the “Delivering” messages by stopping the corresponding channel. The required version/SP level is described in OSS note 1604091.
For classical scenarios, normally the hanging threads are at the receiver channel. However, for ICO-based scenarios, the hanging threads will be at the sender channel as the messages are processed from the send queue (refer to Mike’s blog below for further details.)
As this issue often happens on the receiver end, one of the common causes is a connectivity problem with the receiving backend system. This might be due to an FTP server being unavailable or a JDBC server not having enough resources. After the backend issue has been resolved, resend the blocking messages first. After that, the “To Be Delivered” messages should automatically start to clear.
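Before resending the blocking messages, it helps to confirm the backend is actually reachable again. The sketch below is not part of PI itself, just a minimal TCP-level reachability check you could run from any host with network access to the backend; the host and port are placeholders for your actual receiver system.

```python
# Minimal pre-check sketch: verify the receiving backend (e.g. an FTP
# server) accepts TCP connections again before resending blocked messages.
# Host/port values are placeholders, not from the blog.
import socket


def backend_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage: backend_reachable("ftp.example.com", 21)
```

If this returns False, resolve the connectivity problem first; resending while the backend is still down will only tie up the worker threads again.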
However, on older releases, the message status could not be updated correctly, and this might require a restart of the Java stack, as mentioned in many threads.
While a Java stack restart might resolve this issue after it has occurred, that does not prevent it from happening again.
The common reason for the blockage is that the worker threads have been fully used up or not released properly. This occurs quite frequently with FTP and JDBC receiver adapters, but is also possible with other adapters. The following OSS notes provide more details:
849089 – sporadic connectivity to the FTP server can cause the channel to hang indefinitely
1473299 – setting of poolwaiting and concurrency causes adapter to wait indefinitely
1136790 – Blocking receiver channel may affect the whole adapter type
The following can be done to prevent the adapters from hanging.
1) Set timeout and connect mode in FTP adapters (for all FTP sender and receiver channels)
2) Set a non-zero pool waiting time in JDBC channel
3) Optionally set the queueParallelism properties (following note 1136790 for classical scenarios and note 1493502 for ICO scenarios) to prevent any single interface from blocking the whole adapter
NWA -> Configuration Management -> Infrastructure -> Java System Properties -> Details -> Services
This timeout setting can also be extended to other adapters as necessary, depending on whether such a setting is available in the adapter.
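For item 3 above, the properties are maintained under the NWA path shown. The fragment below is only an illustration of what such settings might look like; the exact property names, service, and defaults depend on your PI release, so verify them against notes 1136790 and 1493502 before changing anything, and the values shown are examples, not recommendations.

```properties
# Illustrative values only - confirm names/defaults for your release
# against SAP notes 1136790 (classical) and 1493502 (ICO).
# Cap the worker threads a single interface's queue may occupy:
messaging.system.queueParallelism.maxReceivers = 4
# Queue types the cap applies to (ICO scenarios, PI 7.30 onwards):
messaging.system.queueParallelism.queueTypes = Recv, IcoAsync
```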
Mike’s blog on tuning the PI message system describes in detail more about the worker threads, and also more details for further tuning.
OSS note 1623356 – “To be delivered” messages in Adapter Engine
Good blog... keep blogging 🙂
E. Ravi Chandra Reddy
Thanks for sharing the detailed information.
Harish, thanks for the kind feedback 🙂
Good blog for troubleshooting. 🙂
Thanks for the feedback, Vishal 🙂
Well explained for troubleshooting
We faced the same issue on the JMS channel receiver side. We used the Resend button (Adapter Engine messages) and a refresh. It was because of the size of the message.
Excellent blog ... thanks for sharing 🙂
Hello Eng Swee Yeoh,
Thanks for the informative blog. We are planning to set the Max Receiver parameter for ICO, "queueParallelism.maxReceivers", in our landscape, as we are facing some issues. We have around 6 server nodes for production in our landscape. I have some queries if you have any ideas:
1. The current setting is the default 0. What value should we set to avoid hanging queues? Any suggested value?
2. Will the value we set apply per server node or to the entire system? For example, if we have a total of 30 worker threads (5 per node * 6 nodes) and set the parameter value to 5, then at runtime will 5 threads be occupied per interface with 25 free for others, or will (5-5) zero threads be free for others?
3. I want to set the parameter messaging.system.queueParallelism.queueTypes = Recv, IcoAsync. The current value is the default " ". Will the default behave as Recv, IcoAsync, or do I need to add Recv, IcoAsync manually to the property?
For items 1 & 2, parallelism depends on the adapter type used. For example, having additional server nodes does not increase the worker threads for a File sender channel, but it does for a File receiver channel. For an excellent explanation of this, check out the official SAP documentation at the link below. In particular, section 6.1.1 on Adapter Parallelism will be relevant for you, and there is a table on page 53 listing it for all adapters.
SAP NetWeaver Process Integration Performance Check - Analyzing Performance Issues and Possible Solution Strategies
In general I would suggest setting the value to less than the number of worker threads of one server node, i.e. if each server node has 5 threads, set it between 1-4.
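The per-node arithmetic behind this suggestion can be sketched as a quick calculation. This is purely illustrative (the thread counts are the example figures from the question, not recommendations), under the assumption that maxReceivers caps how many of one node's worker threads a single interface's queue can occupy.

```python
# Illustrative arithmetic: maxReceivers limits how many of one node's
# worker threads a single interface can occupy. Example figures only.
threads_per_node = 5
nodes = 6
max_receivers = 4          # suggested: less than threads_per_node

total_threads = threads_per_node * nodes
# Worst case: one busy interface occupies max_receivers threads on EVERY node
busy_one_interface = max_receivers * nodes
free_for_others = total_threads - busy_one_interface

print(free_for_others)     # 6 -> at least one thread per node stays free
```

With maxReceivers equal to threads_per_node (5), the same worst case would leave zero free threads per node, which is exactly the blockage scenario being prevented.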
For item 3, the default value depends on the version of the PI system. For ICO scenarios, refer to SAP note 1493502. If you are on PI 7.30 onwards, the default is as you have mentioned and you can leave it as it is. However, if you also have synchronous scenarios, you may want to consider restricting the max receivers for them as well by putting either "icoall" or "icosync".
Additionally, if you want to control the parallelism on a more granular level, check out section 6.2.4 of the above document, which refers to SAP note 1916598 that allows for control per interface.
Thanks for the explanation. Let me provide some more detail. We specifically faced an issue with IDoc senders due to heavy load, and a lot of messages were stuck because of that. From the links, what I understood is:
1. The property "queueParallelism.maxReceivers" will not apply to IDoc_AAE sender adapters; rather, I can increase the thread count from InboundRA. In my case it is currently 5, so I can consider increasing it further to provide more threads to each IDoc sender channel.
2. In our particular case nearly all the interfaces were using the same IDoc sender communication channel (same name - Generic). So as per the blog, this channel was allocated only 5 threads, and with only that capacity it had to process a large number of messages. If we had different IDoc sender channels, i.e. one with a different name for each interface, then each of them would have been allocated 5 threads and the processing would have been faster and without any issues.
Please correct if my understanding is wrong.
Can you please open a discussion thread on this issue instead? Please provide details of your full scenario(s) as well, i.e. what the receiver channel(s) are, what error you are getting, and how and where it is stuck (ECC side, receiver side, etc.)
Restarting the server node where the delivering message is stuck (get the server number from message monitoring) would release the worker threads, which in turn would enable the delivering message to be sent to the endpoint.