Some errors in an XI production environment

Former Member · ‎11-24-2006

For the last four months, I have been monitoring a fairly large XI production environment for my client. Two facts are worth keeping in mind: the interfaces were built on SP09, and the requirements were quite complex. The scenarios used almost all of XI's functionality, including IDocs, RFCs, proxies, JDBC, complex BPMs and so on.

The major activities involved in XI support can be classified into:

1. Monitoring and Operations Support

2. Bug fixing, patching and enhancements

3. Disaster Recovery

Here, we will look at some of the 'errors' we faced in a production environment, and basic troubleshooting techniques for them. Note that these errors were specific to our environment and may not apply to others 🙂

A CUSTOM classification of our production errors would be something like this:

XI Production Errors - A custom classification

Let's take a look at some of the errors faced in our environment.

Integration Engine Errors:

1. JCO_COMMUNICATION_FAILURE

This error is due to a problem in the RFC connection between your integration engine and your J2EE mapping runtime.

JCO_Communication_Failure

Troubleshooting:

*) Go to transaction SM59 --> TCP/IP Connections --> AI_RUNTIME_JCOSERVER and test the connection.

Testing TCP/IP connectivity in SM59

*) If there are errors, try deleting and recreating the RFC destination (though this may not be possible in production).

*) Alternatively, you can restart the J2EE engine (works for us every time :) ).
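
Before restarting anything, it can help to confirm that the host running the J2EE engine is reachable over the network at all. The sketch below is only a rough stand-in for the SM59 connection test: it checks plain TCP reachability and knows nothing about the registered server program itself. The function name is made up for illustration, and the host and port in the usage comment are placeholders, not values from our landscape.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Hypothetical usage; replace host and port with those of your own system:
# tcp_port_open("xi-host.example.com", 3300)
```

A result of False only tells you that the port is unreachable from where the script runs. A result of True does not prove that the RFC destination itself is healthy.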

2. RFC APPLICATION ERROR

This is another error which usually occurs in RFC synchronous scenarios. Two basic reasons for this error could be:

(a) Data format / type mismatch (between the payload and the RFC structure)

(b) An uncaught exception in the RFC call (shown in the figure below)

RFC Application error

Troubleshooting:

*) Check the response message structure from the RFC and see if the structure has a datatype mismatch.

*) Open the file and check manually for special characters.

*) Check your dump in ST22 on the R/3 system.

*) Also check whether your RFC is activated in the application system.

*) Finally, reprocess the file.
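
The manual check for special characters can be partly automated. The following is a minimal sketch, assuming the payload is available as text and that "special" means control characters or anything outside 7-bit ASCII. That definition, the function name and the sample payload are assumptions for illustration only, and legitimate non-ASCII data (an umlaut in a city name, for instance) will be flagged as well.

```python
import unicodedata


def find_special_characters(text: str):
    """Return (line, column, repr) tuples for control or non-ASCII characters.

    Tabs, carriage returns and newlines are treated as normal whitespace.
    """
    findings = []
    line_no, col_no = 1, 0
    for ch in text:
        if ch == "\n":
            line_no += 1
            col_no = 0
            continue
        col_no += 1
        if ch in "\t\r":
            continue
        if unicodedata.category(ch) == "Cc" or ord(ch) > 127:
            findings.append((line_no, col_no, repr(ch)))
    return findings


# Made-up payload with one umlaut and one NUL byte on the second line.
sample = "<name>ACME</name>\n<city>K\u00f6ln\x00</city>\n"
for line_no, col_no, char in find_special_characters(sample):
    print(f"line {line_no}, column {col_no}: {char}")
```

The reported positions make it easier to find the offending character when the file is opened in an editor afterwards.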

HTTP RESPONSE CODE STATUS 503:

An HTTP response code 503 usually means that the J2EE server is under heavy load and is therefore unable to accept the call, or refuses the connection.

HTTP Resp code status 503

Troubleshooting:

*) Take a look at SAP Note 803145 -- Make sure that the J2EE server is accessible. Check the access to the Receive servlet as described in the note.

*) In the Visual Administrator, navigate to the "SAP XI AF Messaging" service and increase the value of the "pollAttempts" parameter to 100 in the "messaging.connections" property.

*) Restart the SAP XI AF Messaging service and send the message again. Check that all the adapter services are started, and if required, increase the number of application threads.

*) For a detailed description of other status codes, please refer to http://www.w3.org/Protocols/HTTP/HTRESP.html

XI Queues (qRFC):

The major issues with queues are:

(a) Queues getting stuck

(b) Queues set to SYSFAIL status, thereby holding up the messages behind them.

Stuck queues in XI

Queues proved to be a big problem for us. One of our requirements was to use Unix scripts to pick files from a dropzone and bring them into XI, and vice versa. With later SPs this became pretty easy thanks to the OS command parameters feature, but since we did not have that feature back then, we had to resort to a dummy mapping within XI, which triggered the script once every five minutes to move all files to and from the dropzone.

The problem was that these dummy files kept getting stuck in the inbound queues and held up important workflow messages behind them, so the queues had to be cleared manually from time to time.

Another issue we faced with queues was that, due to certain other problems with our SUS server, the queues kept going into SYSFAIL status, leaving us with no option but to delete the SYSFAIL entry in the queue and process the messages behind it.

Queue in SYSFAIL status

Troubleshooting:

*) You can use transaction codes SMQR and SXMB_ADM to reset the queue status.

*) Use transaction codes SMQ1 and SMQ2 to delete SYSFAIL entries and manually resend the messages in the queues. (To activate queues in SMQ1 and SMQ2, you first need to deregister them in SMQR or SXMB_ADM.)

*) As a prerequisite, the IS configuration parameter QRFC_RESTART_ALLOWED (category MONITOR) should be set to 1.
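
To see why a single failed entry is so disruptive, it helps to remember that a queue is processed strictly in order. The toy model below is only a sketch of that behaviour, not of the real qRFC scheduler: the function name, the tuple layout and the message ids are invented for illustration.

```python
from collections import deque


def process_queue(queue):
    """Process entries in order and stop at the first failing entry.

    Each entry is a (message_id, will_fail) tuple. Returns the ids that were
    processed and a status: "READY" when the queue is empty, "SYSFAIL" when
    the entry at the head of the queue blocks everything behind it.
    """
    processed = []
    while queue:
        message_id, will_fail = queue[0]
        if will_fail:
            return processed, "SYSFAIL"
        queue.popleft()
        processed.append(message_id)
    return processed, "READY"


queue = deque([("dummy-file", True), ("workflow-1", False), ("workflow-2", False)])
print(process_queue(queue))  # ([], 'SYSFAIL')
queue.popleft()              # delete the failed entry at the head of the queue
print(process_queue(queue))  # (['workflow-1', 'workflow-2'], 'READY')
```

The two workflow messages are perfectly healthy, yet neither of them moves until the failed entry in front of them has been removed.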

WORKFLOW ERRORS:

Workflow error

There are many reasons why you could have a workflow error. It might be due to system errors, mapping exceptions, and others.

The major decision we had to make on numerous occasions was whether to restart the workflow from where the error occurred, or reprocess the entire message one more time. The latter was not always possible, as it would mean duplicate data being sent to the target system, which is not acceptable on the production box.

In our scenarios, we had designed reconciliation logic to raise an alert and also restart from the failure point, as described in a blog by Krishnamoorthy.

The issues discussed here are just some of those that occurred on the integration engine of our prod environment.
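
The difference between the two options can be shown with a small model. This is a sketch under simplifying assumptions, not our actual BPM: each step just appends a record to a list that stands in for the target system, and the step names are invented.

```python
def run_steps(steps, target, start_at=0):
    """Run steps from index start_at; each successful step appends to target.

    Each step is a (record, should_fail) tuple. Returns the index of the
    failed step, or None if every step from start_at onwards succeeded.
    """
    for index in range(start_at, len(steps)):
        record, should_fail = steps[index]
        if should_fail:
            return index
        target.append(record)
    return None


steps = [("order", False), ("invoice", True), ("confirmation", False)]
target = []
failed_at = run_steps(steps, target)   # stops at index 1, target holds "order"
steps[failed_at] = ("invoice", False)  # assume the root cause has been fixed

restarted = list(target)
run_steps(steps, restarted, start_at=failed_at)
print(restarted)    # ['order', 'invoice', 'confirmation']

reprocessed = list(target)
run_steps(steps, reprocessed)
print(reprocessed)  # ['order', 'order', 'invoice', 'confirmation']
```

Restarting from the failure point delivers each record once, while reprocessing from the beginning sends "order" to the target a second time.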

References:

How to monitor Exchange Infrastructure 3.0

XI : How to Re-Process failed XI Messages Automatically

Reconciliation of Messages in BPM

Reconciliation of Messages in BPM Contd. - Restart Workflow

SAP Problem Analysis guide for XI