Designing a scenario that works is one thing; designing it so that it also fulfills non-functional requirements like performance and robustness is quite another.
In this blog, I want to show you how we could improve a datastore scenario that was already running fine.
Let me start by explaining the scenario:
The integration flow forms a REST service that provides three functionalities: senders store JSON messages in a datastore via a POST call; a receiver of those messages picks them up from the datastore via a GET call; and via a DELETE call, this receiver can delete messages from the datastore once it has picked them up successfully.
Let’s look at the individual operations in more detail.
Storing messages
This operation is identified by the HTTP method POST.
As the datastore Select operation only supports XML messages, the incoming JSON message is first converted to XML before it is stored in the datastore. The sender receives an HTTP 200 response code.
Picking up messages
This operation is identified by the HTTP method GET.
The receiver would like to pick up all messages, but as there might be a lot of messages in the datastore (and returning all of them at once would overload the HTTP response), a paging mechanism was implemented. A first datastore SELECT fetches up to 10,000 messages; then we count them (fewer messages may have been selected if fewer than 10,000 existed), and a second SELECT fetches only 1,000 messages (or fewer, if fewer exist). Afterwards, we calculate the difference between the first and the second SELECT result to determine the number of messages that still remain to be picked up by the receiver. This way, the receiver can check whether another call is required. The response is sent with an HTTP 200 response code.
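The arithmetic behind this paging approach can be sketched in plain Python; the datastore is mocked as a list and the hypothetical `select()` helper stands in for the datastore SELECT step, which returns all available entries up to the given limit. This is an illustration, not actual Cloud Integration code.

```python
# Mocked datastore with 2,500 pending entries.
datastore = [f"msg-{i}" for i in range(2500)]

def select(limit):
    """Stand-in for the datastore SELECT step: up to `limit` entries."""
    return datastore[:limit]

def get_page_old():
    probe = select(10_000)               # first SELECT, only used for counting
    page = select(1_000)                 # second SELECT: the page actually returned
    remaining = len(probe) - len(page)   # estimate of entries left for later calls
    return page, remaining

page, remaining = get_page_old()
print(len(page), remaining)  # 1000 1500
```

Note that because the probe is capped at 10,000 entries, the `remaining` estimate can never exceed 9,000, no matter how many messages actually sit in the datastore.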
Important: For critical business data, we strongly recommend separating the read of datastore entries from their deletion. We want to avoid a situation where the response message is affected by a network error and your messages are lost. Therefore, our REST service offers another operation for explicit deletion.
Deleting messages
This operation is identified by the HTTP method DELETE.
For each entry picked up via the call described above, the receiver sends a separate call with a single ID. A datastore GET step first checks whether an entry with this ID exists in the datastore; if so, the entry is deleted via the datastore DELETE step and a response code 204 is returned to the sender. If the entry doesn’t exist in the datastore, a response code 404 is returned.
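The check-then-delete logic can be sketched like this, with the datastore mocked as a dict and `delete_entry()` a hypothetical helper rather than the actual flow steps:

```python
# Mocked datastore: one pending entry.
datastore = {"id-1": "payload"}

def delete_entry(entry_id):
    if entry_id not in datastore:   # datastore GET: entry not found
        return 404
    del datastore[entry_id]         # datastore DELETE step
    return 204

first = delete_entry("id-1")
second = delete_entry("id-1")
print(first, second)  # 204 404
```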
This design was working fine and the integration flow was running in a productive environment. But soon the flow developer realized that during the work week, when the senders were pushing messages to the REST service, the datastore volume kept growing and the receiver couldn’t keep pace with picking up and clearing the messages. Only during weekends, when no senders were pushing new messages, could the receiver empty the datastore. The developer also saw the performance of the GET calls degrade towards the end of the work week.
Together with the developer, I checked the scenario and found two potential improvement areas.
Paging mechanism inefficient
Reading 10,000 entries just to determine the exact number of messages left in the datastore is a performance killer. What’s worse, the number returned is not even accurate: if there are more than 10,000 entries in the datastore, the surplus is not included in the calculation.
Deletion of messages inefficient
Issuing a separate HTTP call for every single deletion causes a lot of network overhead and had to be improved.
Picking up messages
In the new design, we select at most the desired number of datastore entries, e.g. 1,000. Afterwards, we count how many messages we actually received (remember, the datastore SELECT returns all available messages up to the specified limit). If the SELECT returned the full amount (i.e. 1,000 entries), that is a strong indicator that the datastore contains more messages. In this case, our response to the receiver contains, in addition to the requested 1,000 entries, an indication that there are more entries to be picked up. This can take any form the receiver can evaluate: a header, an additional XML tag, an XML attribute, and so on.
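The redesigned GET logic can be sketched as follows. Again, the datastore is mocked as a list and `select()` is a hypothetical helper; the `X-More-Entries` header name is purely illustrative, as the text leaves the signalling mechanism up to you.

```python
# Mocked datastore with 2,500 pending entries.
datastore = [f"msg-{i}" for i in range(2500)]
PAGE_SIZE = 1_000

def select(limit):
    """Stand-in for the datastore SELECT step: up to `limit` entries."""
    return datastore[:limit]

def get_page_new():
    page = select(PAGE_SIZE)
    # A full page strongly suggests that more entries remain to be picked up.
    has_more = len(page) == PAGE_SIZE
    return page, {"X-More-Entries": str(has_more).lower()}

page, headers = get_page_new()
print(len(page), headers["X-More-Entries"])  # 1000 true
```

Only one SELECT runs per call, and its limit equals the page size, so the expensive 10,000-entry probe disappears entirely.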
Deletion of messages
We got rid of the differentiation between the HTTP 204 and the HTTP 404 response. If an entry is to be deleted, it doesn’t matter whether it exists or not (and the datastore DELETE step doesn’t behave differently either). This way, we could skip the datastore GET call and perform the DELETE directly.
As the datastore DELETE step supports the deletion not only of a single entry but of many entries, we asked the receiver to send one call with an XML message containing all IDs instead of separate messages with single entry IDs. In the datastore DELETE step, we used an XPath expression pointing to the IDs in the message so that all items are removed in one shot.
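The ID extraction can be illustrated with Python's `xml.etree.ElementTree`, which supports a subset of XPath. The element names and the dict-based datastore are illustrative assumptions, not the actual message format:

```python
import xml.etree.ElementTree as ET

# Mocked datastore keyed by entry ID.
datastore = {"id-1": "...", "id-2": "...", "id-3": "..."}

# One request body carrying all IDs to delete.
body = """<Delete>
  <Id>id-1</Id>
  <Id>id-3</Id>
</Delete>"""

# XPath-style expression selecting every <Id> element at any depth.
ids = [e.text for e in ET.fromstring(body).findall(".//Id")]
for entry_id in ids:
    datastore.pop(entry_id, None)  # like the DELETE step, missing IDs are no error

print(sorted(datastore))  # ['id-2']
```

One network round trip now removes an arbitrary number of entries, instead of one round trip per entry.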
Switching off JDBC transaction handling
This change is not strictly required for optimizing the scenario, but it is a generally useful recommendation.
JDBC transaction handling ensures that all modifying DB operations share the same commit or rollback. If an entry is written and the flow fails before finishing, nothing gets committed. This ensures data consistency.
Activating the JDBC transaction also means occupying a DB connection for the entire processing time of the flow. As DB connections are a limited resource, I recommend carefully checking whether the JDBC transaction is really required when DB operations are performed and, if not, switching it off. You can find information on how to do so in Mandy’s blog about transaction handling.
In this flow, no transaction handling is required, as:
- the GET operation doesn’t modify anything, so data consistency is not endangered.
- the POST operation performs the datastore WRITE as the last step of the flow, so no errors can occur after the data has been written. Data consistency is not endangered.
- the datastore DELETE is again the last step of the flow for the DELETE operation. Data consistency is not endangered.
The feedback from the integration developer: after our redesign, the datastore volume no longer grows even during the work week, and entries are picked up and removed quickly.