One-to-many Joins produce multiple events in a transaction block
After a recent discussion in the community forum, and the discovery that this behavior is not explained in the ESP documentation, I thought it would be worth writing up an explanation here so that others can benefit. We’ll get it added to the documentation, but in the meantime…
Joins in CCL may produce transaction blocks, even when there are no transaction blocks in the input
First, a brief recap on transaction blocks in ESP. ESP can group events in a transaction block so that the events are processed together, the results are coalesced, and, if any event in the block can’t be processed, the entire block is rejected. The explanation of this is a bit buried in the ESP doc (for now), I’m afraid, and it’s mostly discussed in the context of the SDK (e.g. here) or in the properties of some of the adapters that support transaction blocks (e.g. the file input adapter).
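To make the all-or-nothing part concrete, here is a toy model in Python. It is not ESP code and says nothing about how ESP implements this internally: the function name apply_block, the dict standing in for a keyed window, and the choice of a duplicate primary key as the "bad" event are all assumptions made for illustration.

```python
# Toy model of all-or-nothing block processing. Not ESP code:
# apply_block and the dict-based "window" are hypothetical.

def apply_block(window, block):
    """Apply every (key, value) insert in the block, or none of them."""
    staged = dict(window)      # work on a copy so a failure leaves window untouched
    for key, value in block:
        if key in staged:      # assumed failure case: insert on an existing key
            return False       # reject the whole block
        staged[key] = value
    window.clear()
    window.update(staged)      # commit all changes together
    return True

window = {1: "a"}
ok = apply_block(window, [(2, "b"), (3, "c")])        # every insert is valid
rejected = apply_block(window, [(4, "d"), (1, "x")])  # second insert collides with key 1
print(ok, rejected, window)
```

Note that key 4 never makes it into the window: its block was rejected as a whole, even though that particular insert was valid on its own.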
What’s less well known is that anywhere in the model where one event produces more than one derived event, the “children” of the parent event are grouped in a transaction block. This occurs most often in a one-to-many Join. Take this simple example:
CREATE INPUT WINDOW InputWindowA
SCHEMA ( Seq integer , ID string , Value integer )
PRIMARY KEY ( Seq );
CREATE INPUT STREAM InputStream1
SCHEMA ( ID string , Column2 integer ) ;
CREATE OUTPUT STREAM Join1
AS SELECT
InputStream1.ID ID
FROM InputStream1 INNER JOIN InputWindowA ON InputStream1.ID = InputWindowA.ID;
If an incoming event on InputStream1 (call it “Event X”) matches three rows in InputWindowA, then Join1 will emit three events in response to Event X. And these three events will be grouped in a single transaction block.
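If it helps to see the grouping spelled out, here is a small Python simulation of that scenario. Again, this is only a sketch and not ESP code: the sample rows are made up, the function join1 is hypothetical, and the Seq and Value fields are carried into the output only so the sibling events can be told apart. A Python list stands in for the transaction block.

```python
# Toy simulation of the one-to-many join; not ESP code.
from collections import defaultdict

# Made-up rows held in InputWindowA: (Seq, ID, Value), with Seq as primary key
input_window_a = [(1, "X", 10), (2, "X", 20), (3, "Y", 30), (4, "X", 40)]

# Index the window rows by the join column, ID
rows_by_id = defaultdict(list)
for seq, id_, value in input_window_a:
    rows_by_id[id_].append((seq, value))

def join1(event):
    """Return all events derived from one InputStream1 event as a single block."""
    id_, _column2 = event
    return [{"ID": id_, "Seq": seq, "Value": value}
            for seq, value in rows_by_id.get(id_, [])]

# "Event X" arrives on InputStream1 and matches three rows in InputWindowA
block = join1(("X", 99))
print(len(block))
for derived in block:
    print(derived)
```

The point to notice is that the unit of output is the list, not the individual event: everything derived from Event X travels downstream together.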
The same is true for a Flex Operator that emits more than one event in response to a single input event.
Basically, in any situation where a single event produces multiple derived events, the “sibling” events will all be grouped in a transaction block.