‘Event-Driven Architecture’ Manifesto for Distributed Enterprise Applications
In his February 2017 Blog, What do you mean by “Event-Driven”?, Martin Fowler – one of the original signatories of the Manifesto for Agile Software Development – outlines the broad conclusions of a late-2016 Summit – including senior developers from all over the World – held “to discuss the nature of ‘Event-Driven’ applications”. “The biggest outcome of the summit was recognizing that when people talk about ‘Events’, they actually mean some quite different things”. So let’s try to clear things up.
The first of the two eventing patterns that Martin details in his Blog is ‘Event Notification’. In this Pattern, “An event need not carry much data on it, often just some id information and a link back to the sender that can be queried for more information” if desired. “A key element of event notification is that the source system doesn’t really care much about the response. Often it doesn’t expect any answer at all, or if there is a response that the source does care about, it’s indirect”. In other words, a fundamental principle of the ‘Event Notification’ Pattern – and ‘Event-Driven Architecture’ more generally – is that communication between the various software components composing a ‘Distributed Application’ is always Asynchronous: “if there is a response … it’s indirect”. Whilst this was never a requirement of ‘Service-Oriented Architecture’ (SOA) – where components are ‘loosely coupled’ – it becomes our first principle of ‘Event-Driven Architecture’ – where components must be entirely ‘decoupled’.
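To make the ‘Event Notification’ Pattern concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Broker class is a toy in-memory stand-in for a real event broker (which would queue and deliver out-of-process), and the event name, identifier and URL are hypothetical. It shows only the two properties quoted above: the event carries little more than an id and a link back to the sender, and publishing returns nothing to the source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class EventNotification:
    event_type: str   # hypothetical name, e.g. "Customer.AddressChanged"
    entity_id: str    # just enough to identify what changed
    source_link: str  # where a subscriber may query for more information


Handler = Callable[[EventNotification], None]


class Broker:
    """Toy in-memory stand-in for a real event broker (an assumption)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: EventNotification) -> None:
        # The publisher names no target and receives no reply (returns None).
        for handler in self._subscribers[event.event_type]:
            handler(event)


broker = Broker()
received: List[EventNotification] = []
broker.subscribe("Customer.AddressChanged", received.append)

result = broker.publish(
    EventNotification(
        event_type="Customer.AddressChanged",
        entity_id="cust-42",
        source_link="https://crm.example/customers/cust-42",
    )
)
```

Note that the subscriber learns only that something changed; to find out what changed, it would have to follow source_link back to the sender.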
Martin immediately goes on to warn us about the use of events: “The danger is that it’s very easy to make nicely decoupled systems … but you have to be careful of the trap… A simple example of this trap is when an event is used as a passive-aggressive command. This happens when the source system expects the recipient to carry out an action, and ought to use a command message to show that intention, but styles the message as an event instead”. In fact, it’s easy to spot the confusion here in Martin’s words. On the inside of an individual Software Component, its internal operations will typically be based upon commands. However, all communication between that Software Component and the other software components that make up any given Distributed Application must, by definition – in the context of ‘Event-Driven Architecture’ (EDA) – be based upon Events. If communication between software components is direct – via Commands – then we are clearly no longer within the realm of EDA, and can no longer imagine building the sort of “nicely decoupled systems” that Martin praised only a few sentences earlier.
Even the phrase “when the source system expects the recipient to carry out an action” seems to completely miss the point that Martin himself made only a few lines earlier: “if there is a response that the source [system] does care about, it’s indirect”. Not only does the source system have no idea from which system any response might come, it has no idea whether there will even be a response: there is no ‘Target’ system! Given this, it is impossible to permit – within the context of EDA – that a source system “ought to use a command message”. To whom, is the question, and using which version of their API? And what if there are multiple (possible) targets – common in EDA – each of which is unknown and unknowable, and each of which uses a different API?
When reflecting upon the real world, it is likewise difficult to accept as axiomatic that “when the source system expects the recipient to carry out an action … [it] ought to use a command message to show that intention”. If someone telephones a Sales Representative to request a Quote, and that ‘SalesRep’ Entity needs to get back to them the next day, in real world terms, a bona fide ‘Event’ just took place – ‘Quote.Requested’ – and it happened within a precise business domain. Should such a scenario really be modelled as a ‘Command’ (e.g. ‘Quote.Create’), given there is no synchronous response allowed, and given that the source system has no idea whatsoever if or when there ever will be a response – as per the real world? And what if several different Sales Representatives were contacted at the same time in order to obtain the best price – as per the real world – and as a consequence, several different responses can be expected (depending on ‘SalesRep’ availability)? Can this even be modelled as a ‘Command’, as we are vigorously instructed to do?
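The Quote scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: QueueBroker is a toy in-memory queue rather than a real broker, the ‘Quote.Proposed’ event name, the representative names and the prices are all hypothetical, and availability is reduced to a boolean. The source publishes ‘Quote.Requested’ without naming any target, and any responses arrive later as separate events.

```python
from collections import defaultdict, deque


class QueueBroker:
    """Toy broker (an assumption): publish only enqueues, run() delivers later."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Fire and forget: nothing is returned to the publisher.
        self._queue.append((event_type, payload))

    def run(self):
        # Deliver queued events, including any published by handlers.
        while self._queue:
            event_type, payload = self._queue.popleft()
            for handler in list(self._handlers[event_type]):
                handler(payload)


def make_sales_rep(name, price, available, broker):
    """Build a hypothetical SalesRep subscriber that may or may not respond."""

    def on_quote_requested(request):
        if available:
            broker.publish(
                "Quote.Proposed",  # hypothetical response event
                {"request_id": request["request_id"], "rep": name, "price": price},
            )

    return on_quote_requested


broker = QueueBroker()
proposals = []
broker.subscribe("Quote.Proposed", proposals.append)
broker.subscribe("Quote.Requested", make_sales_rep("rep-a", 120.0, True, broker))
broker.subscribe("Quote.Requested", make_sales_rep("rep-b", 95.0, True, broker))
broker.subscribe("Quote.Requested", make_sales_rep("rep-c", 80.0, False, broker))

broker.publish("Quote.Requested", {"request_id": "q-1", "item": "widget"})
pending_before_delivery = len(proposals)  # nothing has come back yet

broker.run()
best = min(proposals, key=lambda p: p["price"])
```

The requester neither knows nor cares how many representatives are subscribed; it simply collects whatever ‘Quote.Proposed’ events eventually arrive and picks among them.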
The second eventing pattern discussed in his Blog, Martin calls: ‘Event-Carried State Transfer’. Unlike the ‘Event Notification’ Pattern, the ‘Event-Carried State Transfer’ Pattern demands that each Event’s payload includes details of all State-changes made to the corresponding Entity, as a result of the Event. This Pattern “shows up when you want to update clients of a system in such a way that they don’t need to contact the source system in order to do further work. A customer management system might fire off events whenever a customer changes their details (such as an address) with events that contain details of the data that changed. A recipient can then update it’s own copy of customer data with the changes, so that it never needs to talk to the main customer system”.
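As a rough Python sketch of ‘Event-Carried State Transfer’, assuming a hypothetical CustomerChanged event and a hypothetical ShadowCustomerStore on the recipient side: each event carries the fields that changed, and the recipient merges them into its own copy without ever contacting the source system.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class CustomerChanged:
    """Hypothetical event whose payload carries the changed state itself."""

    customer_id: str
    changes: Dict[str, Any]  # only the fields that changed


class ShadowCustomerStore:
    """Recipient-side copy of customer data, maintained purely from events."""

    def __init__(self) -> None:
        self._customers: Dict[str, Dict[str, Any]] = {}

    def apply(self, event: CustomerChanged) -> None:
        # Merge the carried changes into the local copy.
        record = self._customers.setdefault(event.customer_id, {})
        record.update(event.changes)

    def get(self, customer_id: str) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the shadow record.
        return dict(self._customers.get(customer_id, {}))


shadow = ShadowCustomerStore()
shadow.apply(CustomerChanged("cust-42", {"name": "Ada", "address": "1 Old Road"}))
shadow.apply(CustomerChanged("cust-42", {"address": "2 New Street"}))
current = shadow.get("cust-42")
```

After the second event the shadow record holds the new address alongside the unchanged name, and no query back to the customer system was needed.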
“An obvious down-side of this pattern is that there’s lots of data schlepped around and lots of copies”, but we are nonetheless reassured: “What we gain is greater resilience, since the recipient systems can function if the customer system is/becomes unavailable”. I found this last statement quite surprising given that Martin himself informed us in another (much earlier) Blog, that “this is a two-edged sword. A component may continue operating, but it will be working on out of date information if it’s not receiving events as things change. It may therefore initiate actions based on out of date information. With request collaboration, it would just not work – which in some scenarios may be preferable”. It would appear however that Martin has since been able to perceive the problem: “Many people find the event processing adds a lot of complexity to an application (although I do wonder if that’s more due to poor separation between components that derive a working copy and components that do the domain processing)”. Bingo!
This brings us to what must become the second fundamental principle of Event-Driven Architecture for Distributed Enterprise Applications: each ‘Entity Type’ must have a single ‘Owner-Component’. It doesn’t matter how many other components shadow that Entity Type, but there should only ever be one component that owns it; only one that can update an entity type’s members, and that can ‘Publish’ the corresponding events. All other software components making up the Distributed Application should have no right to do anything other than ‘Subscribe’ to events concerning that same Entity Type. Such a design will certainly be referred to by purists as ‘Eventual Consistency’ – as subscribers will only ‘eventually’ come to be consistent with updated data held on the Owner-Component – but as the delay for such updates to reach subscriber components should be measured in milliseconds, this latency will in reality be very similar to what is witnessed in synchronous operations. Needless to say, the target records will never be locked on the shadow database – where no updates are performed directly on that Entity Type – and as such will be updated without delay or complication.
And what if the owner system is missing some fields in its data model, with regards to a particular Entity Type (e.g. Customer)? The answer seems quite apparent: (1) you add those fields to the owner component’s model (and have the same user group maintain the same entities for which they are primarily responsible), or (2) you build a spaghetti system that no one can understand, support, or even diagram, where some of an Entity’s fields are the responsibility of Component A, others the responsibility of Component B, and others still the responsibility of Component C – then you build in some concurrency protection to ensure that if, for example, an Entity was changed in Component A whilst simultaneously changed in Component B, if the changed field is common to both (e.g. address), then you roll-back one of the two components depending on who arrived first, informing the user with a notification, but if the roll-back fails… I think I can stop there. It is also worth pointing out that only option (1) is fully asynchronous, requiring no use whatsoever of commands/requests – it is the only option available in a true EDA. Likewise, if the Owner-Component goes down in this scenario, the other components can never possibly “be working on out of date information”: if the owner is down, no updates can ever be made to the entities that it owns, and any updates that were made, triggered events that will have already resulted in updates to the entity stores of subscribed components.
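One way the Single-Entity-Owner rule might be enforced at the broker is sketched below in Python. This is an assumption about where such a check could live, not a feature of any particular broker: OwnedBroker, OwnershipError and the component names ‘crm’ and ‘billing’ are all hypothetical. Only the registered Owner-Component may publish events for an Entity Type; every other component may only subscribe.

```python
class OwnershipError(Exception):
    """Raised when a non-owner tries to claim or publish for an Entity Type."""


class OwnedBroker:
    """Toy broker (an assumption) that enforces one owner per Entity Type."""

    def __init__(self):
        self._owners = {}    # entity type -> owner component
        self._handlers = {}  # entity type -> list of subscriber callbacks

    def register_owner(self, entity_type, component):
        current = self._owners.setdefault(entity_type, component)
        if current != component:
            raise OwnershipError(f"{entity_type} is already owned by {current}")

    def subscribe(self, entity_type, handler):
        self._handlers.setdefault(entity_type, []).append(handler)

    def publish(self, component, entity_type, event):
        if self._owners.get(entity_type) != component:
            raise OwnershipError(f"{component} does not own {entity_type}")
        for handler in self._handlers.get(entity_type, []):
            handler(event)


broker = OwnedBroker()
broker.register_owner("Customer", "crm")

billing_shadow = []  # the billing component merely shadows Customer
broker.subscribe("Customer", billing_shadow.append)

broker.publish("crm", "Customer", {"id": "cust-42", "address": "2 New Street"})

try:
    broker.publish("billing", "Customer", {"id": "cust-42", "address": "elsewhere"})
    rejected = False
except OwnershipError:
    rejected = True
```

With the check in place, the concurrency problem described under option (2) cannot arise for this Entity Type, because only one component is ever able to announce a change to it.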
Such a Single-Entity-Owner rule, whilst clearly related, should not be confused with the ‘Bounded Contexts’ first described in Eric Evans’ 2003 Book, Domain-Driven Design. Evans suggested the use of ‘Bounded Contexts’ to – conceptually – group together all entity types that form part of a given business domain (e.g. the ‘Customer’, ‘Product’ and ‘Sales Order’ entity types that are each part of the ‘Sales’ business domain). However, he does not prohibit having the same business Entity Type, such as ‘Product’, managed partially by distinct systems. Whilst he does propose a mapping between different forms of the same entity type in cases where it is shared across different domains, there is nothing declaring that each entity type should be owned – in its entirety – by a single component (database/persistence layer).
Robust distributed enterprise applications will in fact often demand the opposite of what Evans suggests: they will demand that all attributes/fields of an entity type that is shared across different business domains, are all added to the ‘domain model’ of that software component that is nominated as the entity type’s ‘owner’. Given that there should be a strong link between any particular ‘Business Domain’ and its ‘Software Component’ host – ideally 1:1 for enterprise applications – and given that the same team will likely be charged with supporting both, not having a Single-Entity-Owner rule results in system complexities that produce significantly more hardship than the inconvenience of declaring that the domain model of a particular owner-component shall serve as the master for the entire Distributed Application, for a particular Entity Type.
Far more recently, Evans declared that ‘generic off-the-shelf business contexts’ can make domain boundaries far easier to draw, given that they are proprietary and therefore imposed to a large extent. In the case of distributed enterprise applications built upon (at least some) proprietary software components, this overlooks a fairly obvious issue: there will often be several proprietary software components touching upon exactly the same business domain (e.g. Salesforce and SAP), in which case only one of them should be chosen as the owner of each relevant entity type. In modern proprietary solutions, it should be relatively straightforward to add new fields even to their ‘proprietary’ models; a need that should influence the decision on the principal owner-component in each business domain.
The Single-Entity-Owner principle, along with the first EDA principle of ‘Always Asynchronous’, gives rise to a new concept in software development, and our third EDA principle: ‘Macroservices’. Microservices don’t care how many distinct components/systems are involved in any given operation, meaning a very certain dependence on synchronous communication: if one Microservice call fails, certain others in the chain need to know about it. There will typically be a lead Microservice that is called synchronously, which depends upon the results of all the other microservices that it itself called (synchronously or asynchronously). Such a Microservice scenario will often depend upon synchronous communication between distinct software components: outlawed in EDA. Macroservices, conversely, should only ever execute on a single software component, and should always be built at ‘Entity Type-level’ – hence ‘Macro’ – in order to be coherent with the overall architecture of the Distributed Application; in order that each Entity Type’s State-changing events can be published by its Owner-Component.
Entity Query Store
Returning to our first EDA principle of ‘Always Asynchronous’ communication between software components, the only possible exception that might be permitted concerns ‘Lightweight Components’. Such lightweight software components may not have their own database, or an adequate means of persistence. In other words, they can never be fully autonomous, as should typically be expected of any ‘worthwhile’ software components in a Distributed Enterprise Application. The first question that should always be asked in such cases is whether these lightweight components ought to be merged with other more heavyweight components, in order to obtain the needed persistence.
If this is not feasible, one possible exception to always-asynchronous communication can be imagined: an ‘Entity Query Store’ can be built at the hub of the Event-Driven Architecture; conceptually alongside the Event Broker (several of which happen to provide precisely such persistence out-of-the-box). This ‘Entity Query Store’ (EQS) would subscribe to all the events of all the entity types that it hosts, and would maintain an always-up-to-date image of each and every entity type hosted. Unlike the ‘Event Sourcing’ pattern, received events would be immediately ‘folded’ with the current image to build the latest ‘Folded State’ of each entity: ‘Data’. This EQS could be queried synchronously by any software component that wished (as per the ‘CQRS’ Pattern).
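The ‘folding’ behaviour described above might look something like the following Python sketch. It assumes, purely for illustration, that each event carries the changed fields of one entity (as in ‘Event-Carried State Transfer’) and that a shallow merge is an adequate fold; the EntityQueryStore class and its method names are hypothetical. Unlike Event Sourcing, no event history is kept here: each event is merged into the current image on receipt, and readers query that image synchronously.

```python
class EntityQueryStore:
    """Read-only (to its clients) store of the latest folded state per entity."""

    def __init__(self):
        self._images = {}  # (entity_type, entity_id) -> folded state

    def on_event(self, entity_type, entity_id, changes):
        # Fold immediately: merge the event's changes into the current image.
        key = (entity_type, entity_id)
        self._images[key] = {**self._images.get(key, {}), **changes}

    def query(self, entity_type, entity_id):
        # Synchronous read; returns a copy so callers cannot modify the store.
        image = self._images.get((entity_type, entity_id))
        return dict(image) if image is not None else None


eqs = EntityQueryStore()
eqs.on_event("Customer", "cust-42", {"name": "Ada", "address": "1 Old Road"})
eqs.on_event("Customer", "cust-42", {"address": "2 New Street"})

latest = eqs.query("Customer", "cust-42")
missing = eqs.query("Customer", "cust-99")
```

A lightweight component without persistence of its own could call query() whenever it needs the current state of an entity, while all writes continue to flow through events published by the Owner-Component.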
The reason I referred to such an EQS as only a “possible exception” to ‘Always Asynchronous’ communication, is because an EQS is not in fact a ‘software component’: it contains no (distributed) application logic, is completely technology-agnostic, and provides nothing at all beyond a generic – read-only – shared persistence layer for the entire Distributed Application, that can be queried synchronously. Such an EQS, whilst facilitating synchronous internal communication, does not break any fundamental tenet of Event-Driven Architecture. For that reason, I shall nominate the ‘Entity Query Store’ persistence layer as the fourth principle of EDA.
In concluding his Blog, Martin wrote: “I’d love to write some definitive treatise that sorts all this confusion out, and gives solid guidelines on how to do each pattern well, and when it should be used. Sadly I don’t have the time to do it”. Given this, someone else probably ought to get the ball rolling!
‘Event-Driven Architecture’ principles for Distributed Enterprise Applications
1) ‘Always Asynchronous’ communication between software components
2) Each ‘Entity Type’ should be owned by a single ‘Owner-Component’
3) Provision of single-software-component, Entity Type-level ‘Macroservices’ for the exposure of operations to other software components of the Distributed Application
4) Use of an ‘Entity Query Store’ as a read-only, shared persistence layer
(To which we can add)
5) Don’t use ‘Event Notification’, as State changes made to an Entity will not be immediately available in the payload, which will consequently require direct, synchronous communication between concerned software components (at which time the source might be down). It likewise does not permit efficient replication on an ‘Entity Query Store’, or any other related components
6) Don’t let ‘Publisher’ software components perform API calls/send Commands to other software components. Neither of those things is ‘Publishing’, and neither is Event-Driven
More technical detail on each of these Principles can be found in the companion: ‘Event-Oriented Architecture’ Manifesto: https://camhunt.medium.com/event-oriented-architecture-manifesto-for-distributed-enterprise-applications-327f6e88b12f