Technical Articles
Events are the New Data
The Oxford Dictionary defines ‘Data’ as: “Facts … collected together”. In technical terms, ‘Data’ is more accurately: ‘Events folded together’ – ‘Folding’ being the merging of a particular Entity’s (State-changing) Events, in chronological order, to compute the latest Entity ‘State’.
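'Folding' can be sketched in a few lines of Python (the Event names and payload shapes below are illustrative, not any SAP API):

```python
from functools import reduce

# Hypothetical State-changing Events for one Purchase Order Entity,
# already in chronological order (names and payloads are invented).
events = [
    {"type": "PurchaseOrderCreated", "payload": {"id": "PO-1", "items": 2}},
    {"type": "PurchaseOrderItemAdded", "payload": {"items": 3}},
    {"type": "PurchaseOrderApproved", "payload": {"approved": True}},
]

def apply_event(state, event):
    """Merge one Event's payload into the running State of the Entity."""
    return {**state, **event["payload"]}

# 'Folding' = reducing the chronological Event list into the latest State.
latest_state = reduce(apply_event, events, {})
print(latest_state)  # → {'id': 'PO-1', 'items': 3, 'approved': True}
```

The 'Data' row you would find in a classical Data-base is exactly `latest_state`; everything that produced it has been discarded.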
Data – unlike Events – is an abstract notion. Whilst ‘Events’ represent actual changes to an Entity’s State (e.g. PurchaseOrderDeleted), ‘Data’ represents no more than the calculated State of a particular Entity, once all preceding Events have been folded together. At some point in history it was decided – owing to restrictive hardware costs at the time – that we could not possibly store all State-changing ‘Events’ in our Data-bases, but instead could store only ‘Data’: the ‘Folded State’ of each Entity (until such a time as even that was archived, owing to archaic hardware constraints).
The consequences of this historic Data-constraint are enormous. For one thing, it means that the Dimension of Time associated with all Events is thrown in the bin. For another, the intrinsic audit and logging potential of Events is also discarded. Finally, and perhaps most importantly given the exploding number of offline mobile use cases: the native ability of time-based Events to support OData Delta logic was completely lost (for which reason we scramble to reconstruct it post facto using complex mechanisms such as SAP’s Syclo Exchange Framework).
Whenever I hear such phrases as “Data is the New Oil”, it becomes clear that we remain trapped in our storage-constrained past, because what ought to be apparent is that ‘Events are the New Data’. SAP’s very heavy investment in a new generation of ‘Column Store’ Data-base is a mindless continuation of the same path first trodden in the 1960s, soon after our leap from tape to disk-based storage. Perhaps it is time for further reflection on this path?
Now that we have cast off the shackles of our former storage constraints, we no longer have any logical, financial, or technical reason to be storing ‘Data’; we should instead be storing ‘Events’ which are, after all, the mother of all Data. Events should now all be persisted in today’s vast and cheap storage, and should only be folded into each Entity’s ‘Last State’ – In-Memory – after a certain delay (as proposed in my earlier Blog: How to build a ‘Rolling-Delta Database’ for Offline Mobile scenarios).
Using such a ‘Rolling-Delta’ mechanism would ensure that offline mobile scenarios could be fully managed In-Memory, whilst Events would nonetheless be folded as often as necessary in order to respect today’s physical Memory limits (cf. yesterday’s storage limits). Such an approach would add another dimension to our Analytical tools – and AI algorithms – by restoring the Dimension of Time to our persisted models. As such, what we should be discussing today is a next-generation, persistent ‘Event Store’ that completely replaces the Data-base, and an industry-standard In-Memory (Rolling-Delta) ‘Entity Store’, that correctly accounts for the fact that ‘Memory is the new Storage’.
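The Rolling-Delta mechanism described above can be sketched as follows, under the simplifying assumptions of a single Entity and a plain timestamped Event list (the class and method names are hypothetical, not taken from any product):

```python
import time
from functools import reduce

class RollingDeltaStore:
    """Sketch: persist every Event; periodically fold Events older than a
    cutoff into an in-memory snapshot, keeping recent Events unfolded so
    offline clients can request a cheap Delta since their last sync."""

    def __init__(self):
        self.snapshot = {}        # folded 'Last State' of the Entity
        self.snapshot_time = 0.0  # Events up to this point are folded
        self.events = []          # (timestamp, payload) pairs, not yet folded

    def append(self, payload, ts=None):
        """Persist a new State-changing Event."""
        self.events.append((ts if ts is not None else time.time(), payload))

    def roll(self, cutoff):
        """Fold all Events with timestamp <= cutoff into the snapshot,
        respecting physical Memory limits without discarding recent Events."""
        old = [p for t, p in self.events if t <= cutoff]
        self.snapshot = reduce(lambda s, p: {**s, **p}, old, self.snapshot)
        self.events = [(t, p) for t, p in self.events if t > cutoff]
        self.snapshot_time = cutoff

    def delta_since(self, client_ts):
        """Events an offline client needs to catch up (OData-Delta-style)."""
        return [p for t, p in self.events if t > client_ts]

    def current_state(self):
        """Snapshot plus any not-yet-folded Events = latest Entity State."""
        return reduce(lambda s, p: {**s, **p},
                      (p for _, p in self.events), self.snapshot)

# Demo with explicit timestamps:
store = RollingDeltaStore()
store.append({"status": "created"}, ts=1)
store.append({"status": "approved"}, ts=2)
store.roll(cutoff=1)                  # fold only the older Event
print(store.current_state())          # → {'status': 'approved'}
print(store.delta_since(1))           # → [{'status': 'approved'}]
```

The point of the sketch is the division of labour: `roll` keeps the In-Memory footprint bounded, while `delta_since` shows why deferring the fold preserves exactly the Delta information that offline mobile clients need.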
‘Big Data’ can never represent any more than a subset of – Big – ‘Event Stores’, so let’s stop talking about Data (and Data Science, Data Architects, etc). Events are what matter, and today we have all we need to store and analyse them in real-time, whilst efficiently catering to the ever-growing demands of offline mobile applications; something we cannot properly support with 1960s Data-based models (the very reason SAP dropped Client-Side Database ‘delta-tracking’ in SMP 3.0).
Interesting read. Have you looked at Datomic from the Clojure guys?
Anyway, in HANA you have History Tables, which I guess are an implementation of the temporal logic of SQL:2011. Plus there are a slew of other ways/systems to deal with events, like Kafka. And in relational DBs/systems people have created plenty of workarounds to deal with time.
Cheers, Wout
Hello Wout, and thanks for the comments.
Kafka is something I wrote about in my Blog A harmonized API-Layer? We need a harmonized Entity-Layer!, but Kafka was never designed as a persistence layer in my humble opinion. Event Stores are split across partitions, meaning you need to use an API every single time to bring those events back together before you can even fold them = goodbye performance (workaround: use max. 1 partition per Entity Type, but...).
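To illustrate the partition problem: a sketch in Python (this is not the Kafka client API; the partitions, timestamps and payloads are invented) of the k-way merge you are forced to perform before you can fold a partitioned Event stream:

```python
import heapq
from functools import reduce

# Hypothetical: one Entity's Events ended up spread across three Kafka
# partitions. Ordering is only guaranteed *within* each partition, so
# before folding we must merge them back into one chronological stream.
partition_0 = [(1, {"qty": 1}), (4, {"qty": 4})]
partition_1 = [(2, {"status": "open"})]
partition_2 = [(3, {"qty": 3}), (5, {"status": "closed"})]

# Each partition is time-ordered, so a k-way merge restores global order.
merged = heapq.merge(partition_0, partition_1, partition_2,
                     key=lambda e: e[0])

# Only now can the Events be folded into the latest Entity State.
state = reduce(lambda s, e: {**s, **e[1]}, merged, {})
print(state)  # → {'qty': 4, 'status': 'closed'}
```

The merge step is pure overhead that a single chronological Event log would never need, which is the performance objection above in miniature.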
There are of course also workarounds for relational DBs as you state (I provided one in How to build a ‘Rolling-Delta Database’ for Offline Mobile scenarios), but why settle for workarounds? Why depend upon a solution that was designed to meet the technical constraints of the 1960s? There are more than enough new solutions available to us today (many OpenSource) to completely rethink our 'data' solutions, and to – natively – add a new Dimension to our OLAP tools, so why stay trapped in the '60s when the solution is so straightforward?