A harmonized API-Layer? We need a harmonized Entity-Layer!
As someone who regularly works in a space referred to by SAP as ‘Integration Architect’, I recently decided to work through the openSAP course Simplify Integration with SAP Cloud Platform Integration Suite, from June 2019. Early in this material we see a demonstration of a pre-packaged CPI ‘Integration Flow’: SuccessFactors to SAP HCM ‘Employee Data Replication’. A second demonstration later shows the replication of Exchange Rate data from S/4 to SuccessFactors, using another pre-packaged CPI Integration Flow (or ‘iFlow’). It then became apparent that SAP has decided to solve the – reborn – problem of the ‘Golden Record’ using Data Replication: there is no ‘Golden Record’; instead, ‘all-records-are-equal’.
A question that remained, after seeing how easy it was to set up these iFlows using CPI, was what happens if I also want to send employee data changed in SuccessFactors to a second or third system (on the basis of this brute-force, all-records-are-equal approach), and what if one of those (possibly non-SAP) systems is not covered by an existing CPI iFlow? The answer is clear: you need to build a parallel solution using an entirely different approach, and both solutions must then be maintained in parallel. At this point one must ask, given this complexity and the future ‘Lock-In’ risks, whether it wouldn’t be simpler to build a single custom integration that manages all of the company’s integration needs; one which carries no licensing costs. I suspect the answer should be straightforward for most.
And what of the second example, Exchange Rate Data Replication? Could we imagine this data needing to be distributed to more than just SuccessFactors? Could we even imagine a need for real-time Exchange Rate data? The latter would be problematic using the CPI iFlow approach, as this CPI Integration is scheduled, not event-driven. In any case, let us imagine a scenario where an enterprise has five diverse back-end systems that all depend upon near-real-time Exchange Rate data, and where all five are miraculously covered by pre-packaged CPI Integration Content. Each of the five separate iFlows would need to be scheduled to run every 2-5 seconds: at the two-second end, that is 5 × 30 = up to 150 job runs per minute on CPI for a single data replication requirement (and regardless of whether the Rate has changed or not). This turns the very simple need for an Exchange Rate ‘Golden Record’ into a costly and complex proposition – and that is pretending that all five back-ends were, by some lucky circumstance, all covered by pre-packaged CPI Integration Content, and that this never changes (i.e. there is never a need to add a new, non-included system).
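For contrast, here is a minimal sketch of what the event-driven alternative looks like, using the standard Apache Kafka producer client (to which I will return later in this article). The topic name, key and payload are purely illustrative assumptions: the producing system publishes one event only when the Rate actually changes, and all five back-ends – or six, tomorrow – simply subscribe to the same Topic; no schedules, no per-consumer jobs.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExchangeRatePublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Published once, and only when the rate actually changes; every
            // subscribed back-end receives it, however many there are.
            producer.send(new ProducerRecord<>(
                "ExchangeRates",   // hypothetical topic name
                "EUR/USD",         // key: the currency pair
                "{\"rate\":1.1042,\"validFrom\":\"2019-11-05T09:30:00Z\"}"));
        }
    }
}
```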
The Course goes on to discuss CPI ‘API Management’ for the ‘monetization of your developed APIs’. Here, we are told to “Think about launching an API like launching a product. It’s about doing research, finding the right API to build, identifying who will use the API, what will they be willing to pay”. At this point it occurred to me that in twenty years I have never seen an ERP client that develops completely new SOAP APIs for its external partners. Indeed, such APIs are normally the domain of software and ERP vendors, not ERP clients. If an enterprise makes the decision to install a large and complex ERP landscape, it could be argued that it has also made a decision NOT to develop its own custom solutions; or at least to develop as few as possible. And for those custom integrations that it will inevitably need to put in place with its partners, doesn’t it usually exploit the large number of well-established, publicly available standards for mass data exchange, such as EDIFACT, IDoc and cXML, to name but a few? Isn’t that precisely why such standards for data exchange are public, and stable?
We then learn about SAP’s ‘Digital Integration Hub’, a recent product offering from SAP “for implementing large-scale, high-throughput APIs by inserting a high-performance, In-Memory ‘Data Store Layer’, between the ‘API Service Layer’ and the System-of-Record”. It is the ‘Integration Layer’ component of the Digital Integration Hub that “Keeps the system of record sources and high-performance data store in sync”. As such, it seems that the fake-gold records that have already been duplicated across various back-end systems, via scheduled jobs running on CPI, must now also be duplicated into the Digital Integration Hub’s In-Memory ‘Data Store Layer’. Additionally, if you haven’t already developed a custom API for each of your custom integration needs, then you have nothing to expose to your partners in the Digital Integration Hub’s customer-facing ‘API Service Layer’. As this is a Cloud-based solution, you have probably also guessed that irrespective of any large investments your company has already made to beef up your existing On-Prem S/4 DB Server for the robust needs of In-Memory computing, none of that memory is useful in a Cloud-based scenario: you must instead buy a subscription for the Cloud-resident ‘In-Memory Data Grid’ of ‘HANA-as-a-Service’. You must then use HANA’s ‘Smart Data Integration’ (‘Data Provisioning Agent’) for ‘live’ Data Replication into the new Cloud-based ‘In-Memory Data Grid’ (multiplied by the number of exposed APIs). This solution nonetheless “Decouples front-end API services from the system of record for fast response time”; after all, you want your partners to be willing to pay you for API use. It also “Enables front-end API services to access data scattered across multiple back-end systems”; thereby making the aggregated and exposed fake-gold records ephemeral.
More recently launched by SAP, and therefore not covered in the mentioned Course, is the Cloud-based ‘SAP Graph’, a Beta product launched as recently as September 2019 that is “still under development”. SAP Graph “wraps the APIs of existing [SAP] products into a single harmonized API layer across the existing [SAP] source systems”. It is a “network of connected data objects that are stored and owned across different SAP solutions and technologies. The data objects that are made accessible through SAP Graph in an interconnected data model can be consumed through an HTTP-based API, which exposes data from multiple SAP systems in a unified schema” = in a ‘harmonized Entity-Layer’ (albeit one with no concrete existence). However, SAP Graph will offer only a ‘curated’ set of APIs, re-exposing approximately 20% of existing back-end APIs with normalized signatures (because developers should hopefully find this approach less complex). And what if your solution requires one of the 80% of APIs that are not re-exposed by SAP Graph, or APIs that are hosted by non-SAP products? Once again, you can put in place a parallel solution, using a completely different technology, and hope that you never need to add non-SAP (or old-SAP) products to the integrations that you spent weeks developing with SAP Graph; at which point you will need to start again (having already paid the relevant subscription).
What is quite interesting about this is that I had only just heard, in the same openSAP CPI course, that thanks to the ‘High-Performance Data Store’ of SAP’s new Digital Integration Hub, we can now build “a single, consolidated view of entities, the data for which is stored in one or multiple [SAP or non-SAP] System-of-Records” (in an In-Memory cache). But what is most worrying of all about SAP’s ‘API-first’ dogma is the willingness to confuse APIs – the core building block of SOA macroservices – with data, which exists completely independently of any particular architectural pattern. SAP Graph, “a single harmonized API layer”, exposes a “network of connected data objects” in a harmonized data model; one which “can be consumed through an HTTP-based API, which exposes data from multiple SAP systems” (but not from non-SAP systems).
Likewise, SAP Graph and SAP’s Digital Integration Hub – both being fully API-centric – have no role to play whatsoever in an ‘Event-Driven Architecture’ (something SAP also mentions from time to time when discussing its new ‘Enterprise Messaging’ product). Why do I mention ‘Events’? Because people typically talk about either data or transactions; about master data or transactional data. In fact, each of these things is an ‘Event’, and that is why ‘Event-Driven Architectures’ will quickly replace APIs for internal macroservices; APIs only ever being needed for external partners, and often not even then. To provide an illustration: if a customer requests a Sales Order, that Request represents an Event, and the eventual Order Creation also represents an Event; as does any subsequent Change to the Order (an Event typically referred to by SAP users as ‘VA02’). Events represent instances of ‘transactions’, and those events are stored in the Database as ‘data’ records; for which reason you will have a hard time finding any single record in the DB that is not the direct consequence of an ‘Event’ (e.g. ‘EmployeeHired’). This is precisely why an ‘Event-Driven Architecture’ so naturally lends itself to the creation of a harmonized ‘Entity-Layer’; something far more natural, and far more useful, than a harmonized ‘API-Layer’. There is, conversely, no conceptual link between data and APIs.
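To make this concrete, here is a minimal sketch of how such Business Entity Events might be modelled (in Java, using records); the event names and fields are my own illustrative assumptions, not an SAP schema:

```java
// Each record is an immutable fact about a Business Entity; the names and
// fields below are illustrative assumptions, not an SAP schema.
public interface SalesOrderEvent {
    String orderId();
}

// The customer's Request is itself an Event...
record SalesOrderRequested(String orderId, String customerId) implements SalesOrderEvent {}

// ...as is the eventual Order Creation...
record SalesOrderCreated(String orderId, String customerId, double netAmount) implements SalesOrderEvent {}

// ...and any subsequent Change (the 'VA02' of the example above).
record SalesOrderChanged(String orderId, double newNetAmount) implements SalesOrderEvent {}
```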
In order to help SAP towards a shift in mindset, from the (SOAP-based) ‘Service-Oriented Architecture’ that reigned supreme between approximately 2007 and 2017, towards the ‘Event-Driven Architecture’ that is very quickly growing in popularity today – and which happens to be a perfect pattern for the macroservices needed in ERP-centric landscapes – I will need to make a possibly uncomfortable point: the Application needed to build an In-Memory, Event-Driven, fault-tolerant, high-availability, harmonized and centralized common ‘Entity-Layer’ – addressable by HTTP calls – is already freely available within the Open Source Community, and it fully supports On-Premise landscapes. The ‘High-Performance Data Store’ of SAP’s Digital Integration Hub – with which we can build “a single, consolidated view of entities, the data for which is stored in one or multiple System-of-Records” – is no more than the ‘State Store’ of Apache ‘Kafka Streams’; something that can be run In-Memory, out-of-the-box. Perhaps even more interesting in this regard is that SAP already provides free integration for Kafka (https://github.com/SAP/kafka-connect-sap).
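As a minimal sketch of that claim – assuming string-serialized events on a hypothetical ‘SalesOrders’ Topic – Kafka Streams can materialize a Topic into an in-memory State Store and expose it to HTTP callers via its ‘Interactive Queries’ feature:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class EntityLayer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "entity-layer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Materialize the 'SalesOrders' Topic into an in-memory, fault-tolerant
        // State Store (backed by a changelog topic for recovery).
        builder.table("SalesOrders",
            Materialized.<String, String>as(Stores.inMemoryKeyValueStore("sales-orders"))
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Once the instance is RUNNING, the store is directly queryable and can
        // be fronted by any HTTP layer, e.g. GET /sales-orders/{id} -> get(id).
        ReadOnlyKeyValueStore<String, String> store =
            streams.store("sales-orders", QueryableStoreTypes.keyValueStore());
        System.out.println(store.get("4711")); // illustrative lookup
    }
}
```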
Incoming Events can be ‘folded’ into their corresponding ‘State Stores’ using the ‘Event Sourcing’ pattern, meaning that events are merged as they arrive, in real-time – not using schedules – and that the aggregated entity record that can subsequently be queried by SAP or non-SAP clients represents the ‘Golden Record’ of each ‘Entity’. Given that SAP already uses Open Source solutions in its commercial products (e.g. PostgreSQL in ‘API Management’), there is no time like the present; SAP’s clients need a harmonized and centralized common ‘Entity-Layer’. The 1970s response to the Golden Record problem was the ‘ERP’ – where there is no (single) ‘Golden Record’, there is no ‘ERP’. Some fifty years later, it seems clear that ‘Event Sourcing’ is the ideal modern response to this re-emerged problem; one that ERPs can perhaps no longer solve, most evidently in the new context of IoT and Edge Computing.
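Here is a minimal sketch of that ‘fold’, again assuming string-serialized events and with a deliberately trivial merge function standing in for real event-application logic:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;

public class GoldenRecordTopology {
    static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Event Sourcing: fold each incoming event into the running aggregate
        // for its key as it arrives - in real-time, with no schedules. The
        // resulting KTable holds the 'Golden Record' of each Entity.
        KTable<String, String> goldenRecords = builder
            .stream("SalesOrders", Consumed.with(Serdes.String(), Serdes.String()))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            .aggregate(
                () -> "{}", // the initial, empty entity
                (orderId, event, entity) -> merge(entity, event),
                Materialized.<String, String>as(Stores.inMemoryKeyValueStore("golden-records"))
                    .withKeySerde(Serdes.String())
                    .withValueSerde(Serdes.String()));

        return builder;
    }

    // Placeholder: a real merge would apply the event's fields to the
    // aggregated entity (e.g. via a JSON library).
    static String merge(String entity, String event) { return event; }
}
```

Any SAP or non-SAP client can then query the ‘golden-records’ store, exactly as in the previous sketch, for the current state of any Entity.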
What’s more, the various Kafka ‘Topics’ (e.g. ‘SalesOrders’) fed into the ‘State Stores’ provide a full – auditable – history of all Business Entity Events, coming from any number of SAP or non-SAP back-ends (Cloud or On-Prem), that can easily be interrogated by each client using temporal queries in order to return only new, unprocessed events. In this case, the last-read ‘Offset’ of each Topic – always managed by each client – would represent the equivalent of an OData ‘ETag’. That’s important, because it solves a fundamental problem of the ever-growing number of Offline-Mobile scenarios (in a fashion very similar to that which I described in my Blog: How to build a ‘Rolling-Delta Database’ for Offline Mobile scenarios). Conversely, how do SAP Graph and SAP’s Digital Integration Hub help to meet the growing need for Delta-queries – multiplied by the number of mobile devices – on the ever-changing entities referenced in Offline-Mobile scenarios?
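A minimal sketch of such a Delta-read, assuming a single-partition ‘SalesOrders’ Topic and a client that persists its own last-processed Offset (exactly as it would persist an ETag); the helper for client-side offset storage is hypothetical:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DeltaReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false"); // the client owns its Offset

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("SalesOrders", 0);
            consumer.assign(List.of(partition));

            // The Offset persisted by the (mobile) client plays the role of an
            // OData ETag: seek past it and read only what is new.
            long lastProcessed = loadOffsetFromClientStorage(); // hypothetical helper
            consumer.seek(partition, lastProcessed + 1);

            ConsumerRecords<String, String> delta = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> event : delta) {
                // apply the event to the local store, then persist event.offset()
            }
        }
    }

    static long loadOffsetFromClientStorage() { return 41L; } // illustrative value
}
```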
While your text is long, it is worth reading.
I share the same experience, by the way. Meanwhile, it is the customers who have the better ideas on how to integrate their systems, and an event-driven architecture is perfectly suited for most cases.
I would, however, place APIs and event-driven architectures into two different buckets.
Synchronous communication is not suited to event architectures; API-centric architectures can send and receive events, but doing so is unnecessarily complex and error-prone.
So the question is: how often do customers want to inform another system about changes, versus how often should there be a booking-like synchronous call? The answer is obvious: in 99% of cases it is about informing a downstream system.
And then there are the other advantages you get from Kafka-like distributed-log (‘DLog’) providers: load balancing, parallel processing, schema evolution, a schema registry, re-reads, high throughput, flexibility in docking other apps onto it at will, stream processing, and change data capture.
I tried to promote that inside SAP for a long time.
Thank you, Werner Dähn, for your positive comments. Very motivating to me, given your past experience.
I find it very telling that the 1% synchronous-API example you give is where “A user wants to create a sales order and actively waits for the API to complete”; a need that would almost certainly be managed today by a Fiori App (exposed also to external users = without the slightest need for SAP ‘API Management’) calling the relevant OData Operation. I make this point because the best example you could think of – albeit in a short space of time – was one that would probably never use a SOAP-based API; and if it did, it would use an existing API already developed by SAP between 2006 and 2012: there would be no chance whatsoever of the company developing a new SOAP-based API from scratch today => hardly ‘API-first’. I understand that you probably consider OData REST calls to be another form of API, which they obviously are, but for the purposes of my Blog I was referring to APIs mainly as those synchronous, SOAP-based services that have underwritten the SOA Architecture for the past 15 years (something for which SAP ‘API Management’ could conceivably play some role).
In any case, for me, the main difference between Synchronous and Asynchronous imperatives is in fact a question of granularity: Microservice or Macroservice. Microservices will often be Synchronous; Macroservices should always be Asynchronous => as long as they pass the ACID test.
Thanks again, Cameron