A frequent requirement is to validate data immediately after it arrives: not blocking data entry, not failing the ingestion process, but being informed that there is some doubtful data that had better be double-checked.
In S/4HANA, sales orders are created in various ways: manually via the SAP screens, or through BAPIs called by the online shop. Anything completely wrong is already prevented by the ERP system itself, e.g. ordering a material that is no longer available. But somebody ordering 10,000 pallets instead of one pallet with 10,000 pieces? Such cases are technically correct, yet logically doubtful.
Informing business users about logical inconsistencies within seconds allows them to check the sales order, look up the customer's order history and intervene before the wrong order is produced or shipped.
A modern architecture for such a solution consists of realtime producers capturing all changes in the source systems. Changes in the SD module, in MM, FI, CO, …: any data can be produced, including data from cloud and non-SAP systems. A rules service then applies the checks, augments the data with the individual check results and provides the cleansed data to any interested party.
The Data Warehouse consumer can now load the data together with the rule results into SAP HANA for analysis. The volume of the rule results will be large, though: with 10m rows of data and 100 rules tested on average, the rule result table grows to 1bn rows. That is no problem for HANA, however. The rule result table is configured as warm storage and thus does not consume expensive memory.
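To make the data flow concrete, here is a minimal sketch of what a cleansed record might look like once the rules service has augmented it, together with the volume arithmetic from above. The field and rule names (`order_id`, `_rule_results`, `plausible_quantity`, …) are illustrative assumptions, not the actual schema of the product.

```python
# Illustrative shape of a cleansed record: the original sales order fields
# plus one entry per executed rule (names are assumptions, not the real schema).
record = {
    "order_id": 4711,
    "order_status": "C",
    "quantity": 10_000,
    "_rule_results": [
        {"rule": "plausible_quantity", "passed": False},
        {"rule": "completed_has_shipment", "passed": True},
    ],
}

# Back-of-the-envelope volume: 10m records, each carrying ~100 rule results,
# yields 1bn rows in the rule result table.
rule_result_rows = 10_000_000 * 100
print(rule_result_rows)  # → 1000000000
```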
Another realtime consumer creates emails based on the cleansed data, informing the responsible person about the findings.
As this realtime enterprise bus uses Apache Kafka, the de-facto standard in the Big Data world, any other producer, service or consumer can be plugged in as well. How about SAP Data Intelligence consuming the rule results and feeding an ML model to identify even more outliers?
The rules service and its microservices
The rules service listens for new data on an Apache Kafka topic and passes new records through a set of microservices. This allows the result of a previous rule microservice to be used in the next one, thus simplifying the individual rules.
For example, in the step “20 – Standardization” various code values are standardized into the official code set. An order status might be “C” (Completed), “c” (completed) or “r” (ready) depending on the source system, but for our enterprise the official value is “C”, the one used in the ERP system. The next downstream microservice, which checks for logical inconsistencies, then only has to consider the official code set: it tests whether a completed order has a shipment date, without worrying about all the various codes for completed orders in the source systems.
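The chaining described above can be sketched as plain functions. In the real service each step would be a separate Kafka consumer/producer; here the pipeline is simulated in-process, and all names (`standardize_status`, `completed_has_shipment`, `_rule_results`) are illustrative assumptions.

```python
# Sketch of chaining rule microservices: the standardization step runs first,
# so the consistency check only ever sees the official code set.

def standardize_status(record):
    """Step '20 - Standardization': map source-system codes to the official set."""
    mapping = {"c": "C", "r": "C"}  # official value is "C" (Completed)
    record["order_status"] = mapping.get(record["order_status"],
                                         record["order_status"])
    return record

def check_completed_has_shipment(record):
    """Downstream check: a completed order must carry a shipment date."""
    passed = (record["order_status"] != "C"
              or record.get("shipment_date") is not None)
    record.setdefault("_rule_results", []).append(
        {"rule": "completed_has_shipment", "passed": passed})
    return record

PIPELINE = [standardize_status, check_completed_has_shipment]

def process(record):
    for step in PIPELINE:
        record = step(record)
    return record

# A source system sends status "c" and no shipment date:
rec = process({"order_status": "c", "shipment_date": None})
print(rec["order_status"])                      # → C
print(rec["_rule_results"][0]["passed"])        # → False
```

Because standardization runs first, the check needs exactly one condition instead of one per source-system code.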
Rule microservice configuration
The rules themselves are attached to the individual fields of the structure. To help with data entry, a sample record is shown together with its rule result (see the Validate button). Various rule types exist, ranging from a single test to an entire rule set with multiple test conditions.
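As a rough illustration of the two extremes of those rule types, a single test versus a rule set with multiple conditions might look like this. The representation is a sketch of my own, not how the product stores rules.

```python
# Illustrative only: a single test is one predicate on the record,
# a rule set combines several conditions that must all hold.

def single_test(record):
    return record["quantity"] > 0

PLAUSIBLE_QUANTITY_RULE_SET = [
    lambda r: r["quantity"] > 0,
    lambda r: r["quantity"] < 10_000,   # flag suspiciously large orders
]

def evaluate_rule_set(record):
    return all(condition(record) for condition in PLAUSIBLE_QUANTITY_RULE_SET)

print(evaluate_rule_set({"quantity": 5}))       # → True
print(evaluate_rule_set({"quantity": 10_000}))  # → False
```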
The rule results are added to the record itself. This allows auditing and analysis of the data and its quality without limitations. In the past, only aggregated rule results were available; thanks to the Big Data approach of this solution, every record carries all of its rule results.
This allows answering questions like:
- What kind of rules failed often?
- Is the quality of the data decreasing?
- Is there a pattern among the failed rules, e.g. is it the newly connected source system that always violates one particular rule?
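Because every record carries its own rule results, the first of these questions reduces to a simple aggregation over the result entries. A minimal sketch, assuming the illustrative `_rule_results` field name used above:

```python
# Sketch: counting which rules fail most often across records that carry
# their rule results inline. Field and rule names are illustrative.
from collections import Counter

records = [
    {"order_id": 1, "_rule_results": [
        {"rule": "completed_has_shipment", "passed": False},
        {"rule": "plausible_quantity", "passed": True}]},
    {"order_id": 2, "_rule_results": [
        {"rule": "completed_has_shipment", "passed": False}]},
]

failed = Counter(result["rule"]
                 for rec in records
                 for result in rec["_rule_results"]
                 if not result["passed"])

print(failed.most_common())  # → [('completed_has_shipment', 2)]
```

In the warehouse itself, the same question would be a GROUP BY over the 1bn-row rule result table in HANA.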
A rules service is a vital component in every business. Thanks to its openness, it can be combined with any other Apache Kafka based component, whether SAP, non-SAP or Open Source.
Links to access a live demo system can be found here: https://rtdi.io/software/big-data-connectors/