Semantics Is Not All You Need – Towards Scaling Process Data Analysis
Process intelligence projects frequently face a cold start problem: once the data is onboarded¹, substantial domain knowledge and data analysis expertise are required to write and chain the analysis queries that, ideally, lead to actionable insights. At SAP Signavio, we want to address this challenge by:
- Moving as much of the query specification effort as possible from our customers to us, and scaling the results of that work across as many organizations as possible;
- Automating as much of the query generation and chaining as possible.
In this blog post, we will discuss the challenge of scaling process data analysis and explore the limitations of relying solely on semantics in this context. We will also examine the role of data-driven approaches in reducing the human effort needed for analysis query creation and maintenance. Finally, we will highlight a recent collaboration between SAP Signavio and the Technical University of Munich to further integrate practical and academic perspectives in the design and implementation of more scalable process analysis recommendation systems.
At SAP Signavio, we are actively working to productize both of the aforementioned approaches, most notably in the context of the SAP Signavio Process Explorer, which bundles the knowledge generated by decades of SAP consultancy work. Still, to ensure that we continue to advance the process intelligence frontier for years to come, it is essential to look beyond current roadmaps and partner with our intellectual friends in the academic community. That community has served SAP Signavio, which started as an open-source project at the Hasso Plattner Institute of the University of Potsdam, well over the years. In particular, it is important to systematically evaluate the integration of knowledge-based and data-driven approaches to decrease the human effort needed for analysis query creation and maintenance. Here, two factors play a key role:
- Human generation and maintenance effort, relative to knowledge base size and impact. Reusable queries and analysis templates that ease process intelligence efforts for customers are a form of knowledge, and maintaining knowledge by hand is costly. Data-driven approaches, such as recent generative AI capabilities and large language models (enabled by the transformer neural network architecture), can help reduce the human effort needed to generate and maintain sound and complete chains of queries that enable actionable and impactful process data analysis.
- Machine ability to draw reliable conclusions based on imperfect knowledge (and data). Non-trivial knowledge does not scale perfectly: the larger the knowledge base (e.g., the collection of queries and templates), the more likely it is that parts of it are inconsistent or stale. In addition, when scaling good-practice analysis tools across organizations, some adaptation to the local context is typically required. Data-driven approaches can help deal with these inevitable flaws and limitations by automatically handling failures and adjusting queries to enable successful and useful inference, and by systematically considering user feedback to continuously re-assess which means of analysis should be used in a given context.
We may summarize the resulting challenge at the intersection of knowledge (semantics) and data as follows:
Semantics is not all we need² – but how much semantics do we need?
To work towards answering this long-term question in the context of process intelligence capabilities and, more specifically, for process analysis recommendation systems, SAP Signavio has partnered with Prof. Stefanie Rinderle-Ma’s Chair of Information Systems and Business Process Management (BPM) at the Technical University of Munich, utilizing the Chair’s strong expertise at the intersection of BPM and applied AI. The recently started collaboration has already yielded initial results. In particular, two research papers from Stefanie’s group have recently been accepted for publication in the proceedings of a top-flight Information Systems conference³. They present approaches for obtaining temporal constraints for event log-based compliance checking from natural language text and for automatically detecting deviations between natural language-based constraint specifications, respectively.
We believe that partnering with academic communities helps us look beyond our current roadmaps and integrate practical and academic perspectives. By doing so, we aim to design and implement more useful and more scalable process analysis recommendation systems. If you are an academic researcher interested in discussing your work with us, you can contact us at firstname.lastname@example.org.
1: Data onboarding (in particular: event log generation) is another substantial challenge, see our blog post on that challenge.
2: Here, we allude to the seminal paper “Attention Is All You Need” by Vaswani et al., which introduced the transformer architecture for neural networks, enabling recent advances in generative AI and, perhaps most notably, ChatGPT.
3: Catherine Sai, Karolin Winter, Elsa Fernanda & Stefanie Rinderle-Ma: “Detecting Deviations Between External and Internal Regulatory Requirements for Improved Process Compliance Assessment” and Maria del Sol Barrientos Moreno, Karolin Winter, Juergen Mangler & Stefanie Rinderle-Ma: “Verification of Quantitative Temporal Compliance Requirements in Process Descriptions over Event Logs”, to appear in the CAiSE 2023 proceedings.