Transforming Business Process Management with Reinforcement Learning
Reinforcement learning (RL) is a subfield of machine learning in which systems learn from interactions with their environment and improve their performance over time. It is a powerful method that has, for example, been used to create game-playing agents for sophisticated games like Go and StarCraft, but its potential applications go far beyond that. At SAP Signavio, we’ve been exploring the use of RL for business process management (BPM), where it can be used to optimize processes, improve decision-making, and drive business results.
In this blog post, we explore the use of reinforcement learning for business process management and how it can potentially transform the way companies approach process improvement.
Reinforcement learning in a nutshell
RL aims to optimize a software agent’s behavior in an interactive environment based on a numeric reward. The agent repeatedly chooses an action from a set of options. Each chosen action is then evaluated and a reward is calculated; the agent's goal is to maximize this reward. Which choice to make in which situation is learned, essentially, through trial and error. Note that the chosen action may affect not only the current reward, but also the following situation and subsequent rewards.
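To make the trial-and-error loop concrete, here is a minimal sketch using tabular Q-learning, one common RL algorithm. The toy "corridor" environment, the function names, and all parameter values are assumptions made up for illustration and are not part of any SAP Signavio tooling. The agent starts at position 0 and is rewarded only when it reaches position 3, so each action influences the next situation as well as later rewards.

```python
import random

N_STATES = 4          # positions 0..3; position 3 is the rewarded terminal state
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: the action determines the next state; reward only at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (epsilon-greedy Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or when both actions look equal),
            # otherwise exploit the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # The target combines the immediate reward with discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

After training, the greedy policy should prefer stepping right in every non-terminal position, even though only the final step is rewarded directly.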
Business process improvement with reinforcement learning
SAP Signavio has been researching the use of RL for business process improvement (BPI). The traditional BPM lifecycle (as seen in Figure 1) is typically sequential and does not consider failures as part of the improvement process. This means that failures are not evaluated systematically, but rather seen as an inconvenience caused by inadequate planning in the redesign phase. If a failure occurs, the entire lifecycle must be restarted. This is a major issue: research on BPI has found that 75% of BPI ideas did not lead to an improvement. Half of all ideas had no effect, and a quarter even had negative results.
Satyal et al. propose using DevOps principles to facilitate more effective and successful continuous process improvements. Inspired by AB testing, they suggest testing different process variants side by side in a production environment, and routing individual customers to the most suitable variant using RL. This methodology, called AB-BPM, considers failures as “first-class citizens” and strives to increase both the efficiency and quality of BPI initiatives, thus allowing for a more continuous approach to BPI. A graphical representation of this methodology is shown in Figure 2.
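The routing idea can be sketched as a simple contextual bandit. This is an illustrative sketch under stated assumptions, not the algorithm from the AB-BPM paper: the `VariantRouter` class, the customer segments, the epsilon-greedy strategy, and the simulated success rates are all hypothetical.

```python
import random
from collections import defaultdict

VARIANTS = ("A", "B")


class VariantRouter:
    """Epsilon-greedy router keeping a running mean reward per (segment, variant)."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def best_variant(self, segment):
        return max(VARIANTS, key=lambda v: self.means[(segment, v)])

    def route(self, segment):
        # Occasionally explore a random variant; otherwise use the best one so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(VARIANTS)
        return self.best_variant(segment)

    def update(self, segment, variant, reward):
        key = (segment, variant)
        self.counts[key] += 1
        # Incremental update of the mean reward observed for this pair.
        self.means[key] += (reward - self.means[key]) / self.counts[key]


# Hypothetical ground truth, used only to simulate process outcomes.
SUCCESS_RATE = {
    ("standard", "A"): 0.8, ("standard", "B"): 0.3,
    ("premium", "A"): 0.3, ("premium", "B"): 0.8,
}

router = VariantRouter()
sim = random.Random(1)
for _ in range(4000):
    segment = sim.choice(("standard", "premium"))
    variant = router.route(segment)
    reward = 1.0 if sim.random() < SUCCESS_RATE[(segment, variant)] else 0.0
    router.update(segment, variant, reward)

print(router.best_variant("standard"), router.best_variant("premium"))
```

In this simulation, the router gradually learns to send each customer segment to the variant that works better for it, while a small share of requests keeps exploring the alternative.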
During the experiment, the RL agent autonomously routes each incoming process instantiation request to one of the process variants. In practice, however, this degree of autonomy could be too risky. Additionally, no open-source research prototype of the AB-BPM methodology has been available, which limits follow-up work and further research on the existing methodology and tools.
These issues, combined with the promise of the methodology, have led to a project within SAP Signavio working on an open-source research prototype. This prototype, called Human-in-the-Loop AB-BPM (HITL-AB-BPM), extends the AB-BPM methodology with elements of human control. The main mechanism of human expert control is batching: the experiment is run in batches, and the human expert can accept or modify the routing proposed by the RL agent. Figure 3 shows the lifecycle of such a BPI experiment in the HITL-AB-BPM tool.
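The accept-or-modify step can be sketched as follows. This is a hypothetical illustration, not the HITL-AB-BPM tool's actual interface: the function names, the proportional proposal rule, and the minimum share (`floor`) are assumptions.

```python
import random


def propose_split(mean_rewards, floor=0.1):
    """Agent proposal: share of the next batch per variant, proportional to the
    observed (non-negative) mean reward, with a minimum share for every variant."""
    total = sum(mean_rewards.values())
    if total == 0:
        # No evidence yet: split the batch evenly.
        return {v: 1 / len(mean_rewards) for v in mean_rewards}
    raw = {v: max(floor, r / total) for v, r in mean_rewards.items()}
    norm = sum(raw.values())
    return {v: share / norm for v, share in raw.items()}


def human_review(proposal, override=None):
    """Expert decision: accept the proposal as is, or replace it with an override."""
    if override is None:
        return proposal
    if abs(sum(override.values()) - 1.0) > 1e-9:
        raise ValueError("override shares must sum to 1")
    return override


def assign_batch(split, batch_size, rng):
    """Assign each request in the batch to a variant according to the agreed split."""
    variants = list(split)
    weights = [split[v] for v in variants]
    return rng.choices(variants, weights=weights, k=batch_size)


proposal = propose_split({"A": 0.6, "B": 0.2})
final_split = human_review(proposal, override={"A": 0.5, "B": 0.5})
batch = assign_batch(final_split, batch_size=20, rng=random.Random(0))
print(proposal, final_split, batch)
```

Here the agent proposes sending roughly three quarters of the next batch to variant A, and the expert overrides this with an even split. Calling `human_review(proposal)` without an override would accept the agent's proposal unchanged.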
Furthermore, we conducted an internal study with business process experts to better understand the implications of and requirements for an automated process redesign approach such as AB-BPM. Some of the main findings of this qualitative study are: i) more possibilities for human intervention, and for interaction between the RL agent and the human expert, are a core requirement; ii) transparency and features for the participation of process participants are needed to make AB-BPM culturally viable; iii) integrated process execution is necessary to facilitate the seamless deployment of parallel process variants and deliver the real-time data needed for dynamic RL and routing.
The future of reinforcement learning in business process management
From a broader perspective, we see the following potential applications of reinforcement learning to business process management.
Business Process Reinforcement Learning
Given the high competitiveness of many modern markets, companies are under pressure to constantly improve their business processes. Thus, further development of the AB-BPM methodology is an interesting long-term direction for research and innovation. One future possibility is moving away from the experiment mindset and instead using RL continuously to route customers to process variants. Rather than testing one new version and declaring a winner at the end, RL would always be enabled: incoming customer requests would be dynamically routed towards the most suitable process version. This would allow the routing to follow the ever-changing market environment and customer needs. Furthermore, given the high scalability of software-driven routing, one could also extend the approach to keep more than two variants deployed to production at all times. This more involved version goes beyond the concept of time-limited AB testing and could be referred to as Business Process Reinforcement Learning (BPRL).
Process Analysis Query Recommender System
The field of BPM, and particularly process mining, has seen rapid growth and popularity in recent years. This has resulted in a wealth of knowledge about process improvement and analysis. Yet, one challenge remains: how to best leverage this knowledge. Process analysts may often attempt to solve problems that have already been solved, unaware of the current best practices. A process analysis query recommender system based on RL could help address this issue. After the analyst imports the relevant data and sets analysis goals, an RL agent could suggest possible query templates. To determine its rewards and continuously improve its recommendations, the agent could observe which templates the experts choose and even poll them for satisfaction afterwards. In this way, the system could become more and more accurate over time.
Modelling and LLMs
Large language models (LLMs) have recently gained immense popularity, with chatbots reaching millions of users and being integrated into various search engines. LLMs can also be trained to handle business process models, such as BPMN diagrams. For instance, they can create entire models based on textual input or modify and extend existing models. However, LLMs often produce factually inaccurate results. Here, RL can be used to refine the language models and improve their output quality through a process called RL from human feedback. Although RL would not be the focus of the user-facing BPM innovation, it would still be essential for achieving the desired quality levels. Moreover, note that these automated process re-modelling capabilities could be combined with the BPRL approach, which would essentially enable self-optimizing business processes.
BPM is all about making smart decisions in a dynamic environment. Given that RL was essentially developed for this problem space, it seems only natural to use it to support BPM practitioners. Challenges that have stopped RL from being used in BPM and other real-world scenarios in the past include partial observability of systems, data delays, and a lack of explainability and of preparatory offline learning. However, many of these challenges have been discussed and addressed by prior work, and the presented project and ideas show that RL does have the potential to innovate how business processes are modeled, analyzed, and improved. SAP Signavio is determined to stay at the forefront of innovations like these, driving business process transformations and making the world run better.
Join the conversation
Are you ready to dive deeper into this topic? Join the conversation by leaving a comment below and sharing your thoughts!
 R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd edition. MIT Press, 2018.
 M. Dumas, M. La Rosa, J. Mendling, and H. A. Reijers, Fundamentals of Business Process Management. Berlin, Heidelberg: Springer Berlin Heidelberg, 2018. doi: 10.1007/978-3-662-56509-4.
 C. W. Holland and D. Cochran, Breakthrough Business Results With MVT: A Fast, Cost-Free, “Secret Weapon” for Boosting Sales, Cutting Expenses, and Improving Any Business Process, 1st edition. Hoboken, NJ: Wiley, 2005.
 S. Satyal, I. Weber, H. Paik, C. Di Ciccio, and J. Mendling, “Business process improvement with the AB-BPM methodology,” Inf. Syst., vol. 84, pp. 283–298, Sep. 2019, doi: 10.1016/j.is.2018.06.007.
 A. F. Kurz, B. Santelmann, T. Großmann, T. Kampik, L. Pufahl, and I. Weber, “HITL-AB-BPM: Business Process Improvement with AB Testing and Human-in-the-Loop,” Proc. Demo Sess. 20th Int. Conf. Bus. Process Manag., 2022.
 A. F. Kurz, T. Kampik, L. Pufahl, and I. Weber, “Reinforcement Learning-supported AB Testing of Business Process Improvements: An Industry Perspective.” arXiv, Mar. 19, 2023. doi: 10.48550/arXiv.2303.10756.
 L. Ouyang et al., “Training language models to follow instructions with human feedback.” arXiv, Mar. 04, 2022. doi: 10.48550/arXiv.2203.02155.
 G. Dulac-Arnold et al., “Challenges of real-world reinforcement learning: definitions, benchmarks and analysis,” Mach. Learn., vol. 110, no. 9, pp. 2419–2468, Sep. 2021, doi: 10.1007/s10994-021-05961-4.