By Aaron Kurz

Transforming Business Process Management with Reinforcement Learning

Reinforcement learning (RL) is a subfield of machine learning in which systems learn from interactions with their environment and improve their performance over time. It is a powerful method that has, for example, been used to create game-playing agents for sophisticated games like Go and StarCraft, but its potential applications go far beyond that. At SAP Signavio, we’ve been exploring the use of RL for business process management (BPM), where it can be used to optimize processes, improve decision-making, and drive business results.

In this blog post, we explore the use of reinforcement learning for business process management and how it can potentially transform the way companies approach process improvement.

Reinforcement learning in a nutshell

RL aims to optimize a software agent’s behavior in an interactive environment based on a numeric reward. The agent repeatedly chooses an action from a set of options. Each action is then evaluated, and a reward is calculated. The agent’s goal is to maximize the reward it collects over time. It learns which action to choose in which situation essentially through trial and error. Note that the chosen action may affect not only the current reward, but also the following situation and subsequent rewards [1].
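To make the loop of choosing, evaluating, and learning more concrete, here is a minimal sketch using tabular Q-learning, a standard algorithm described in [1]. The toy environment, its two situations, its rewards, and all parameter values are invented for illustration. The environment is built so that the option with the best immediate reward is not the best choice in the long run, which is exactly the effect mentioned at the end of the paragraph above.

```python
import random

# Hypothetical toy environment: two situations (states), two options (actions).
# TRANSITIONS[state][action] = (reward, next_state)
TRANSITIONS = {
    0: {0: (1.0, 0), 1: (0.0, 1)},  # option 0 pays now, option 1 pays later
    1: {0: (5.0, 0), 1: (0.0, 1)},
}


def greedy(q, state):
    """Return the option with the highest estimated value in this situation."""
    return max(q[state], key=q[state].get)


def train(steps=5000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in actions} for s, actions in TRANSITIONS.items()}
    state = 0
    for _ in range(steps):
        # Mostly exploit the current estimates, sometimes try something else.
        if rng.random() < epsilon:
            action = rng.choice(list(q[state]))
        else:
            action = greedy(q, state)
        reward, next_state = TRANSITIONS[state][action]
        # Move the estimate toward reward plus discounted future value.
        best_next = max(q[next_state].values())
        q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
        state = next_state
    return q


q_values = train()
policy = {state: greedy(q_values, state) for state in TRANSITIONS}
```

In this toy setup, the learned policy gives up the small immediate reward in situation 0 in order to reach situation 1, where the larger reward is available.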

Business process improvement with reinforcement learning

SAP Signavio has been researching the use of RL for business process improvement (BPI). The traditional BPM lifecycle [2] (as seen in Figure 1) is typically sequential and does not consider failures as part of the improvement process. This means that failures are not evaluated systematically, but rather seen as an inconvenience caused by inadequate planning in the redesign phase. If a failure occurs, the entire lifecycle must be restarted. This is a major issue: research on BPI has revealed that 75% of BPI ideas did not lead to an improvement. Half of all ideas had no effect, and a quarter even had negative results [3].


Figure 1: Traditional BPM lifecycle [2]

Satyal et al. propose using DevOps principles to facilitate more effective and successful continuous process improvements [4]. Inspired by AB testing, they suggest testing different process variants side-by-side in a production environment, and routing individual customers to the most suitable variant using RL. This methodology, called AB-BPM, considers failures as “first-class citizens” and strives to increase both the efficiency and quality of BPI initiatives, thus allowing for a more continuous approach to BPI. A graphical representation of this methodology is shown in Figure 2.
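As a rough illustration of routing requests between two variants that run side by side, the sketch below uses Thompson sampling, a standard multi-armed bandit technique. This is not the routing algorithm from [4]: it ignores customer attributes and assumes each process instance simply ends in success or failure. The class name, the variant names, and the success rates are hypothetical.

```python
import random


class VariantRouter:
    """Routes each request to a process variant via Thompson sampling."""

    def __init__(self, variants, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior: no knowledge about any variant at the start.
        self.successes = {v: 1 for v in variants}
        self.failures = {v: 1 for v in variants}

    def route(self):
        # Sample a plausible success rate per variant and pick the best sample.
        samples = {
            v: self.rng.betavariate(self.successes[v], self.failures[v])
            for v in self.successes
        }
        return max(samples, key=samples.get)

    def record(self, variant, success):
        # Update the belief about the variant that handled the request.
        if success:
            self.successes[variant] += 1
        else:
            self.failures[variant] += 1


def simulate(true_rates, requests=2000, seed=0):
    """Simulate an experiment with made-up success rates per variant."""
    rng = random.Random(seed)
    router = VariantRouter(true_rates, seed=seed + 1)
    counts = {v: 0 for v in true_rates}
    for _ in range(requests):
        variant = router.route()
        counts[variant] += 1
        router.record(variant, rng.random() < true_rates[variant])
    return counts


counts = simulate({"A": 0.3, "B": 0.6})
```

Because weaker variants receive fewer and fewer requests as evidence accumulates, a failed redesign costs less than in a fixed 50/50 split, which is one way to treat failures as part of the experiment.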


Figure 2: Improved BPM lifecycle with reinforcement learning

During the experiment, the RL agent autonomously routes each incoming process instantiation request to one of the process variants. In practice, however, such fully autonomous routing could be too risky. Additionally, no open-source research prototype of the AB-BPM methodology has been available, which limits follow-up work and further research on the methodology and its tools.

These issues, combined with the promise of the methodology, have led to a project within SAP Signavio working on an open-source research prototype. This prototype, called Human-in-the-Loop AB-BPM [5] (HITL-AB-BPM), extends the AB-BPM methodology with elements of human control. The main control mechanism is batching: the experiment runs in batches, and the human expert can accept or modify the routing proposed by the RL agent. Figure 3 shows the lifecycle of such a BPI experiment in the HITL-AB-BPM tool.
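The batching idea can be sketched in a few lines. The code below is not taken from the HITL-AB-BPM prototype; the function names, the proportional split rule, and the exploration floor are assumptions made for illustration. The agent proposes how to split the next batch of requests, and the expert either accepts the proposal or replaces it with a modified split.

```python
def propose_split(mean_rewards, exploration_floor=0.1):
    """Agent step: propose the share of the next batch for each variant.

    Assumes non-negative mean rewards and len(mean_rewards) * floor <= 1.
    """
    n = len(mean_rewards)
    total = sum(mean_rewards.values())
    if total <= 0:
        return {v: 1.0 / n for v in mean_rewards}  # nothing learned yet
    spare = 1.0 - n * exploration_floor
    return {
        v: exploration_floor + spare * reward / total
        for v, reward in mean_rewards.items()
    }


def review(proposal, override=None):
    """Human step: accept the proposal (override=None) or modify it."""
    if override is None:
        return dict(proposal)
    if set(override) != set(proposal):
        raise ValueError("override must cover exactly the proposed variants")
    if min(override.values()) < 0 or abs(sum(override.values()) - 1.0) > 1e-9:
        raise ValueError("shares must be non-negative and sum to 1")
    return dict(override)


def allocate(split, batch_size):
    """Turn shares into request counts; leftovers go to the largest share."""
    counts = {v: int(share * batch_size) for v, share in split.items()}
    counts[max(split, key=split.get)] += batch_size - sum(counts.values())
    return counts


proposal = propose_split({"A": 0.2, "B": 0.6})
accepted = allocate(review(proposal), batch_size=100)
cautious = allocate(review(proposal, {"A": 0.5, "B": 0.5}), batch_size=100)
```

Keeping the human decision in a separate step means the agent never changes the routing of live requests on its own; it only prepares a suggestion for the next batch.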


Figure 3: HITL-AB-BPM lifecycle [5]

Furthermore, we conducted an internal study with business process experts to better understand the implications of and requirements for an automated process redesign approach such as AB-BPM [6]. Some of the main findings of this qualitative study are: i) more possibilities for human intervention, and for interaction between the RL agent and the human expert, are a core requirement; ii) transparency and features for the participation of process participants are needed to make AB-BPM culturally viable; iii) integrated process execution is necessary to facilitate the seamless deployment of parallel process variants and deliver the real-time data needed for dynamic RL and routing.

The future of reinforcement learning in business process management

From a broader perspective, we see the following potential applications of reinforcement learning to business process management.

Business Process Reinforcement Learning

Given the high competitiveness of many modern markets, companies are under pressure to constantly improve their business processes. Thus, further development of the AB-BPM methodology is an interesting long-term direction for research and innovation. One future possibility is moving away from the experiment mindset and instead using RL continuously to route customers to process variants. RL would then always be enabled, instead of testing one new version and declaring a winner at the end: incoming customer requests would be dynamically routed to the most suitable process version, so that the routing follows the ever-changing market environment and customer needs. Furthermore, given the high scalability of software-driven routing, one could extend the approach to more than two variants that are permanently deployed to production. This more involved version goes beyond time-limited AB testing and could be referred to as Business Process Reinforcement Learning (BPRL).
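One way to picture such always-on routing is an agent that keeps tracking how well each variant performs and weights recent outcomes more strongly than old ones. The sketch below uses an epsilon-greedy rule with a constant step size, a standard technique for non-stationary problems from [1]. The class name, the three variants, and the reward values are hypothetical, and rewards are deterministic only to keep the example short.

```python
import random


class ContinuousRouter:
    """Always-on epsilon-greedy routing over any number of variants."""

    def __init__(self, variants, step_size=0.1, epsilon=0.1, seed=None):
        self.estimates = {v: 0.0 for v in variants}
        self.step_size = step_size
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def route(self):
        # Keep exploring forever so that market changes are noticed.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.estimates))
        return max(self.estimates, key=self.estimates.get)

    def record(self, variant, reward):
        # Constant step size: old observations fade out exponentially.
        self.estimates[variant] += self.step_size * (reward - self.estimates[variant])


def run(phases, steps_per_phase=1000, seed=0):
    """Return the preferred variant at the end of each market phase."""
    router = ContinuousRouter(list(phases[0]), seed=seed)
    preferred = []
    for rewards in phases:
        for _ in range(steps_per_phase):
            variant = router.route()
            router.record(variant, rewards[variant])
        preferred.append(max(router.estimates, key=router.estimates.get))
    return preferred


# Hypothetical market shift: variant A is best at first, variant C later.
preferred = run([
    {"A": 1.0, "B": 0.5, "C": 0.2},
    {"A": 0.2, "B": 0.5, "C": 1.0},
])
```

No winner is ever declared here: when the environment shifts in the second phase, the router gradually moves traffic to the variant that now performs best.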

Process Analysis Query Recommender System

The field of BPM, and particularly process mining, has seen rapid growth and popularity in recent years. This has resulted in a wealth of knowledge about process improvement and analysis. Yet, one challenge remains: how to best leverage this knowledge. Process analysts may often attempt to solve problems that have already been solved, unaware of the current best practices. A process analysis query recommender system based on RL could help address this issue. After importing the relevant data and setting analysis goals, an RL agent could suggest possible query templates. To determine its rewards and continuously improve the recommendations, it could observe which are chosen by experts and even poll for satisfaction afterwards. In this way, the system could become more and more accurate over time.
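A very reduced sketch of this feedback loop could look as follows. It is a bandit-style value estimate rather than a full RL system, and everything in it is an assumption for illustration: the class name, the template names, the reward formula, and the satisfaction scale from 0 to 1. Unseen templates start with an optimistic value so that each one is suggested at least once.

```python
from collections import defaultdict


class QueryRecommender:
    """Suggests query templates per analysis goal and learns from feedback."""

    def __init__(self, templates):
        self.templates = list(templates)
        # Optimistic start: untried templates look attractive and get shown.
        self.value = defaultdict(lambda: 1.0)  # (goal, template) -> mean reward
        self.count = defaultdict(int)

    def recommend(self, goal, k=3):
        ranked = sorted(
            self.templates, key=lambda t: self.value[(goal, t)], reverse=True
        )
        return ranked[:k]

    def feedback(self, goal, shown, chosen, satisfaction=None):
        """Reward the chosen template; shown but ignored templates get 0."""
        for template in shown:
            reward = 0.0
            if template == chosen:
                # Optional poll result in [0, 1] refines the reward.
                reward = 1.0 if satisfaction is None else 0.5 + 0.5 * satisfaction
            key = (goal, template)
            self.count[key] += 1
            # Incremental sample mean of all rewards seen for this pair.
            self.value[key] += (reward - self.value[key]) / self.count[key]


recommender = QueryRecommender(["t1", "t2", "t3", "t4"])
first = recommender.recommend("bottlenecks", k=2)
recommender.feedback("bottlenecks", first, chosen="t2", satisfaction=1.0)
second = recommender.recommend("bottlenecks", k=2)
```

After the first round, the template the expert picked stays in the suggestions, while the ignored one makes room for a template that has not been tried yet.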

Modelling and LLMs

Large language models (LLMs) have recently gained immense popularity, with chatbots reaching millions of users and being integrated into various search engines. LLMs can also be trained to handle business process models, such as BPMN diagrams. For instance, they can create entire models based on textual input or modify and extend existing models. However, LLMs often produce factually inaccurate results. Here, RL can be used to refine the language models and improve their output quality through a process called RL from human feedback [7]. Although RL would not be the focus of the user-facing BPM innovation, it would still be essential for achieving the desired quality levels. Moreover, note that these automated process re-modelling capabilities could be combined with the BPRL approach, which would essentially enable self-optimizing business processes.


BPM is all about making smart decisions in a dynamic environment. Given that RL was essentially developed for this problem space, it seems only natural to use it to support BPM practitioners. Challenges that have kept RL from being used in BPM and other real-world scenarios include partial observability of systems, data delays, a lack of explainability, and the need for preparatory offline learning [8]. However, many of these challenges have been discussed and addressed by prior work [4]–[6], and the presented project and ideas show that RL does have the potential to innovate how business processes are modeled, analyzed, and improved. SAP Signavio is determined to stay at the forefront of innovations like these, driving business process transformations and making the world run better.

Join the conversation

Are you ready to dive deeper into this topic? Join the conversation by leaving a comment below and sharing your thoughts!


[1]       R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd edition. MIT Press, 2018.

[2]       M. Dumas, M. La Rosa, J. Mendling, and H. A. Reijers, Fundamentals of Business Process Management. Berlin, Heidelberg: Springer Berlin Heidelberg, 2018. doi: 10.1007/978-3-662-56509-4.

[3]       C. W. Holland and D. Cochran, Breakthrough Business Results With MVT: A Fast, Cost-Free, “Secret Weapon” for Boosting Sales, Cutting Expenses, and Improving Any Business Process, 1st edition. Hoboken, NJ: Wiley, 2005.

[4]       S. Satyal, I. Weber, H. Paik, C. Di Ciccio, and J. Mendling, “Business process improvement with the AB-BPM methodology,” Inf. Syst., vol. 84, pp. 283–298, Sep. 2019, doi: 10.1016/

[5]       A. F. Kurz, B. Santelmann, T. Großmann, T. Kampik, L. Pufahl, and I. Weber, “HITL-AB-BPM: Business Process Improvement with AB Testing and Human-in-the-Loop,” Proc. Demo Sess. 20th Int. Conf. Bus. Process Manag., 2022.

[6]       A. F. Kurz, T. Kampik, L. Pufahl, and I. Weber, “Reinforcement Learning-supported AB Testing of Business Process Improvements: An Industry Perspective.” arXiv, Mar. 19, 2023. doi: 10.48550/arXiv.2303.10756.

[7]       L. Ouyang et al., “Training language models to follow instructions with human feedback.” arXiv, Mar. 04, 2022. doi: 10.48550/arXiv.2203.02155.

[8]       G. Dulac-Arnold et al., “Challenges of real-world reinforcement learning: definitions, benchmarks and analysis,” Mach. Learn., vol. 110, no. 9, pp. 2419–2468, Sep. 2021, doi: 10.1007/s10994-021-05961-4.

      Christian Drumm

      Hi Aaron,

      very nice blog post 👍. It’s great that you reported on the results of your bachelor thesis here. I’ll definitely have a look at the demo paper of the prototype.


      Aaron Kurz
      Blog Post Author

      Hi Christian Drumm,

      thank you for your positive feedback! In case you have any questions, feedback or impressions you would like to discuss regarding the papers or prototype, feel free to reach out anytime!


      Best regards



      Daniel Lerch

      Hey Aaron,


      congrats on this blog post and your bachelor thesis. It's an interesting topic with interesting conclusions.

      It would be very interesting to test your system against data mining and the conclusions of a "traditional" process management workshop.

      I like your idea of HITL. For me, that's the main issue with machine learning: you have to improve your source information and correct it manually if needed.


      I would be happy to have a short conversation with you in the future.


      Best regards



      Aaron Kurz
      Blog Post Author

      Hello Daniel,

      thank you for your feedback! Yes, I think after some more development and refinement the approach should definitely be tested against the current "gold standard" BPI approaches. Only then can we draw firm conclusions about its usefulness.

      I am glad you found the HITL approach interesting! And I'd love to continue the conversation about this. Feel free to reach out anytime to have a short meeting; my email is aaron.kurz[at]

      Best regards,


      Frank Kraft

      Cool idea, Aaron.

      to continue the thought, this came to my mind:
      Which level of process improvement (or is it actually process specialization?) might be achieved with this approach, and which level of process improvements may not be surpassed in principle, for which a creative designer will be needed (keyword: Irreducible Complexity).

      Maybe it is more a philosophical question, and more experience is needed, but that question came to my mind.


      Aaron Kurz
      Blog Post Author

      Hello Frank,

      the outcome of the expert interviews was that the magnitude of a process change is not really the limiting factor; changes to the information system are. So the process changes could also be larger, as long as they happen within the existing information system environment that is configured for reinforcement learning. The approach might therefore be most suitable for "process changes of any size within a given IS environment". Beyond that, we need more human/manual decision-making and support, since the real-time data needed for reinforcement learning would no longer be available.

      Regarding the question of whether a process designer might be needed (i.e., if it is more suitable for process specialization or process redesign) for larger changes: Potentially yes, but that does not mean that the AB-BPM approach is not applicable then. An experienced process redesign expert "working together" with a HITL-AB-BPM (/HITL-BPRL) tool would be ideal for speeding up process improvement efforts and improving certainty in the results.

      I hope this answer is satisfactory!

      Best regards


      Daniel Lerch

      Hey Frank and Aaron,

      the last topic is exactly what I had in mind.

      You will need a redesigner or expert for process management, but their work will be involving. For the exaption and the change management you need the experience on a personal level. Sure, changes in an IT system don't need so much change management.

      @Frank I don't know if it's just a philosophical question, but I think we have to keep this in mind.


      Best regards



      Frank Kraft

      To carry this idea one step further:

      If you say the information system is the limiting factor in this context, what if you presuppose a comprehensive reference process model for a given information system?

      The idea I just got was to automatically derive the variant candidates from that - without a designer. Should not be too difficult?

      Daniel Lerch

      Sorry, I think it was a misunderstanding.

      Especially for information systems it would fit perfectly. If the system works on its own and has limited manual interactions, it would be awesome.

      If you step out of the context of an IT system, the world will be different. If more and more interactions are needed, then you need more of the experience of a change manager with a process background. You will step back from the fastest lane to steps that are more comfortable for the user/worker.


      Just an awesome topic!

      I'm a little bit sorry for being more conservative on this topic, but I also see all the opportunities.

      Aaron Kurz
      Blog Post Author

      That would of course be possible and is, I think, akin to the idea of workflow systems, where processes are executed based on a pre-determined execution model/syntax. Then, large process changes could be made by small changes to the specification of a process!

      But still, executing automatically modified process versions might be too risky for many stakeholders, hence the HITL approach. I think the difficulty here also comes from the lack of industry adoption of such technologies, not from the theoretical technical capabilities.