Preparing Data in Process Mining – Challenges with Event Log Generation
Process mining is taking the world of process management by storm. For many organizations, it is the starting point of successful business transformation and an enabler of operational excellence. In essence, process mining technology is based on event logs. Event logs are a specific type of system data, typically created by Enterprise Resource Planning (ERP) systems and composed of a sequence of events, each of which includes at least a case identifier and an activity reference. However, event log data is typically not readily available in ERP systems. Instead, event logs need to be extracted from different database tables of one or several ERP systems, often using customized extraction scripts whose implementation requires both technical and domain expertise. Indeed, event log generation is not a straightforward task. There are multiple complicating factors, ranging from data quality challenges to the identification of clear business goals.
Together with Professor Mathias Weske from Hasso-Plattner-Institute, we have outlined five areas where event log generation can pose challenges to enterprise information systems. In each area, we surveyed product domain experts and software engineers to understand the challenges and ideate innovative solutions.
“The findings highlight that the interplay between modeling, execution, and analysis systems is more nuanced and sophisticated than assumed. Deep expertise in all three areas is essential for combining the best of human creativity and machine intelligence in next-generation business process intelligence products.” – Timotheus Kampik, Principal Scientist at SAP Signavio.
- Process Execution & Modelling
Process mining is concerned with the analysis of process execution data, i.e., logs of how real-world business processes run. Process execution is in turn influenced by process models in two ways: directly, when a model is deployed to an execution engine, or indirectly, when a model informs how a human configures the enterprise systems. However, 81% of respondents confirmed that business processes are not executed exactly as specified by a process model. According to 66% of respondents, process models impact process execution, at least indirectly. Thus, the survey responses confirm that process models are primarily a means of communication and analysis for humans and are rarely used directly as executable artifacts. This means there is no tight coupling of a process model to an event log through a system that executes the exact model. Consequently, models often need to be created and adjusted for the specific purpose of process mining-based event log analysis.
- Process Scoping
Process scoping is the critical preparation work prior to process mining. In this step, technical process experts identify the right scope of the processes to start the analysis. Without proper scoping of a process, organizations find it extremely challenging to identify the right starting point for event log generation and process analysis. This has been validated in our survey, where 69% of the experts report that the identification of the start and the end of a process instance is challenging. Besides, 63% of the respondents highlighted that assigning the correct events to a given case is a challenge, i.e., process experts find it hard to determine which events a process instance should entail exactly. Additionally, 78% of the experts reported challenges due to data quality issues, such as missing data.
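The scoping problems named above (finding the start and end of a process instance, assigning events to cases, and missing data) can be made concrete with a small check. The following is a minimal sketch, not a method taken from the survey: the activity names, the field names, and the choice of "Create Order" and "Send Invoice" as scope boundaries are all hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical scope definition: these activity names are assumptions.
START_ACTIVITY = "Create Order"
END_ACTIVITY = "Send Invoice"

# Illustrative events; None marks a missing value in the source data.
events = [
    {"case_id": "A", "activity": "Create Order", "timestamp": "2022-01-03T09:00"},
    {"case_id": "A", "activity": "Ship Goods", "timestamp": "2022-01-04T10:00"},
    {"case_id": "A", "activity": "Send Invoice", "timestamp": "2022-01-05T08:30"},
    {"case_id": "B", "activity": "Ship Goods", "timestamp": "2022-01-06T11:00"},
    {"case_id": "B", "activity": "Send Invoice", "timestamp": None},
    {"case_id": None, "activity": "Create Order", "timestamp": "2022-01-07T12:00"},
]


def scoping_report(events, start, end):
    """Summarize scoping and data quality issues in a list of events."""
    # Events that cannot be assigned to any case.
    unassigned = [e for e in events if e["case_id"] is None]
    # Events that belong to a case but cannot be ordered in time.
    no_timestamp = [
        e for e in events
        if e["case_id"] is not None and e["timestamp"] is None
    ]
    # Collect the set of activities observed per case.
    activities_by_case = defaultdict(set)
    for e in events:
        if e["case_id"] is not None:
            activities_by_case[e["case_id"]].add(e["activity"])
    return {
        "unassigned_events": len(unassigned),
        "events_missing_timestamp": len(no_timestamp),
        "cases_without_start": sorted(
            c for c, acts in activities_by_case.items() if start not in acts
        ),
        "cases_without_end": sorted(
            c for c, acts in activities_by_case.items() if end not in acts
        ),
    }


report = scoping_report(events, START_ACTIVITY, END_ACTIVITY)
print(report)
```

In this sample, case B has no start event inside the chosen scope, one event has no timestamp, and one event has no case identifier, which mirrors the three kinds of difficulty the respondents describe.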
- Event Log Extraction
At this stage, technical process experts extract the process data from the enterprise systems and transform the data into an event log. As the saying goes, “event log extraction takes up 80% of the time in the entire process mining project.” This has been somewhat confirmed in our survey, where 81% of the respondents stated that event log extraction requires substantial effort. Often, multiple data sources are involved in event log extraction and, according to 84% of the respondents, event logs are typically extracted from multiple database tables. Sources for event logs are often relational databases of enterprise systems or CSV files, whereas data lakes and event-based systems seem to be emerging as alternatives. These responses highlight that enterprise systems should ideally have highly integrated event log generation and extraction capabilities, that is, native event log generation: the creation of event logs at process run-time.
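To illustrate what extracting an event log from multiple database tables can look like, here is a minimal sketch using an in-memory SQLite database. The table names, column names, and activity labels are hypothetical; real ERP schemas are considerably more complex, and the mapping from table rows to activities is exactly where domain expertise is needed.

```python
import sqlite3

# Hypothetical source tables standing in for ERP data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id TEXT, created_at TEXT);
CREATE TABLE deliveries (delivery_id TEXT, order_id TEXT, shipped_at TEXT);
CREATE TABLE invoices (invoice_id TEXT, order_id TEXT, posted_at TEXT);
INSERT INTO orders VALUES ('O1', '2022-01-03T09:00'), ('O2', '2022-01-03T11:00');
INSERT INTO deliveries VALUES ('D1', 'O1', '2022-01-04T10:00');
INSERT INTO invoices VALUES ('I1', 'O1', '2022-01-05T08:30'),
                            ('I2', 'O2', '2022-01-04T15:00');
""")

# Each source table contributes one activity type. The order ID serves as
# the case identifier (an assumption about how cases are scoped).
EVENT_LOG_QUERY = """
SELECT order_id AS case_id, 'Create Order' AS activity, created_at AS event_time
FROM orders
UNION ALL
SELECT order_id, 'Ship Goods', shipped_at FROM deliveries
UNION ALL
SELECT order_id, 'Send Invoice', posted_at FROM invoices
ORDER BY case_id, event_time
"""

event_log = conn.execute(EVENT_LOG_QUERY).fetchall()
for row in event_log:
    print(row)
```

Each row of the result carries a case identifier, an activity, and a timestamp, which is the minimal event log shape described at the start of this article.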
- Process Discovery
In the process discovery phase, process intelligence software generates process models based on event logs. This results in graphical diagrams and metrics such as process performance indicators (PPIs), allowing analysts to scrutinize the processes and derive insights from them. However, 97% of the process experts believe that this phase is particularly challenging due to the complexity of the generated process models. While analyzing PPIs is important, 91% of the respondents highlight that understanding the order in which the activities are executed is valuable in and of itself. In addition, 69% of respondents emphasize that having a clear understanding of the process variants can be helpful. This finding confirms the value of the key features of contemporary process mining software.
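The two notions respondents value here, activity order and process variants, can be derived from an event log with very little code. The sketch below is a simplified illustration, not how any particular process mining product works: it counts variants (distinct activity sequences per case) and directly-follows pairs, a common building block of discovery algorithms. The sample data and function names are assumptions.

```python
from collections import Counter, defaultdict

# Illustrative event log: (case_id, activity, ISO 8601 timestamp).
event_log = [
    ("O1", "Create Order", "2022-01-03T09:00"),
    ("O1", "Ship Goods", "2022-01-04T10:00"),
    ("O1", "Send Invoice", "2022-01-05T09:00"),
    ("O2", "Create Order", "2022-01-03T11:00"),
    ("O2", "Send Invoice", "2022-01-04T11:00"),
    ("O3", "Create Order", "2022-01-10T08:00"),
    ("O3", "Ship Goods", "2022-01-11T08:00"),
    ("O3", "Send Invoice", "2022-01-13T08:00"),
]


def traces(event_log):
    """Group events by case and order them by timestamp."""
    by_case = defaultdict(list)
    # Timestamps in one uniform ISO 8601 format sort correctly as strings.
    for case_id, activity, _ in sorted(event_log, key=lambda e: (e[0], e[2])):
        by_case[case_id].append(activity)
    return {case_id: tuple(acts) for case_id, acts in by_case.items()}


def variants(event_log):
    """Count how many cases follow each distinct activity sequence."""
    return Counter(traces(event_log).values())


def directly_follows(event_log):
    """Count how often activity b occurs immediately after activity a."""
    counts = Counter()
    for trace in traces(event_log).values():
        counts.update(zip(trace, trace[1:]))
    return counts


variant_counts = variants(event_log)
df_counts = directly_follows(event_log)
print(variant_counts.most_common())
print(df_counts.most_common())
```

Even in this tiny log there are two variants, one of which skips shipping. With thousands of cases and dozens of activities, the number of variants and edges grows quickly, which is one way to see why respondents find generated models complex.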
- Process Analysis
At the process analysis stage, process analysts examine the generated data aggregations and models to develop insights. 88% of the process experts believe an integration of process mining with traditional Business Intelligence (BI) is important for generating insights. With this integration, analysis can benefit from the best of both worlds: static, cube-based BI and control flow-oriented process mining, which often complement each other, for example when tracking KPIs from these different perspectives. 84% of the respondents highlight the importance of Key Performance Indicators (KPIs) at the analysis stage, while a considerable 69% recognize flow properties as similarly important. The ability to provide advanced insights by integrating process intelligence and business intelligence is one of the current frontiers in process mining.
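As a small illustration of how a KPI and a flow property can be viewed together, the sketch below computes average cycle time (a typical KPI) per process variant (a control-flow property). It is a simplified example under stated assumptions: the sample data is invented, and cycle time is defined here as the time between the first and last event of a case.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Illustrative event log: (case_id, activity, ISO 8601 timestamp).
event_log = [
    ("O1", "Create Order", "2022-01-03T09:00"),
    ("O1", "Ship Goods", "2022-01-04T10:00"),
    ("O1", "Send Invoice", "2022-01-05T09:00"),
    ("O2", "Create Order", "2022-01-03T11:00"),
    ("O2", "Send Invoice", "2022-01-04T11:00"),
    ("O3", "Create Order", "2022-01-10T08:00"),
    ("O3", "Ship Goods", "2022-01-11T08:00"),
    ("O3", "Send Invoice", "2022-01-13T08:00"),
]


def cycle_time_by_variant(event_log):
    """Return the mean cycle time in hours for each process variant."""
    by_case = defaultdict(list)
    for case_id, activity, ts in event_log:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))

    hours_by_variant = defaultdict(list)
    for case_events in by_case.values():
        case_events.sort()  # order by timestamp
        variant = tuple(activity for _, activity in case_events)
        # Cycle time: last event minus first event of the case (assumption).
        duration = case_events[-1][0] - case_events[0][0]
        hours_by_variant[variant].append(duration.total_seconds() / 3600)

    return {v: mean(hours) for v, hours in hours_by_variant.items()}


kpi = cycle_time_by_variant(event_log)
for variant, hours in kpi.items():
    print(" -> ".join(variant), f"{hours:.1f} h")
```

A BI dashboard would typically report only the overall average, whereas breaking the same KPI down by variant shows which paths through the process drive it.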
The mining of process control flow from event log data is a cornerstone of business process intelligence. From a practical perspective, the next process mining research frontier lies in addressing data integration and data quality challenges. According to David Eickhoff, VP Engineering Research at SAP Signavio, “this research study further supports the long-term vision of creating a tighter integration between process intelligence software and execution systems. Our current efforts around native event log functionality enable us to address process mining data needs already at process execution run time. We also see great innovation potential in the application of process models as tools for process scoping.”
It is essential to understand these challenges and to develop innovative products that help organizations bridge the gaps in process mining, ultimately contributing to successful business transformation projects. SAP Signavio has been investing in building the bridge to the most sophisticated software systems that provide data input for event log generation. With its unique position to address these challenges, SAP aims to continue innovating to help organizations unleash the potential of business process intelligence.
Read more in the full research paper, “Event Log Generation: An Industry Perspective”, to be presented at BPMDS 2022.