Process Mining: Introduction and Process Discovery
- Jan 30
In recent years, the field of process mining has emerged, with techniques aimed at translating data captured during process execution into actionable insights.
What is process mining? What data should be used for process mining? What types of analysis are possible with process mining tools? This article answers each of these questions.
Definition of Process Mining
Modern information systems allow us to track, often in great detail, the execution of processes (sequences of events) within companies. Examples include baggage handling at airports, the manufacturing of products and goods, and service delivery: all of these processes generate valuable event data.
This data is generally stored in a company's information system and describes the execution of the process in question.
Process mining techniques aim to translate the data captured during process execution into actionable information. Three main types of process mining analysis are usually distinguished: process discovery, compliance monitoring (also called conformance checking), and process improvement (also called process enhancement).
In process discovery, the goal is to build a process model, i.e., a formal description of the process's behavior as recorded in the event data.
In compliance monitoring, we seek to assess the extent to which event data matches a given reference model.
Finally, in process improvement, the main objective is to extend or improve an existing process model using facts derived from the event data, for example by adding information about waiting times or bottlenecks.
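To give a feel for compliance monitoring, here is a deliberately simplified Python sketch. Real conformance checking replays traces on a full process model (for instance with token-based replay or alignments); here, as an assumption, the reference model is reduced to allowed start activities, allowed end activities, and allowed "directly followed by" pairs, and all activity names are hypothetical. The score is simply the share of traces that the model allows.

```python
# Hypothetical reference model, reduced (as a simplifying assumption) to
# allowed start/end activities and allowed "a directly followed by b" pairs.
REFERENCE_MODEL = {
    "start": {"register"},
    "end": {"ship", "reject"},
    "follows": {
        ("register", "check"),
        ("check", "approve"),
        ("check", "reject"),
        ("approve", "ship"),
    },
}


def trace_fits(trace, model):
    """Return True if the trace uses only behavior the model allows."""
    if not trace:
        return False
    if trace[0] not in model["start"] or trace[-1] not in model["end"]:
        return False
    # every pair of consecutive activities must be an allowed step
    return all(pair in model["follows"] for pair in zip(trace, trace[1:]))


def fitness(traces, model):
    """Share of traces that fit the model: a crude conformance score."""
    traces = list(traces)
    return sum(trace_fits(t, model) for t in traces) / len(traces) if traces else 1.0


event_log = [
    ["register", "check", "approve", "ship"],
    ["register", "check", "reject"],
    ["register", "approve", "ship"],  # skips "check": deviates from the model
]
print(round(fitness(event_log, REFERENCE_MODEL), 2))  # 0.67
```

A score below 1 flags deviating cases; inspecting which pairs fail (here "register" followed directly by "approve") is what points to the actual compliance problem.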
What data should be used for process mining?
To run process mining algorithms, you need an event log detailing the tasks performed as part of the process being studied, ideally with a timestamp for each one. The standard format for such logs is XES (eXtensible Event Stream), an XML-based format standardized as IEEE 1849.
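To show what an XES log looks like, here is a minimal Python sketch that reads one with the standard library only. A log contains traces, each trace contains events, and attributes use standard keys such as concept:name (the name), time:timestamp, lifecycle:transition (the status), and org:resource. The XES fragment, case identifier, and activity names are invented for illustration, and the parser ignores everything except these key/value attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written XES fragment for illustration; the case and activity names are made up.
XES_LOG = """<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="concept:name" value="assembly"/>
      <string key="lifecycle:transition" value="complete"/>
      <date key="time:timestamp" value="2024-01-30T09:05:00.000+00:00"/>
      <string key="org:resource" value="robot-A"/>
    </event>
    <event>
      <string key="concept:name" value="inspection"/>
      <string key="lifecycle:transition" value="complete"/>
      <date key="time:timestamp" value="2024-01-30T09:20:00.000+00:00"/>
      <string key="org:resource" value="Alice"/>
    </event>
  </trace>
</log>"""


def local_name(tag):
    """Strip an XML namespace such as '{http://www.xes-standard.org/}trace'."""
    return tag.rsplit("}", 1)[-1]


def attributes(elem):
    """Collect the key/value attribute children of a trace or event."""
    return {c.get("key"): c.get("value") for c in elem if c.get("key") is not None}


def parse_xes(text):
    """Return {case id: [event attribute dicts, in file order]}."""
    root = ET.fromstring(text)
    log = {}
    for trace in root:
        if local_name(trace.tag) != "trace":
            continue
        events = [attributes(e) for e in trace if local_name(e.tag) == "event"]
        log[attributes(trace).get("concept:name")] = events
    return log


log = parse_xes(XES_LOG)
print({case: [e["concept:name"] for e in events] for case, events in log.items()})
# {'1': ['assembly', 'inspection']}
```

For real-world logs, a dedicated process mining library is the safer choice, since XES also supports nested attributes, global defaults, and classifiers that this sketch does not handle.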
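CSV files can also be used, provided they contain the following columns (the trace identifier, activity, and timestamp are the essential ones; status and resource enable richer analyses):
Trace (case identifier): the identifier shared by all events of one process instance (e.g., 1 for every event involved in creating the first product)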
Activity: the name or identifier of the event/task performed (e.g., assembly of two parts of a product)
Activity status: “started,” “completed,” etc.
Timestamp: the date and time when the activity was performed
Resource: the resource that performed the activity (can be a person, a robot, etc.)
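Before any algorithm can run, the rows of such a CSV file have to be grouped into traces and ordered in time. The Python sketch below does exactly that with the standard library; the column names (trace, activity, status, timestamp, resource), the timestamp format, and the sample rows are assumptions chosen to mirror the list above.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Illustrative CSV log; the column names mirror the list above but are assumptions.
CSV_LOG = """trace,activity,status,timestamp,resource
1,assembly,completed,2024-01-30 09:05,robot-A
2,assembly,completed,2024-01-30 09:10,robot-B
1,packaging,completed,2024-01-30 09:40,Bob
1,inspection,completed,2024-01-30 09:20,Alice
2,inspection,completed,2024-01-30 09:30,Alice
"""


def load_traces(text):
    """Group rows by trace id, then sort each trace's events by timestamp."""
    traces = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        row["timestamp"] = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M")
        traces[row["trace"]].append(row)
    for events in traces.values():
        events.sort(key=lambda e: e["timestamp"])
    return dict(traces)


for case_id, events in load_traces(CSV_LOG).items():
    print(case_id, [e["activity"] for e in events])
# 1 ['assembly', 'inspection', 'packaging']
# 2 ['assembly', 'inspection']
```

The rows are deliberately out of order: sorting on the timestamp is what turns a flat table into the ordered sequences that discovery algorithms expect, which is why dated events matter.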
A wide variety of algorithms and visualizations
Open-source process mining tools offer a wide range of processing options, from simple descriptions and statistics on the various variables in the log to visualizations of different processes using complex algorithms.

One of the most commonly used models in process mining is the Petri net, a formal notation describing which activities a process contains and the order in which they can occur. In brief, activities are represented by transitions (usually drawn as rectangles), places (drawn as circles) hold the tokens that mark the current state of a case, and arcs (arrows) connect places to transitions and transitions to places.
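The way tokens move through a Petri net (the "firing rule") fits in a few lines. In this minimal Python sketch, a transition is enabled when each of its input places holds a token; firing it consumes those tokens and produces one in each output place. The process, place names, and transition names are all hypothetical.

```python
from collections import Counter


class PetriNet:
    """Minimal place/transition net; markings are Counters of tokens per place."""

    def __init__(self, transitions):
        # transitions maps a transition name to (input places, output places)
        self.transitions = transitions

    def enabled(self, marking, name):
        """A transition is enabled when every input place holds enough tokens."""
        inputs, _ = self.transitions[name]
        return all(marking[p] >= n for p, n in Counter(inputs).items())

    def fire(self, marking, name):
        """Consume one token per input arc and produce one per output arc."""
        if not self.enabled(marking, name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        new = Counter(marking)
        new.subtract(inputs)
        new.update(outputs)
        return +new  # unary + drops places left with zero tokens


# Hypothetical process: after "register", "check" and "pay" can happen
# in either order, and "ship" waits until both are done.
net = PetriNet({
    "register": (["start"], ["p1", "p2"]),
    "check": (["p1"], ["p3"]),
    "pay": (["p2"], ["p4"]),
    "ship": (["p3", "p4"], ["end"]),
})

marking = Counter({"start": 1})
for activity in ["register", "pay", "check", "ship"]:
    marking = net.fire(marking, activity)
print(marking)  # Counter({'end': 1})
```

Because "register" puts a token in two places at once, the net expresses parallel branches, and "ship" stays disabled until both branches have finished.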
Several discovery algorithms, such as the Alpha algorithm or the Inductive Miner, can derive such a network from an event log, and each produces a different model. Their output also depends on their settings, which can be tricky for novices to tune, although the default settings usually give satisfactory results. For example, infrequent behavior can be filtered out (ignoring a given percentage of the data) to obtain a simpler, more readable model, and occurrences of the same activity with different statuses (e.g., “started” and “completed”) can be treated as distinct activities.
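Implementing a Petri net miner such as the Alpha algorithm is beyond a short example, but many discovery algorithms start from the same statistic: how often one activity is directly followed by another. The Python sketch below computes this directly-follows graph and mimics the two settings just mentioned: a frequency threshold that hides rare behavior, and an option to distinguish activity statuses. The parameter names min_share and use_status, and the sample log, are hypothetical rather than those of any particular tool.

```python
from collections import Counter


def directly_follows(traces, min_share=0.0, use_status=False):
    """Count how often one activity is directly followed by another.

    min_share  -- drop edges seen less often than this fraction of the most
                  frequent edge (a crude stand-in for a noise filter)
    use_status -- label events as "activity+status", so that e.g.
                  "assembly+started" and "assembly+completed" are distinct
    """
    counts = Counter()
    for trace in traces:
        labels = [f"{act}+{status}" if use_status else act for act, status in trace]
        counts.update(zip(labels, labels[1:]))
    if not counts:
        return {}
    top = max(counts.values())
    return {edge: n for edge, n in counts.items() if n >= min_share * top}


# Hypothetical log: three "normal" cases and one rare case with rework.
log = 3 * [[("assembly", "started"), ("assembly", "completed"), ("inspection", "completed")]]
log += [[("assembly", "completed"), ("rework", "completed"), ("inspection", "completed")]]

print(directly_follows(log))
# {('assembly', 'assembly'): 3, ('assembly', 'inspection'): 3, ('assembly', 'rework'): 1, ('rework', 'inspection'): 1}
print(directly_follows(log, min_share=0.5, use_status=True))
# {('assembly+started', 'assembly+completed'): 3, ('assembly+completed', 'inspection+completed'): 3}
```

Without statuses, the start and end of the same activity show up as a misleading self-loop on "assembly"; with statuses enabled and a 50% threshold, the rare rework path disappears and only the main path remains.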
A distinctive feature of process discovery tools is automated animation, a departure from the static charts (histograms, curves, etc.) more commonly found in data science. The example below shows a frame of a Petri net animation in which each token (in yellow) follows one process instance through the model in chronological order, according to the date and time at which each activity was performed.

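At its core, such an animation is a sequence of frames, each answering one question: at a given instant, where is each case's token? The Python sketch below assumes each case is a time-sorted list of (timestamp, activity) pairs and places each token on the last activity performed so far; the cases and times are hypothetical. Real tools typically also move tokens smoothly along the arcs between activities, which this sketch omits.

```python
from bisect import bisect_right
from datetime import datetime


def token_positions(traces, at):
    """Return, for each case, the last activity performed at or before `at`
    (None if the case has not started yet): one frame of a token animation.

    Each trace is a list of (timestamp, activity) pairs sorted by time.
    """
    frame = {}
    for case_id, events in traces.items():
        times = [ts for ts, _ in events]
        i = bisect_right(times, at)  # number of events that already happened
        frame[case_id] = events[i - 1][1] if i else None
    return frame


# Hypothetical cases with timestamped activities.
traces = {
    "1": [(datetime(2024, 1, 30, 9, 5), "assembly"), (datetime(2024, 1, 30, 9, 20), "inspection")],
    "2": [(datetime(2024, 1, 30, 9, 10), "assembly")],
}

# Stepping `at` forward in time yields successive animation frames.
for minute in (0, 15, 25):
    print(minute, token_positions(traces, datetime(2024, 1, 30, 9, minute)))
# 0 {'1': None, '2': None}
# 15 {'1': 'assembly', '2': 'assembly'}
# 25 {'1': 'inspection', '2': 'assembly'}
```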
Process Mining, a little-known field
Although process mining can be considered a subset of data science, it is a highly specialized one. Both the data it requires, namely event logs containing the specific variables described above, and its visualizations make it a niche within the field of data analysis.
Process discovery automatically builds business process models from the data that resources generate as they perform their tasks. These analyses add significant value by giving a clear, fact-based understanding of how business processes actually run.