DIS-PIPE: A Tool for Data Pipeline Discovery

Agostinelli S.;
2024-01-01

Abstract

This paper presents DIS-PIPE, a software tool that leverages well-established process mining techniques to tackle the Data Pipeline Discovery (DPD) task. A data pipeline is a composition of steps that moves data from disparate sources to one or more data consumers. As data travels through the pipeline, it can undergo various transformations performed by computational platforms. In this context, DPD aims to learn the structure and behavior of a data pipeline from an event log that records its past executions, uncovering, to some extent, execution-related dark data whose knowledge is critical to improving the quality of pipeline modeling. DIS-PIPE has been designed, implemented, and validated in the context of the H2020 European project DataCloud, and it can interpret XES logs enriched with the information needed to capture the core concepts of data pipelines.
2024
Dark Data
Data Pipeline
Data Pipeline Discovery (DPD)
DataCloud
Event Log
Process Mining
XES

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12606/22619