Big Data Pipeline Discovery through Process Mining: Challenges and Research Directions
Simone Agostinelli
2021-01-01
Abstract
Big Data pipelines are essential for leveraging Dark Data, i.e., data that is collected but never used or turned into value. However, tapping their potential requires going beyond the current approaches and frameworks for managing their life-cycle. In this paper, we present the challenges involved in the Pipeline Discovery task, which aims to learn the structure of a Big Data pipeline by extracting, processing and interpreting huge amounts of event data produced by several data sources. We then discuss how traditional Process Mining solutions could be employed and customized to overcome these challenges, and outline a research agenda for future work in this area.