Toward Sustainable Data Collection Processes for Autonomous Vehicles

Bellone M.
2026-01-01

Abstract

The growth of data-driven models in advanced driver assistance systems (ADAS) has been accelerated by the widespread availability of large-scale public datasets, which enable more accurate training and benchmarking. Despite this progress, achieving higher autonomy still requires continuous improvement in core functionalities and, in turn, an increasing supply of high-quality data. This escalating demand, coupled with the growing number of smart vehicles, poses substantial challenges to the sustainability of data operations, particularly in terms of computational and energy resource consumption. This paper addresses the need for sustainable practices in autonomous vehicle data operations, with a particular focus on data creation and post-processing. To this end, we present an applied study that integrates existing, proven data management techniques, such as deduplication, compression, anonymization, and event-driven filtering, into a scalable and modular pipeline. The pipeline is designed to minimize storage needs and increase data usability, thereby contributing to sustainability objectives without compromising data quality. In practice, the proposed method reduces storage requirements by over 50%, lowers computational overhead, and simplifies AI model training workflows through more compact and privacy-compliant datasets. The pipeline has been validated on two real-world autonomous driving datasets, confirming its effectiveness in improving the sustainability and efficiency of data operations for autonomous vehicles.
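As an illustration only, the modular pipeline described in the abstract could be sketched as a chain of interchangeable stages. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Frame` record, its `speed_delta` and `plate` fields, the stage names, and the threshold value are all hypothetical, and hash-based deduplication and `zlib` compression stand in for whatever techniques the paper actually uses.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """Hypothetical recorded sample; the field names are assumptions."""
    payload: bytes       # raw sensor bytes
    speed_delta: float   # stand-in event signal (e.g. change in speed)
    plate: str = ""      # stand-in personally identifying field


Stage = Callable[[List[Frame]], List[Frame]]


def deduplicate(frames: List[Frame]) -> List[Frame]:
    """Keep the first frame for each distinct payload hash."""
    seen = set()
    kept = []
    for frame in frames:
        digest = hashlib.sha256(frame.payload).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(frame)
    return kept


def event_filter(threshold: float) -> Stage:
    """Build a stage that keeps frames whose event signal meets the threshold."""
    def stage(frames: List[Frame]) -> List[Frame]:
        return [f for f in frames if abs(f.speed_delta) >= threshold]
    return stage


def anonymize(frames: List[Frame]) -> List[Frame]:
    """Blank the identifying field (a placeholder for real anonymization)."""
    return [Frame(f.payload, f.speed_delta, plate="") for f in frames]


def compress(frames: List[Frame]) -> List[Frame]:
    """Losslessly compress each payload with zlib."""
    return [Frame(zlib.compress(f.payload), f.speed_delta, f.plate) for f in frames]


def run_pipeline(frames: List[Frame], stages: List[Stage]) -> List[Frame]:
    """Apply each stage in order; stages can be added, removed, or reordered."""
    for stage in stages:
        frames = stage(frames)
    return frames


def total_bytes(frames: List[Frame]) -> int:
    """Total payload size, used here as a rough proxy for storage needs."""
    return sum(len(f.payload) for f in frames)


# Usage with synthetic data (values are illustrative only).
raw = [
    Frame(b"a" * 1000, 2.0, "AB123"),
    Frame(b"a" * 1000, 2.0, "AB123"),   # exact duplicate of the first frame
    Frame(b"b" * 1000, 0.1, "CD456"),   # below the event threshold
    Frame(b"c" * 1000, 3.0, "EF789"),
]
processed = run_pipeline(raw, [deduplicate, event_filter(1.0), anonymize, compress])
```

Because every stage shares the same signature, a stage can be swapped or dropped without touching the others, which is one plausible reading of "modular" here. Comparing `total_bytes(raw)` with `total_bytes(processed)` gives a rough view of the storage effect on this toy input; it says nothing about the figures reported in the paper.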
2026
Keywords: autonomous vehicles; data collection; data processing; green computing; sustainable development

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12606/40325