Randomized Spline Trees for Functional Data Classification: Theory and Application to Environmental Time Series

Fabrizio Maturo
2024-01-01

Abstract

Functional data analysis (FDA) and ensemble learning can be powerful tools for analyzing complex environmental time series. Recent literature has highlighted the key role of diversity in enhancing accuracy and reducing variance in ensemble methods. This paper introduces Randomized Spline Trees (RST), a novel algorithm that bridges these two approaches by incorporating randomized functional representations into the Random Forest framework. RST generates diverse functional representations of input data using randomized B-spline parameters, creating an ensemble of decision trees trained on these varied representations. We provide a theoretical analysis of how this functional diversity contributes to reducing generalization error and present empirical evaluations on six environmental time series classification tasks from the UCR Time Series Archive. Results show that RST variants outperform standard Random Forests and Gradient Boosting on most datasets, improving classification accuracy by up to 14%. The success of RST demonstrates the potential of adaptive functional representations in capturing complex temporal patterns in environmental data. This work contributes to the growing field of machine learning techniques focused on functional data and opens new avenues for research in environmental time series analysis.
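
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `RandomizedSplineTreesSketch`, the helper functions, the parameter ranges, the clamped uniform knot layout, the bootstrap resampling and the majority vote are all assumptions made for illustration. Each tree is trained on least-squares B-spline coefficients computed with its own randomly drawn basis size and degree, so the trees see different functional representations of the same series.

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.tree import DecisionTreeClassifier


def bspline_design(n_points, n_basis, degree):
    """Design matrix of a clamped B-spline basis on an even grid over [0, 1]."""
    x = np.linspace(0.0, 1.0, n_points)
    n_interior = n_basis - degree - 1
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.r_[np.zeros(degree + 1), interior, np.ones(degree + 1)]
    # Identity coefficients make column j the j-th basis function.
    return BSpline(knots, np.eye(n_basis), degree)(x)  # (n_points, n_basis)


def spline_coefficients(X, n_basis, degree):
    """Least-squares B-spline coefficients for each row (series) of X."""
    B = bspline_design(X.shape[1], n_basis, degree)
    coef, *_ = np.linalg.lstsq(B, X.T, rcond=None)
    return coef.T  # (n_series, n_basis)


class RandomizedSplineTreesSketch:
    """Hypothetical sketch: trees trained on randomized spline representations."""

    def __init__(self, n_trees=50, basis_range=(5, 15), degrees=(1, 2, 3), seed=0):
        self.n_trees = n_trees
        self.basis_range = basis_range  # assumed range, not from the paper
        self.degrees = degrees          # assumed candidate degrees
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        self.members_ = []
        n = len(X)
        for _ in range(self.n_trees):
            # Draw the functional representation for this tree at random.
            degree = int(rng.choice(self.degrees))
            low = max(self.basis_range[0], degree + 1)
            n_basis = int(rng.integers(low, self.basis_range[1] + 1))
            # Bootstrap resampling, assumed here by analogy with Random Forest.
            idx = rng.integers(0, n, size=n)
            tree = DecisionTreeClassifier(random_state=int(rng.integers(2**31 - 1)))
            tree.fit(spline_coefficients(X[idx], n_basis, degree), y[idx])
            self.members_.append((n_basis, degree, tree))
        return self

    def predict(self, X):
        # Majority vote over trees, each using its own representation.
        votes = np.zeros((len(X), len(self.classes_)))
        for n_basis, degree, tree in self.members_:
            pred = tree.predict(spline_coefficients(X, n_basis, degree))
            for j, c in enumerate(self.classes_):
                votes[:, j] += pred == c
        return self.classes_[votes.argmax(axis=1)]
```

With `X` an array of equal-length series (one per row) and `y` the class labels, `RandomizedSplineTreesSketch().fit(X_train, y_train).predict(X_test)` returns predicted labels. The sketch assumes every series is sampled on the same evenly spaced grid.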
Statistics - Machine Learning
Computer Science - Learning
Statistics - Methodology
62M10, 68T05, 65D07, 68T10
I.5.1
I.5.2
G.3
I.2.6
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12606/14489