Randomized Spline Trees for Functional Data Classification: Theory and Application to Environmental Time Series
Fabrizio Maturo
2024-01-01
Abstract
Functional data analysis (FDA) and ensemble learning can be powerful tools for analyzing complex environmental time series. Recent literature has highlighted the key role of diversity in enhancing accuracy and reducing variance in ensemble methods. This paper introduces Randomized Spline Trees (RST), a novel algorithm that bridges these two approaches by incorporating randomized functional representations into the Random Forest framework. RST generates diverse functional representations of input data using randomized B-spline parameters, creating an ensemble of decision trees trained on these varied representations. We provide a theoretical analysis of how this functional diversity contributes to reducing generalization error and present empirical evaluations on six environmental time series classification tasks from the UCR Time Series Archive. Results show that RST variants outperform standard Random Forests and Gradient Boosting on most datasets, improving classification accuracy by up to 14%. The success of RST demonstrates the potential of adaptive functional representations in capturing complex temporal patterns in environmental data. This work contributes to the growing field of machine learning techniques focused on functional data and opens new avenues for research in environmental time series analysis.
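The idea summarized in the abstract (each tree sees its own randomized B-spline representation of the curves) can be illustrated with a minimal Python sketch. This is not the author's implementation. The abstract does not say which B-spline parameters are randomized, so the sketch assumes the number of basis functions and the spline degree are drawn at random for each tree. The class name `RandomizedSplineForestSketch`, the helpers `bspline_design_matrix` and `make_toy_curves`, the parameter ranges, the least-squares coefficient fit, the bootstrap resampling, and the synthetic sine data are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.tree import DecisionTreeClassifier


def bspline_design_matrix(t, n_basis, degree):
    """Evaluate n_basis clamped B-spline basis functions at points t in [0, 1]."""
    n_interior = n_basis - degree - 1  # requires n_basis >= degree + 1
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    # With identity coefficients, the spline returns every basis function at once.
    return BSpline(knots, np.eye(n_basis), degree)(t)  # shape (len(t), n_basis)


class RandomizedSplineForestSketch:
    """Illustrative ensemble: one randomized B-spline representation per tree."""

    def __init__(self, n_trees=25, basis_range=(5, 15), degrees=(1, 2, 3), seed=0):
        self.n_trees = n_trees
        self.basis_range = basis_range  # assumed range for the number of basis functions
        self.degrees = degrees          # assumed candidate spline degrees
        self.seed = seed

    @staticmethod
    def _coefficients(B, X):
        # Least-squares B-spline coefficients for every curve (row) in X.
        return np.linalg.lstsq(B, X.T, rcond=None)[0].T

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        rng = np.random.default_rng(self.seed)
        t = np.linspace(0.0, 1.0, X.shape[1])
        self.classes_ = np.unique(y)
        self.members_ = []
        for _ in range(self.n_trees):
            # Randomize the functional representation for this tree.
            degree = int(rng.choice(self.degrees))
            low = max(self.basis_range[0], degree + 1)
            n_basis = int(rng.integers(low, self.basis_range[1] + 1))
            B = bspline_design_matrix(t, n_basis, degree)
            coefs = self._coefficients(B, X)
            # Bootstrap resampling and feature subsampling, as in a Random Forest.
            idx = rng.integers(0, len(X), size=len(X))
            tree = DecisionTreeClassifier(
                max_features="sqrt", random_state=int(rng.integers(2**31 - 1))
            )
            tree.fit(coefs[idx], y[idx])
            self.members_.append((B, tree))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        votes = np.zeros((len(X), len(self.classes_)))
        for B, tree in self.members_:
            pred = tree.predict(self._coefficients(B, X))
            votes += pred[:, None] == self.classes_[None, :]
        return self.classes_[votes.argmax(axis=1)]  # majority vote


def make_toy_curves(n_per_class, n_points, seed):
    """Synthetic two-class curves (noisy sines of different frequency)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_points)
    X0 = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((n_per_class, n_points))
    X1 = np.sin(4 * np.pi * t) + 0.3 * rng.standard_normal((n_per_class, n_points))
    y = np.repeat([0, 1], n_per_class)
    return np.vstack([X0, X1]), y


if __name__ == "__main__":
    X_train, y_train = make_toy_curves(40, 60, seed=1)
    X_test, y_test = make_toy_curves(20, 60, seed=2)
    model = RandomizedSplineForestSketch(n_trees=15).fit(X_train, y_train)
    print("toy accuracy:", (model.predict(X_test) == y_test).mean())
```

Each tree is trained on coefficients from a different basis, so the ensemble members disagree in ways that go beyond bootstrap resampling. This is the kind of functional diversity the abstract credits with reducing generalization error. The sketch makes no claim about the accuracy figures reported for the UCR datasets.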