FlowSphere: A General-Purpose Runtime for Distributed Data-Flow Computing

Zanardo, Enrico
2024-01-01

Abstract

This paper introduces FlowSphere, a general-purpose runtime for the efficient execution of data-flow programs in distributed environments. Building on the foundation laid by existing execution engines, FlowSphere hides much of the complexity traditionally associated with distributed programming and gives developers a simple yet expressive framework. Unlike prior systems, FlowSphere supports data-dependent control flow: whether, and how many times, part of a computation runs can depend on values produced at run time. This lets iterative and recursive algorithms be expressed naturally as data-flow programs, and it broadens the class of programs that can be executed efficiently in distributed settings. The architecture is designed specifically to manage such dynamic control flow, which traditional data-flow systems handle poorly, so that workflows involving repeated or recursive computation can run without sacrificing efficiency or scalability. Deployed on a modern cloud computing infrastructure, FlowSphere shows scalable and robust performance on both iterative and non-iterative workloads. These evaluations indicate that it can serve demanding computational needs, from scientific simulations to large-scale data analytics, and let researchers, developers, and organizations use distributed computing without having to manage the underlying system themselves.
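
To make "data-dependent control flow" concrete, here is a minimal single-process Python sketch. It is not FlowSphere's API or implementation: the names `Node`, `run_graph`, `run_while`, and `connected_components` are hypothetical, and nothing here is distributed. The loop body is itself a small data-flow graph, and whether another iteration runs is decided by a value the body computes (here, whether any label changed during connected-component label propagation), so the number of iterations depends on the input data rather than being fixed when the program is built.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: NOT FlowSphere's API. Single process, no distribution.


@dataclass
class Node:
    """A data-flow operator: computes the value `name` from the values named in `inputs`."""
    name: str
    fn: Callable[..., object]
    inputs: List[str]


def run_graph(nodes: List[Node], env: Dict[str, object]) -> Dict[str, object]:
    """Run an acyclic graph whose nodes are assumed to be listed in topological order."""
    env = dict(env)
    for node in nodes:
        env[node.name] = node.fn(*(env[i] for i in node.inputs))
    return env


def run_while(
    body: List[Node],
    cond: Callable[[Dict[str, object]], bool],
    state_keys: List[str],
    env: Dict[str, object],
    max_iters: int = 1000,
) -> Tuple[Dict[str, object], int]:
    """Data-dependent loop: re-run `body` while `cond` holds on the current values.

    For each key k in `state_keys`, the body must produce "next_" + k,
    which is fed back as k in the next iteration.
    """
    iters = 0
    while cond(env) and iters < max_iters:
        out = run_graph(body, env)
        env = {**env, **{k: out["next_" + k] for k in state_keys}}
        iters += 1
    return env, iters


def propagate(labels: List[int], adj: Dict[int, List[int]]) -> List[int]:
    """One synchronous step: each vertex takes the minimum label in its neighborhood."""
    return [min([labels[v]] + [labels[u] for u in adj[v]]) for v in range(len(labels))]


def connected_components(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Label propagation; the iteration count depends on the input graph."""
    adj: Dict[int, List[int]] = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    body = [
        Node("next_labels", propagate, ["labels", "adj"]),
        # The loop condition is itself computed by the data-flow body.
        Node("next_changed", lambda old, new: old != new, ["labels", "next_labels"]),
    ]
    env, iters = run_while(
        body,
        cond=lambda e: e["changed"],
        state_keys=["labels", "changed"],
        env={"labels": list(range(n)), "adj": adj, "changed": True},
    )
    return env["labels"], iters
```

For example, `connected_components(4, [(0, 1), (1, 2), (2, 3)])` needs four iterations to settle on a single label for the chain, while a graph with no edges stops after one. Without data-dependent control flow, the same computation would have to be unrolled for a fixed number of steps or driven by a loop in an external client program.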
2024
Keywords: MapReduce and Dryad, Hadoop and Spark, Cloud-Based Runtime Systems, Network Communication, Fault-Tolerance, Cloud-Based Distributed Computing

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12606/25590