FlowSphere: A General-Purpose Runtime for Distributed Data-Flow Computing
Zanardo, Enrico
2024-01-01
Abstract
This paper introduces FlowSphere, a general-purpose engine for executing data-flow programs efficiently in distributed environments. Building on existing execution engines, FlowSphere hides much of the complexity traditionally associated with distributed programming and offers developers a simple yet powerful framework. Unlike prior systems, FlowSphere supports data-dependent control flow, so iterative and recursive algorithms can be expressed and executed naturally within data-flow applications. This capability broadens the class of programs that can be executed efficiently in distributed settings. Its architecture is optimized for dynamic control flow, which is often challenging for traditional data-flow systems, so workflows that involve repeated or recursive computation run without sacrificing efficiency or scalability. Deployed on a modern cloud computing infrastructure, FlowSphere shows scalable and robust performance on both iterative and non-iterative workloads. These evaluations indicate its potential to serve advanced computational needs, from scientific simulations to large-scale data analytics. Its flexibility and performance make it well suited to researchers, developers, and organizations that want to use distributed computing without managing the intricacies of the underlying system.
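To make the notion of data-dependent control flow concrete, the following is a minimal single-process sketch in Python. It is not FlowSphere's API, which this record does not describe. The names `Done`, `Continue`, `run`, and `halve_until_small` are hypothetical and introduced only for illustration. The sketch assumes a model in which a task either returns a final value or requests a further task, so the shape of the computation is decided at run time by the data.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union


@dataclass
class Done:
    """A task result that ends the computation with a final value."""
    value: float


@dataclass
class Continue:
    """A task result that asks the runner to execute another task next."""
    task: Callable[[float], "Result"]
    arg: float


Result = Union[Done, Continue]


def run(task: Callable[[float], Result], arg: float) -> Tuple[float, int]:
    """Run tasks until one returns Done; also report how many tasks ran."""
    steps = 1
    result = task(arg)
    while isinstance(result, Continue):
        # The next task is chosen by the previous task's output,
        # not fixed in advance by a static graph.
        result = result.task(result.arg)
        steps += 1
    return result.value, steps


def halve_until_small(x: float) -> Result:
    """Hypothetical iterative task: keep halving until the value is below 1."""
    y = x / 2
    if y < 1.0:
        return Done(y)
    # Whether to iterate again depends on the value just computed.
    return Continue(halve_until_small, y)


if __name__ == "__main__":
    print(run(halve_until_small, 8.0))
```

The number of tasks executed here is not known before the program starts; it depends on the input. A distributed engine with this capability would schedule each requested task across workers rather than in a local loop, which this sketch does not attempt to model.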