大数据 (Big Data Research), May 2020
State-of-the-art research on cluster resource management in the dataflow computing model
Abstract
The development of cluster-based high-performance computing has undergone three stages of evolution. With the widespread use of dataflow programming models such as Spark and Flink in big data computing, ensuring that the various dataflow computing applications share cluster resources fairly has become extremely important. It is also a primary means of reducing infrastructure costs. As the drawbacks of traditional cluster resource management have become increasingly apparent under the dataflow computing model, many alternative approaches to cluster resource management have been proposed in recent years, including HoD, centralized scheduling, two-level scheduling, distributed scheduling, and hybrid scheduling. This paper introduces their respective advantages and disadvantages and offers a reference for users and researchers working on cluster resource management and scheduling in dataflow computing environments.