Tongxin xuebao (Aug 2019)
Flow-network based auto rescale strategy for Flink
Abstract
To address the problem that the load on big data stream computing platforms rises and fluctuates while the cluster cannot rescale efficiently, a flow-network based auto rescale strategy for Flink is proposed. First, a flow-network model is built and the capacity of each edge is calculated by a self-learning algorithm. Second, the bottleneck of the cluster is identified with a maximum-flow algorithm and a resource rescheduling plan is drawn up. Finally, the rescheduling plan is executed and the stateful data is migrated efficiently by a data migration algorithm based on partitioning the data by bulk and bucket. The experimental results show that the strategy effectively improves performance in applications with complex stateful data. It increases the throughput of the cluster and reduces the time overhead of data migration while satisfying the latency constraint of the application, which indicates that the strategy efficiently improves the scalability of the cluster.
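
The second step, locating the bottleneck with a maximum-flow algorithm, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the job topology is given as a dictionary of edge capacities (records per second that each edge can carry), uses the standard Edmonds-Karp algorithm, and treats the edges of the resulting minimum cut as the bottleneck candidates. The function name `max_flow_with_bottlenecks`, the operator names, and the capacity values are hypothetical.

```python
from collections import deque


def max_flow_with_bottlenecks(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    capacity: dict mapping node -> {neighbor: edge capacity} (assumed format).
    Returns (max_flow_value, min_cut_edges); the min-cut edges are the
    saturated edges that limit throughput, i.e. bottleneck candidates.
    """
    # Residual graph: forward edges carry the capacity, reverse edges start at 0.
    residual = {source: {}, sink: {}}
    for u, neighbors in capacity.items():
        residual.setdefault(u, {})
        for v, cap in neighbors.items():
            residual[u][v] = residual[u].get(v, 0) + cap
            residual.setdefault(v, {}).setdefault(u, 0)

    def reachable_with_parents():
        # Breadth-first search over edges that still have residual capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    total = 0
    while True:
        parents = reachable_with_parents()
        if sink not in parents:
            break  # no augmenting path left: the flow is maximal
        # Smallest residual capacity along the path found by the search.
        push = float("inf")
        v = sink
        while parents[v] is not None:
            push = min(push, residual[parents[v]][v])
            v = parents[v]
        # Augment along the path and update the reverse edges.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= push
            residual[v][u] += push
            v = u
        total += push

    # Nodes still reachable from the source form one side of the minimum cut.
    reachable = set(parents)
    cut = sorted(
        (u, v)
        for u, neighbors in capacity.items()
        for v, cap in neighbors.items()
        if cap > 0 and u in reachable and v not in reachable
    )
    return total, cut


# Hypothetical three-operator pipeline; capacities are made-up values.
topology = {
    "source": {"map": 100},
    "map": {"window": 60},
    "window": {"sink": 80},
}
flow, bottlenecks = max_flow_with_bottlenecks(topology, "source", "sink")
print(flow, bottlenecks)
```

In this toy pipeline the edge from `map` to `window` is the only edge in the minimum cut, so under these assumptions a rescheduling plan would add resources there first. How the paper derives capacities and turns the cut into a concrete plan is not specified in the abstract.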