Journal of Big Data (Jan 2024)
Instance segmentation on distributed deep learning big data cluster
Abstract
Abstract Distributed deep learning is a promising approach for training and deploying large and complex deep learning models. This paper presents a comprehensive workflow for deploying and optimizing the YOLACT instance segmentation model as on big data clusters. OpenVINO, a toolkit known for its high-speed data processing and ability to optimize deep learning models for deployment on a variety of devices, was used to optimize the YOLACT model. The model is then run on a big data cluster using BigDL, a distributed deep learning library for Apache Spark. BigDL provides a high-level programming interface for defining and training deep neural networks, making it suitable for large-scale deep learning applications. In distributed deep learning, input data is divided and distributed across multiple machines for parallel processing. This approach offers several advantages, including the ability to handle very large data that can be stored in a distributed manner, scalability to decrease processing time by increasing the number of workers, and fault tolerance. The proposed workflow was evaluated on virtual machines and Azure Databricks, a cloud-based platform for big data analytics. The results indicated that the workflow can scale to large datasets and deliver high performance on Azure Databricks. This study explores the benefits and challenges of using distributed deep learning on big data clusters for instance segmentation. Popular distributed deep learning frameworks are discussed, and BigDL is chosen. Overall, this study highlights the practicality of distributed deep learning for deploying and scaling sophisticated deep learning models on big data clusters.
Keywords