Instance segmentation on distributed deep learning big data cluster

Mohammed Elhmadany; Islam Elmadah; Hossam E. Abdelmunim

doi:10.1186/s40537-023-00871-9

Journal of Big Data (Jan 2024)

Affiliations

Mohammed Elhmadany: Computer and Systems Engineering, Faculty of Engineering, Ain Shams University
Islam Elmadah: Computer and Systems Engineering, Faculty of Engineering, Ain Shams University
Hossam E. Abdelmunim: Computer and Systems Engineering, Faculty of Engineering, Ain Shams University

Journal volume & issue: Vol. 11, no. 1
pp. 1 – 37

Abstract

Distributed deep learning is a promising approach for training and deploying large and complex deep learning models. This paper presents a comprehensive workflow for deploying and optimizing the YOLACT instance segmentation model on big data clusters. OpenVINO, a toolkit known for its high-speed data processing and its ability to optimize deep learning models for deployment on a variety of devices, was used to optimize the YOLACT model. The optimized model was then run on a big data cluster using BigDL, a distributed deep learning library for Apache Spark. BigDL provides a high-level programming interface for defining and training deep neural networks, making it suitable for large-scale deep learning applications. In distributed deep learning, input data is divided and distributed across multiple machines for parallel processing. This approach offers several advantages: the ability to handle very large datasets that can be stored in a distributed manner, scalability (processing time decreases as the number of workers increases), and fault tolerance. The proposed workflow was evaluated on virtual machines and on Azure Databricks, a cloud-based platform for big data analytics. The results indicated that the workflow can scale to large datasets and deliver high performance on Azure Databricks. This study explores the benefits and challenges of using distributed deep learning on big data clusters for instance segmentation. Popular distributed deep learning frameworks are discussed, and BigDL is chosen among them. Overall, this study highlights the practicality of distributed deep learning for deploying and scaling sophisticated deep learning models on big data clusters.
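The data-parallel idea described in the abstract (input divided across workers, each processing its own share) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation and uses neither BigDL nor Spark: threads stand in for cluster workers, `partition` and `run_data_parallel` are hypothetical helpers, and `infer` is a placeholder for per-image inference with the OpenVINO-optimized YOLACT model. On a real cluster, Spark performs the partitioning and schedules each partition on an executor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(items: Sequence[T], num_workers: int) -> List[List[T]]:
    """Split items into num_workers contiguous chunks whose sizes differ by at most one."""
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    base, extra = divmod(len(items), num_workers)
    chunks: List[List[T]] = []
    start = 0
    for i in range(num_workers):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def run_data_parallel(items: Sequence[T], infer: Callable[[T], R], num_workers: int) -> List[R]:
    """Apply `infer` to every item, one partition per (simulated) worker, keeping input order."""
    chunks = partition(items, num_workers)

    def process_chunk(chunk: List[T]) -> List[R]:
        # Each simulated worker only sees its own slice of the data.
        return [infer(x) for x in chunk]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in chunk order, so flattening preserves input order.
        per_chunk = list(pool.map(process_chunk, chunks))
    return [result for chunk_results in per_chunk for result in chunk_results]


def fake_segment(image_id: int) -> str:
    # Hypothetical stand-in for running the optimized YOLACT model on one image.
    return f"mask_{image_id}"


if __name__ == "__main__":
    print(partition(list(range(10)), 3))
    print(run_data_parallel(list(range(6)), fake_segment, num_workers=3))
```

Threads are used only to keep the sketch self-contained; they give real parallelism only when the per-item work releases the GIL, which is one reason cluster frameworks such as Spark distribute partitions to separate executor processes and machines instead.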

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com