Improved yolov5 algorithm combined with depth camera and embedded system for blind indoor visual assistance

Kaikai Zhang; Yanyan Wang; Shengzhe Shi; Qingqing Wang; Chun Wang; Sheng Liu

doi:10.1038/s41598-024-74416-2

Scientific Reports (Oct 2024)

Affiliations

Kaikai Zhang: School of Computer Science and Technology, Huaibei Normal University
Yanyan Wang: School of Computer Science and Technology, Huaibei Normal University
Shengzhe Shi: School of Computer Science and Technology, Huaibei Normal University
Qingqing Wang: School of Computer Science and Technology, Huaibei Normal University
Chun Wang: School of Computer Science and Technology, Huaibei Normal University
Sheng Liu: School of Computer Science and Technology, Huaibei Normal University

DOI: https://doi.org/10.1038/s41598-024-74416-2
Journal volume & issue: Vol. 14, no. 1, pp. 1–15

Abstract

To assist visually impaired people in their daily lives and to address the poor portability, high hardware cost, and environmental susceptibility of existing indoor object-finding aids, an improved YOLOv5 algorithm was proposed. It was combined with a RealSense D435i depth camera and a voice system to build an indoor object-finding device with a Raspberry Pi 4B at its core. The algorithm replaces the YOLOv5s backbone network with GhostNet to reduce the model's parameter count and computation, incorporates a coordinate attention mechanism, and replaces the YOLOv5 neck network with a bidirectional feature pyramid network to enhance feature extraction. Compared with the YOLOv5 model, the model size was reduced by 42.4%, the number of parameters was reduced by 47.9%, and the recall rate increased by 1.2% at the same precision. In the resulting device, the user names the object to be found by voice, the RealSense D435i acquires RGB and depth images to detect the object and measure its distance, and the device announces that distance by voice to help the user find the object.
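The detection-and-ranging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `box_distance_m` and `announcement`, the `shrink` parameter, the use of a median over the central part of the bounding box, and the assumption that depth values arrive in millimetres with 0 meaning "no reading" are all illustrative choices. The sketch takes a depth image aligned to the RGB frame plus a bounding box from the detector, and returns a distance and a sentence that could be passed to a text-to-speech system.

```python
from statistics import median


def box_distance_m(depth_mm, box, shrink=0.25):
    """Estimate the distance (in metres) to a detected object.

    depth_mm: 2D list of depth readings in millimetres (assumed unit);
              0 is treated as "no valid reading".
    box:      (x1, y1, x2, y2) bounding box in pixels, x2 and y2 exclusive.
    shrink:   fraction of the box trimmed from each side so that background
              pixels near the box edge are less likely to be sampled.
    """
    x1, y1, x2, y2 = box
    dx = int((x2 - x1) * shrink)
    dy = int((y2 - y1) * shrink)
    # Collect valid depth readings from the central region of the box.
    samples = [
        depth_mm[y][x]
        for y in range(y1 + dy, y2 - dy)
        for x in range(x1 + dx, x2 - dx)
        if depth_mm[y][x] > 0
    ]
    if not samples:
        return None  # no usable depth inside the box
    # The median is robust to a few stray background or noisy pixels.
    return median(samples) / 1000.0


def announcement(label, distance_m):
    """Build the sentence that a voice system could read aloud."""
    if distance_m is None:
        return f"{label} detected, distance unavailable"
    return f"{label} is {distance_m:.1f} metres away"
```

A median over the trimmed box is used here only because it tolerates missing readings and background pixels; the paper may compute the distance differently, for example from the depth at the box centre.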

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
