Multimodal deep learning using on-chip diffractive optics with in situ training capability

Junwei Cheng; Chaoran Huang; Jialong Zhang; Bo Wu; Wenkai Zhang; Xinyu Liu; Jiahui Zhang; Yiyi Tang; Hailong Zhou; Qiming Zhang; Min Gu; Jianji Dong; Xinliang Zhang

doi:10.1038/s41467-024-50677-3

Nature Communications (Jul 2024)

Affiliations

Junwei Cheng: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Chaoran Huang: Department of Electronic Engineering, The Chinese University of Hong Kong
Jialong Zhang: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Bo Wu: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Wenkai Zhang: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Xinyu Liu: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Jiahui Zhang: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Yiyi Tang: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Hailong Zhou: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Qiming Zhang: Institute of Photonic Chips, University of Shanghai for Science and Technology
Min Gu: Institute of Photonic Chips, University of Shanghai for Science and Technology
Jianji Dong: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Xinliang Zhang: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology

DOI: https://doi.org/10.1038/s41467-024-50677-3
Journal volume & issue: Vol. 15, no. 1
pp. 1 – 10

Abstract

Multimodal deep learning plays a pivotal role in supporting the processing and learning of diverse data types within the realm of artificial intelligence generated content (AIGC). However, most photonic neuromorphic processors for deep learning can only handle a single data modality (either vision or audio) due to the lack of abundant parameter training in the optical domain. Here, we propose and demonstrate a trainable diffractive optical neural network (TDONN) chip based on on-chip diffractive optics with massive tunable elements to address these constraints. The TDONN chip includes one input layer, five hidden layers, and one output layer, and only one forward propagation is required to obtain the inference results without frequent optical-electrical conversion. The customized stochastic gradient descent algorithm and the drop-out mechanism are developed for photonic neurons to realize in situ training and fast convergence in the optical domain. The TDONN chip achieves a potential throughput of 217.6 tera-operations per second (TOPS) with high computing density (447.7 TOPS/mm²), high system-level energy efficiency (7.28 TOPS/W), and low optical latency (30.2 ps). The TDONN chip has successfully implemented four-class classification in different modalities (vision, audio, and touch) and achieves 85.7% accuracy on multimodal test sets. Our work opens up a new avenue for multimodal deep learning with integrated photonic processors, providing a potential solution for low-power AI large models using photonic technology.
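
The headline figures in the abstract can be cross-checked with simple arithmetic. The short Python sketch below is only an illustration: it assumes that computing density is throughput divided by active area and that energy efficiency is throughput divided by system power. The abstract does not state these definitions, and the variable names are hypothetical. Under those assumptions the figures imply an active area of a little under 0.5 mm² and a system power of roughly 30 W.

```python
# Figures quoted in the abstract.
throughput_tops = 217.6        # potential throughput, TOPS
density_tops_per_mm2 = 447.7   # computing density, TOPS/mm^2
efficiency_tops_per_w = 7.28   # system-level energy efficiency, TOPS/W

# ASSUMPTION (not stated in the abstract):
#   density    = throughput / active area
#   efficiency = throughput / system power
implied_area_mm2 = throughput_tops / density_tops_per_mm2
implied_power_w = throughput_tops / efficiency_tops_per_w

print(f"implied active area:  {implied_area_mm2:.3f} mm^2")
print(f"implied system power: {implied_power_w:.2f} W")
```

These derived values are only as reliable as the assumed definitions; the full paper should be consulted for how each metric was actually computed.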

Published in Nature Communications

ISSN: 2041-1723 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/ncomms/
