Guangtongxin yanjiu (Oct 2024)

Research on Optical Network of Intelligent Computing Center based on Ethernet Lossless Networking

  • ZHAI Rui,
  • LI Zhuangzhi,
  • HOU Guangying,
  • MA Yijia,
  • XU Hualang

Abstract

【Objective】In recent years, Artificial Intelligence Generated Content (AIGC) has set off an artificial intelligence revolution, and network connectivity in the Intelligent Computing Center (ICC) has been developing toward ultra-high bandwidth, intelligent lossless transmission, and computing-network convergence. The optical network of the ICC therefore needs to reduce inter-card communication time in order to improve data access efficiency.【Methods】This paper studies the networking architecture of optical networks for ICC scenarios, aiming at a lossless network with large bandwidth, low latency and high Central Processing Unit (CPU) efficiency that can meet the demands of large model training and inference in the ICC. It analyzes in detail the traffic distribution characteristics of the ICC and the communication flow characteristics in the networking scenario of AI large model training. It also examines in depth technologies such as lossless Ethernet based on Remote Direct Memory Access (RDMA) and co-packaged optics. Finally, it carries out networking practice and latency tests in the ICC scenario.【Results】The RDMA over Converged Ethernet (RoCE)-based transport scheme proposed in this paper supports Priority-based Flow Control (PFC), Explicit Congestion Notification (ECN), Enhanced Transmission Selection (ETS) and the Data Center Bridging Capability Exchange (DCBX) protocol, which together enable lossless transmission over Ethernet in data centers.
The test results show that the transmission delay with the RoCE protocol is stable at around 1 μs, significantly outperforming the Internet Wide Area RDMA Protocol (iWARP).【Conclusion】Based on traffic characterization in the intelligent computing scenario, this paper studies the key characteristics of lossless Ethernet in the ICC and uses RDMA technology to improve the transmission efficiency of the optical switching network in the ICC scenario. It also puts forward a lossless Ethernet scheme for the large model inference scenario of the ICC, exploring a feasible direction for the application of RDMA technology in intelligent computing.

Keywords