Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China; Tsinghua–Berkeley Shenzhen Institute, Shenzhen 518071, China; Institute for Brain and Cognitive Science, Tsinghua University, Beijing 100084, China; Beijing Laboratory of Brain and Cognitive Intelligence, Beijing Municipal Education Commission, Beijing 100010, China
Mengqi Ji
Institute for Brain and Cognitive Science, Tsinghua University, Beijing 100084, China; Department of Automation, Tsinghua University, Beijing 100084, China; Beijing Laboratory of Brain and Cognitive Intelligence, Beijing Municipal Education Commission, Beijing 100010, China
Xiaoyun Yuan
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Institute for Brain and Cognitive Science, Tsinghua University, Beijing 100084, China
Jing He
Institute for Brain and Cognitive Science, Tsinghua University, Beijing 100084, China; Department of Automation, Tsinghua University, Beijing 100084, China; Beijing Laboratory of Brain and Cognitive Intelligence, Beijing Municipal Education Commission, Beijing 100010, China
Jianing Zhang
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Tsinghua–Berkeley Shenzhen Institute, Shenzhen 518071, China
Yinheng Zhu
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Tsinghua–Berkeley Shenzhen Institute, Shenzhen 518071, China
Tian Zheng
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Tsinghua–Berkeley Shenzhen Institute, Shenzhen 518071, China
Leyao Liu
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Tsinghua–Berkeley Shenzhen Institute, Shenzhen 518071, China
Bin Wang
Hangzhou Hikvision Digital Technology Co., Ltd., Hangzhou 310012, China
Qionghai Dai
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China; Institute for Brain and Cognitive Science, Tsinghua University, Beijing 100084, China; Department of Automation, Tsinghua University, Beijing 100084, China; Beijing Laboratory of Brain and Cognitive Intelligence, Beijing Municipal Education Commission, Beijing 100010, China; Corresponding author.
Sensing and understanding large-scale dynamic scenes require a high-performance imaging system. Conventional imaging systems pursue higher capability simply by increasing pixel resolution through stitching multiple cameras, at the expense of a bulky system. Moreover, they strictly follow the feedforward pathway: Their pixel-level sensing is independent of semantic understanding. In contrast, the human visual system owes its superiority to both feedforward and feedback pathways: The feedforward pathway extracts an object representation (referred to as a memory engram) from visual inputs, while, in the feedback pathway, the associated engram is reactivated to generate hypotheses about the object. Inspired by this, we propose a dual-pathway imaging mechanism, called engram-driven videography. We start by abstracting the holistic representation of the scene, which is associated bidirectionally with local details, driven by an instance-level engram. Technically, the entire system works by alternating between an excitation–inhibition state and an association state. In the excitation–inhibition state, pixel-level details are dynamically consolidated or inhibited to strengthen the instance-level engram. In the association state, the system images future scenes by synthesizing spatially and temporally consistent content driven by the engram, yielding outstanding videography quality. Results of extensive simulations and experiments demonstrate that the proposed system revolutionizes the conventional videography paradigm and shows great potential for videography of large-scale scenes with multiple objects.
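To make the two-state alternation concrete, the following is a minimal Python sketch of the loop the abstract describes. It is an illustration only, not the authors' implementation: every name here (EngramBank, excite_inhibit, associate, and the detect/extract/synthesize callables) is a hypothetical placeholder, and the momentum-style, salience-weighted update rule is an assumption standing in for whatever consolidation mechanism the actual system uses.

import numpy as np

class EngramBank:
    """Instance-level engrams: one feature vector per tracked object.
    (Illustrative placeholder, not the paper's actual data structure.)"""

    def __init__(self, momentum: float = 0.9):
        self.momentum = momentum
        self.engrams: dict[int, np.ndarray] = {}  # object id -> engram vector

    def excite_inhibit(self, obj_id: int, features: np.ndarray, salience: float) -> None:
        """Excitation-inhibition state: consolidate salient pixel-level
        detail into the engram; inhibit (suppress) non-salient detail.
        `salience` is assumed to lie in [0, 1]."""
        if obj_id not in self.engrams:
            self.engrams[obj_id] = features.copy()
            return
        step = (1.0 - self.momentum) * salience  # high salience -> strong consolidation
        self.engrams[obj_id] = (1.0 - step) * self.engrams[obj_id] + step * features

    def associate(self, obj_id: int) -> np.ndarray:
        """Association state: reactivate the stored engram as a hypothesis
        about the object's appearance in a future frame."""
        return self.engrams[obj_id]

def run_videography(frames, detect, extract, synthesize):
    """Alternate the two states over a frame stream: feedforward sensing,
    then engram-driven feedback synthesis."""
    bank = EngramBank()
    outputs = []
    for frame in frames:
        # Feedforward pathway: detect object instances, extract pixel-level
        # features, and update each instance-level engram.
        for obj_id, patch, salience in detect(frame):
            bank.excite_inhibit(obj_id, extract(patch), salience)
        # Feedback pathway: reactivated engrams drive synthesis of
        # spatially and temporally consistent content for the next view.
        hypotheses = {i: bank.associate(i) for i in bank.engrams}
        outputs.append(synthesize(frame, hypotheses))
    return outputs

The salience-weighted step mirrors the consolidate-or-inhibit behavior described in the abstract: highly salient observations move the engram strongly, while low-salience observations are effectively suppressed, so the engram accumulates only the details worth keeping.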