IEEE Access (Jan 2024)
An Intelligent Retrieval Method for Audio and Video Content: Deep Learning Technology Based on Artificial Intelligence
Abstract
The surge in audio-video data makes efficient intelligent retrieval and cross-modal analysis increasingly difficult. To address this, this study proposes a deep-learning-based intelligent retrieval method for audio-video content. The method extracts audio features with a Visual Geometry Group (VGG) network and extracts video features with an adaptive clustering keyframe extraction algorithm (SKM). Integrating cross-learning within an embedding network further improves retrieval efficiency and accuracy. On the CMU-MOSEI dataset, the method outperforms traditional models such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA), as well as state-of-the-art deep learning models such as Deep Canonical Correlation Analysis (DCCA) and the Domain-Adversarial Neural Network (DANN), in multimodal data processing and real-world retrieval tasks. In video processing, the average fidelity is 0.693 and the average compression ratio is 0.936, improvements of 30.75% and 7.09%, respectively, over traditional methods. By applying deep learning, the study thus optimizes the processing of single modalities and, through the cross-learning framework, improves the handling of cross-modal data.
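As a rough illustration of clustering-based keyframe extraction, the sketch below groups per-frame feature vectors with plain k-means and keeps the frame nearest to each cluster centre. It is not the SKM algorithm itself: the fixed cluster count `k`, the evenly spaced initialisation, the function names, and the compression-ratio definition (one minus keyframes over total frames) are all assumptions made for this example.

```python
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def extract_keyframes(features: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Return up to k keyframe indices using plain k-means (a stand-in, not SKM)."""
    n = len(features)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    # Deterministic init (assumption): centroids start at evenly spaced frames.
    centroids = [list(features[(i * n) // k]) for i in range(k)]
    for _ in range(iters):
        # Assignment step: each frame joins its nearest centroid.
        clusters: List[List[int]] = [[] for _ in range(k)]
        for idx, f in enumerate(features):
            nearest = min(range(k), key=lambda c: _sq_dist(f, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if not members:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
                continue
            dim = len(centroids[c])
            new_centroids.append(
                [sum(features[m][d] for m in members) / len(members) for d in range(dim)]
            )
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # One keyframe per centroid: the frame closest to it (duplicates collapse).
    keyframes = {
        min(range(n), key=lambda i: _sq_dist(features[i], centroids[c]))
        for c in range(k)
    }
    return sorted(keyframes)


def compression_ratio(n_frames: int, n_keyframes: int) -> float:
    """Fraction of frames discarded (assumed definition: 1 - keyframes / frames)."""
    return 1.0 - n_keyframes / n_frames


# Toy clip: six one-dimensional "frame features" forming two visual segments.
toy_features = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
toy_keyframes = extract_keyframes(toy_features, k=2)
toy_ratio = compression_ratio(len(toy_features), len(toy_keyframes))
```

In practice the feature vectors would come from a frame descriptor such as colour histograms or CNN embeddings, and an adaptive method would choose the number of clusters from the data rather than take it as an argument.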
Keywords