IEEE Access (Jan 2024)
Research and Implementation of Constrained Keyframe-Based Virtual Human Video Generation
Abstract
This article proposes a virtual human video generation and interaction method that constrains the starting keyframe. The core of the method is to address the non-interactivity and poor stability of cross-mixed frames in generative videos through a multimodal information fusion strategy and an interaction framework. First, the input image is modeled with a 3D facial model; then, a Transformer structure is used to generate audio-driven virtual human videos, with keyframe constraints applied so that the generated videos share a consistent starting frame; finally, virtual human interaction is achieved through an interactive framework. The method's advantages are the good stability of its cross-mixed frames and its avoidance of directly designing a virtual human interaction kernel, effectively addressing the non-interactivity, long production cycles, and high cost of generative virtual human videos. The model performs well on metrics such as LSE-C, Beat Align, CPBD, and SSIM, with SSIM reaching 0.916, demonstrating excellent generation quality and interaction efficiency.
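To make the pipeline described above concrete, the following is a minimal sketch of the keyframe-constrained, audio-driven generation loop. It is an illustrative outline only, assuming hypothetical components (a 3D face fitter, an audio-to-motion Transformer, and a renderer); it is not the authors' implementation.

```python
from typing import Callable, List, Sequence


def generate_keyframe_constrained_video(
    fit_face: Callable,            # hypothetical: fits 3D facial parameters to a reference image
    predict_motion: Callable,      # hypothetical: Transformer mapping audio features to a motion sequence
    render_frame: Callable,        # hypothetical: renders identity + motion parameters into a frame
    reference_image,
    audio_features: Sequence,
    start_keyframe_params,
) -> List:
    """Sketch of audio-driven generation constrained to a fixed starting keyframe."""
    # 1. Model the reference image with a 3D facial model.
    identity_params = fit_face(reference_image)

    # 2. Predict facial motion from audio, conditioned on the starting keyframe
    #    so every generated clip begins from the same pose.
    motion_seq = list(predict_motion(audio_features, initial_state=start_keyframe_params))

    # 3. Pin the first frame to the keyframe exactly; this is what keeps
    #    cross-mixed (clip-to-clip) transitions stable.
    motion_seq[0] = start_keyframe_params

    # 4. Render each frame from identity and per-frame motion parameters.
    return [render_frame(identity_params, m) for m in motion_seq]
```

In this sketch the keyframe constraint is enforced by conditioning the motion predictor on the starting state and by pinning the first generated frame to it, so that independently generated clips can be concatenated without visible jumps.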
Keywords