IEEE Access (Jan 2025)
Improving Image Quality and Controllability in Speed + Angular Velocity to Image Generation Through Synthetic Data for Driving Simulator Generation
Abstract
With advances in cross-modal techniques, generating images and videos from text or speech has become increasingly practical. However, research on video generation from other modalities remains limited. A major reason is the lack of large-scale datasets. The data that do exist have skewed distributions, which leave models unresponsive to control patterns not seen during training and cause image collapse and scene skipping. To address these challenges, this study focuses on a driving simulation generation model that produces driving scenarios from speed and steering angle information. We propose synthetic datasets for speed + angular velocity to image (SAV2IMG) generation. Specifically, we create a half-synthetic dataset, in which only the query (the control inputs) is synthetic and the response (the images) comes from real data, and a fully synthetic dataset, in which both the query and the response are synthetic. Together, these datasets form a training environment that exposes the model to diverse driving patterns and to previously unseen control conditions. We compared three training conditions for SAV2IMG: real data only, real plus half-synthetic data, and real plus both half- and fully synthetic data. The results show that adding synthetic data improves image quality as measured by Fréchet Inception Distance (FID), improves controllability, and makes the model more adaptable to unknown control conditions. Moreover, fully synthetic data generated from 3D city models yielded stable responses to unfamiliar scenarios. However, we also found that simple physical models could not fully reproduce complex control patterns. These findings are valuable not only for improving SAV2IMG but also for other vehicle-related generative tasks and for cross-modal generation models in general, and they provide a foundation for future model development and data augmentation strategies.
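To illustrate what a synthetic query can look like, the example below generates a speed + angular velocity sequence from a simple physical model and integrates it into a vehicle trajectory. It is a minimal sketch under stated assumptions, not the authors' procedure. The constant-acceleration, constant-yaw-rate profile, the unicycle kinematic model with forward-Euler integration, and the names `ControlStep`, `synthesize_query`, and `integrate_pose` are all hypothetical. In a fully synthetic setting, poses like these could, for example, drive a virtual camera through a 3D city model. The abstract does not describe how the paper actually pairs queries with images.

```python
import math
from dataclasses import dataclass


@dataclass
class ControlStep:
    # Hypothetical query element: one control input per time step.
    speed: float             # m/s
    angular_velocity: float  # rad/s (yaw rate)


def synthesize_query(duration_s, dt, v0, accel, yaw_rate, v_max):
    """Hypothetical synthetic query: constant acceleration, constant yaw rate.

    Speed is clamped to [0, v_max]; this is an illustrative assumption,
    not the physical model used in the paper.
    """
    steps = []
    v = v0
    n = int(round(duration_s / dt))
    for _ in range(n):
        steps.append(ControlStep(speed=v, angular_velocity=yaw_rate))
        v = min(max(v + accel * dt, 0.0), v_max)
    return steps


def integrate_pose(steps, dt, x=0.0, y=0.0, heading=0.0):
    """Unicycle kinematics, forward Euler:
    x' = v cos(theta), y' = v sin(theta), theta' = omega.
    Returns the initial pose followed by one pose per control step.
    """
    poses = [(x, y, heading)]
    for s in steps:
        x += s.speed * math.cos(heading) * dt
        y += s.speed * math.sin(heading) * dt
        heading += s.angular_velocity * dt
        poses.append((x, y, heading))
    return poses
```

For example, `integrate_pose(synthesize_query(10.0, 0.1, 5.0, 0.0, math.pi / 20, 15.0), 0.1)` describes a 10 s quarter turn at 5 m/s that ends with a heading of about π/2 rad. A profile this simple is the kind of model that, according to the findings above, may fail to capture complex real control patterns.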
Keywords