IEEE Access (Jan 2024)
A Novel Scheme for Managing Multiple Context Transitions While Ensuring Consistency in Text-to-Image Generative Artificial Intelligence
Abstract
Humans have a remarkable ability to understand stories presented as text and to imagine related images. This ability aids comprehension and enhances enjoyment, so developing automated systems that generate visually faithful images from textual descriptions is a meaningful endeavor, and many artificial intelligence (AI) systems for text-to-image generation have been developed. In previous research, we studied how to generate images from multiple input sentences while maintaining the context across those sentences. In this paper, we argue that for longer and more structured inputs, such as novels, essays, or papers, it is essential not only to keep the context consistent but also to handle transitions between different contexts, a complex challenge that cannot be resolved merely by dividing the sentences into paragraphs. We introduce the Structured Context Retention Methods (SCRM) scheme, which reflects the user’s intentions for both context retention and smooth transitions across varying narrative elements. Through experiments, we demonstrate that SCRM performs well in terms of ROUGE recall while effectively managing a large number of input sentences and context transitions.
Keywords