IEEE Access (Jan 2024)
Text2Layout: Layout Generation From Text Representation Using Transformer
Abstract
Recent advanced Text-to-Image methods still require considerable effort to specify all object labels and detailed layouts in order to obtain an accurately planned image. Layout-based synthesis is an alternative that lets users control the detailed composition directly, avoiding the trial-and-error often required in prompt-based editing. This paper proposes generating a layout from text instead of generating images directly. Our approach uses Transformer-based deep neural networks to synthesize scene representations of multiple objects. By focusing on layout information, we can produce an explainable layout of the objects the image includes. Our end-to-end approach uses parallel decoding, unlike conventional layout synthesis, which relies on sequential object prediction and post-processing of duplicate bounding boxes. We experimentally compare our method's quality and computational cost against existing methods, demonstrating its effectiveness and efficiency in generating layouts from textual representations. Combined with Layout-to-Image generation, this approach has significant practical implications, enabling authoring tools that make image generation explainable and computable using relatively lightweight networks.
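To make the parallel-decoding idea concrete, the following is a minimal illustrative sketch, not the authors' released code: a Transformer decoder conditions a fixed set of learned object queries on an encoded text prompt and emits all layout elements (class labels and bounding boxes) in one forward pass, with no autoregressive loop. All module names, dimensions, and the box parameterization here are assumptions for illustration.

    # Hypothetical sketch of parallel layout decoding (PyTorch).
    import torch
    import torch.nn as nn

    class Text2LayoutSketch(nn.Module):
        def __init__(self, vocab_size=10000, d_model=256, num_queries=16,
                     num_classes=80, num_layers=4, nhead=8):
            super().__init__()
            self.token_embed = nn.Embedding(vocab_size, d_model)
            enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.text_encoder = nn.TransformerEncoder(enc_layer, num_layers)
            # Learned queries: each slot decodes into at most one layout object.
            self.object_queries = nn.Parameter(torch.randn(num_queries, d_model))
            dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
            self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
            # Per-slot heads: class logits (plus a "no object" slot) and a
            # normalized (cx, cy, w, h) bounding box.
            self.class_head = nn.Linear(d_model, num_classes + 1)
            self.box_head = nn.Linear(d_model, 4)

        def forward(self, token_ids):  # token_ids: (batch, seq_len)
            memory = self.text_encoder(self.token_embed(token_ids))
            queries = self.object_queries.unsqueeze(0).expand(
                token_ids.size(0), -1, -1)
            # No causal mask: all object slots attend to the text and are
            # decoded in parallel, so no sequential prediction or
            # duplicate-box post-processing step is required.
            hs = self.decoder(queries, memory)
            return self.class_head(hs), self.box_head(hs).sigmoid()

    tokens = torch.randint(0, 10000, (2, 12))   # two toy tokenized prompts
    logits, boxes = Text2LayoutSketch()(tokens)
    print(logits.shape, boxes.shape)            # (2, 16, 81) (2, 16, 4)

Under these assumptions, the "no object" class lets unused query slots opt out, which is how a fixed-size parallel decoder can represent layouts with a variable number of objects.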
Keywords