IEEE Access (Jan 2024)
Geometry-Based Feature Selection and Deep Aggregation Model for Architectural Scenery Recomposition Toward Education
Abstract
In our study, we address the challenge of reworking the key visual elements of complex architectural scenes, a task that is crucial for AI-based education. Our goal is to blend diverse visual features from architectural images, which often have intricate spatial designs, in a smooth and adaptable way. At the core of our method is a deep learning model designed to closely imitate how human eyes move and focus. We use the BING objectness metric, which lets us quickly and accurately pick out the most informative regions of architectural images. These key regions, or patches, are identified by detecting objects or their parts at multiple scales and within varied architectural settings. We then combine the visual details from these patches using a multi-view fusion method. To better approximate how humans naturally attend to salient areas in architectural scenes, we introduce a technique called locality-preserved learning (LRAL), which constructs paths that emulate how a person's gaze moves across a scene. LRAL is especially effective at keeping the local details of architectural scenes intact while selecting the most representative patches matching where humans tend to look. Applying the LRAL process, we derive a gaze shift path (GSP) for each scene and compute its key features with a deep aggregation model. These features are then fed into a Gaussian mixture model (GMM), which retargets the architectural scenes for educational purposes. Our approach has been validated through detailed analysis, demonstrating its effectiveness and offering clear benefits for displaying architectural content in educational settings.
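The final stage described above, fitting a Gaussian mixture model to aggregated gaze-shift-path features, can be sketched as follows. This is an illustrative example only, not the authors' implementation: the feature dimensionality (64), the number of mixture components (4), and the random data standing in for GSP descriptors are all assumptions, and scikit-learn's GaussianMixture is used as a generic GMM.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-in for the deep-aggregated GSP descriptors:
# one 64-dimensional feature vector per architectural scene.
gsp_features = rng.normal(size=(200, 64))

# Fit a GMM over the scene descriptors; each component can be read
# as one visual-layout mode of the architectural scenes.
gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
gmm.fit(gsp_features)

# Posterior responsibilities say how strongly each scene belongs to
# each mode; a retargeting step could condition on these.
posteriors = gmm.predict_proba(gsp_features)
labels = gmm.predict(gsp_features)
print(posteriors.shape)  # (200, 4)
```

In practice the descriptors would come from the deep aggregation model rather than random draws, and the component count would be chosen by a criterion such as BIC.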
Keywords