IEEE Access (Jan 2023)

Arbitrary Style Transfer With Fused Convolutional Block Attention Modules

  • Haitao Xin,
  • Li Li

DOI
https://doi.org/10.1109/ACCESS.2023.3273949
Journal volume & issue
Vol. 11
pp. 44977 – 44988

Abstract


Advances in deep learning have made image style transfer an increasingly complex research topic. This work addresses limitations of current methods: poor retention of object contours in the content image, blurred image boundaries, and mismatched colors after stylization. To this end, an arbitrary style transfer network based on the attention mechanism is introduced. The network comprises an encoder-decoder module, a convolutional block attention module (CBAM), and an adaptive attention normalization network (AdaAttN) module. The CBAM attention mechanism is presented as an extension of the AdaAttN network, with the aim of producing stylized images whose global and local styles are coordinated; this is achieved by leveraging long-range dependencies in the image. Additionally, a novel loss function, referred to as the structural similarity (SSIM) loss, is proposed to keep the generated images consistent with the underlying content structure, and a new local feature loss is introduced to further improve the visual quality of the stylized images at a local level. The network was trained on a dataset comprising 82,783 real images and 81,446 artistic images. A further set of 1,000 result images, generated from 100 real photos and 10 artistic portraits, was used for testing. The experimental results are compared with four contemporary style transfer techniques, and the efficacy of the CBAM module and the SSIM loss function is demonstrated through ablation experiments. The experiments show that the proposed network adapts effectively to local style and matches semantically close style features to content features, thereby preserving better spatial consistency.
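
The abstract names a structural similarity loss but does not state its formula. As a rough illustration only, the sketch below computes the standard SSIM index from whole-image statistics and turns it into a loss as 1 - SSIM. Several things here are assumptions, not the authors' formulation: the function names `global_ssim` and `ssim_loss`, the single global window (practical implementations usually average SSIM over sliding local windows), the 1 - SSIM form, and the conventional constants K1 = 0.01 and K2 = 0.03.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Standard SSIM index computed once over all pixels (assumed simplification).

    x, y: flat sequences of pixel intensities in [0, data_range].
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    # Stabilising constants from the usual SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(content, stylized, data_range=1.0):
    """Hypothetical loss: 0 for identical structure, larger as structure diverges."""
    return 1.0 - global_ssim(content, stylized, data_range=data_range)


if __name__ == "__main__":
    content = [0.0, 0.25, 0.5, 0.75, 1.0]
    inverted = [1.0 - v for v in content]
    print(ssim_loss(content, content))   # identical images
    print(ssim_loss(content, inverted))  # structure reversed
```

In a training setup, a term like this would typically be added to the content and style losses with a weighting coefficient; the abstract does not give that weight, so none is shown here.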

Keywords