IEEE Access (Jan 2024)
A Comparative Study of Vision Transformer and Convolutional Neural Network Models in Geological Fault Detection
Abstract
Geological fault detection is a critical task in geological exploration and oil-gas prospecting. Automating fault detection can significantly reduce the dependence on expert labeling. Prevailing methods typically treat fault detection as a semantic segmentation task solved with convolutional neural networks (CNNs). However, CNNs emphasize local feature extraction, making them susceptible to noise interference. In contrast, Vision Transformer (ViT) models, which prioritize global context extraction, have shown competitive performance. This paper explores the application of ViT models to fault detection and compares their performance against CNN models. We investigate six models — two pure CNN models, two pure ViT models, and two hybrid CNN&ViT models — across three datasets (Thebe, FaultSeg3D, and Kerry3D). Our analysis underscores the resilience of pure ViT models to noise interference in real-world data, and it highlights the advantage of CNN&ViT hybrid models in delineating low-grade faults. Furthermore, by leveraging ImageNet pre-training, SwinUnet demonstrates remarkable data efficiency in fault prediction, requiring only about 100 pairs of 2D image patches to yield results closely aligned with expert annotations. Our code is publicly available at: https://github.com/wangjing9999/Comparing-CNN-and-ViT-in-Geological-Fault-Detection.
Keywords