To address the complex backgrounds, small defect sizes, and diverse defect types encountered when detecting defects in wire bonding X-ray images, this paper proposes YOLO-CSS, a defect detection method based on convolutional neural networks. The method introduces a novel feature extraction network that effectively captures semantic features from different gradient information. It also uses a self-adaptive weighted multi-scale feature fusion module, SMA, which adaptively weights the contributions of feature maps at different scales to the detection results. In addition, skip connections are employed at the bottleneck of the network to preserve the integrity of feature information. Experimental results on the wire bonding X-ray defect image dataset show that the proposed algorithm achieves an mAP@0.5 of 97.3% and an mAP@0.5:0.95 of 72.1%, surpassing the YOLO series algorithms. It also offers advantages in model size and detection speed, striking an effective balance between detection accuracy and speed.
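
The idea of adaptively weighting feature maps from different scales can be illustrated with a minimal sketch. This is not the SMA module itself, whose internal design is not described here. It assumes one common form of adaptive fusion: each scale receives a learnable scalar logit, the logits are normalized with a softmax, and the feature maps (assumed to be already resized to a common shape) are combined as a weighted sum. The names `softmax`, `fuse_scales`, and `scale_logits` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Normalize raw logits into positive weights that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(feature_maps, scale_logits):
    """Fuse same-shaped 2D feature maps with softmax-normalized weights.

    feature_maps: list of 2D lists, one per scale, assumed to be already
                  resized to an identical (rows x cols) shape.
    scale_logits: one scalar per scale; in a trained network these would be
                  learned parameters (an assumption of this sketch).
    Returns the fused map and the normalized weights.
    """
    if len(feature_maps) != len(scale_logits):
        raise ValueError("need exactly one logit per feature map")
    weights = softmax(scale_logits)
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for w, fmap in zip(weights, feature_maps):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += w * fmap[r][c]
    return fused, weights


# Example: three scales, with the second scale given the largest logit.
maps = [[[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]]]
fused, weights = fuse_scales(maps, [0.0, 1.0, 0.0])
```

Because the weights come from a softmax, they stay positive and sum to one, so a scale with a larger logit contributes more to the fused map without changing its overall magnitude.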