In this article, we implement fast and power-efficient training hardware for convolutional neural networks (CNNs) based on CMOS invertible logic. The backpropagation algorithm is generally hard to implement in hardware because it requires high-precision floating-point arithmetic. Although the parameters of a CNN can be represented in fixed-point or even binary form during inference, they must still be represented in floating point during training. Our hardware uses a low-precision data representation for both inference and training. For the hardware implementation, we exploit CMOS invertible logic for training. Invertible logic enables logic circuits to perform probabilistic bidirectional operations (forward and backward modes) and can be realized with stochastic computing. The proposed hardware obtains the parameters of a neural network, such as its weights, directly from the given data (an input feature map and a true label) without backpropagation. For performance evaluation, the proposed hardware is implemented on an FPGA and trains a binarized two-layer CNN model on a modified MNIST dataset. This implementation improves energy efficiency by approximately 134x over a CPU implementation that trains the same model. Training on the proposed hardware is also approximately 40x faster than training on the CPU with the backpropagation algorithm, while maintaining almost the same recognition accuracy.
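The abstract notes that invertible logic can be realized with stochastic computing, in which a value is encoded as the probability of observing a 1 in a random bitstream. As a minimal illustrative sketch (not the paper's hardware; the function names here are ours), the following shows the classic stochastic-computing identity that a bitwise AND of two independent bitstreams multiplies the probabilities they encode:

```python
import random


def to_bitstream(p, n, rng):
    """Encode probability p as a length-n random bitstream of 0s and 1s."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def estimate(bits):
    """Decode a bitstream back to a probability: the fraction of 1s."""
    return sum(bits) / len(bits)


rng = random.Random(0)
n = 100_000  # longer streams give lower-variance estimates

a = to_bitstream(0.5, n, rng)
b = to_bitstream(0.4, n, rng)

# Bitwise AND of independent streams encodes the product 0.5 * 0.4 = 0.2,
# since P(a_i = 1 and b_i = 1) = P(a_i = 1) * P(b_i = 1).
prod = [x & y for x, y in zip(a, b)]
print(estimate(prod))  # close to 0.2
```

This kind of single-gate arithmetic on bitstreams is what makes stochastic computing attractive for compact, low-precision hardware such as the accelerator described here.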