IEEE Access (Jan 2021)
Convolutional Neural Network-Based Visual Servoing for Eye-to-Hand Manipulator
Abstract
We propose a visual servoing scheme based on a convolutional neural network (CNN) for precise positioning of an eye-to-hand manipulator, in which the robot's control input is computed directly from images by the network. To this end, we introduce the Difference of Encoded Features driven Interaction matrix Network (DEFINet), a new CNN for eye-to-hand visual servoing. DEFINet estimates the relative pose between the desired and current end-effector poses from desired and current images captured by an eye-to-hand camera. Inspired by the Siamese network architecture, DEFINet has two branches of the same CNN that share weights and encode the desired and current images. Regressing the relative pose from the difference between the encoded desired and current image features yields high positioning accuracy in visual servoing. The training dataset is generated from sample data collected by moving the manipulator randomly in task space. The proposed visual servoing is evaluated through numerical simulation and through experiments with a six-DOF industrial manipulator in a real environment. Both the simulation and the experimental results show the effectiveness of the proposed method.
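The idea of regressing a relative pose from the difference of encoded features can be illustrated with a minimal sketch. This is not the DEFINet implementation: the class name `ToyDifferenceNet`, the single-layer `tanh` encoder standing in for the CNN branches, the feature size, the random untrained weights, and the bias-free linear regression head are all assumptions made for illustration. The sketch only shows how one set of shared encoder weights is applied to both images and how a six-component pose is computed from the feature difference.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyDifferenceNet:
    """Hypothetical stand-in for a Siamese-style pose regressor."""

    def __init__(self, n_pixels, n_features=8, seed=0):
        rng = random.Random(seed)
        # One encoder weight matrix, used for BOTH images (weight sharing).
        self.w_enc = make_matrix(n_features, n_pixels, rng)
        # Assumed bias-free linear head mapping features to a 6-D pose
        # (3 translation + 3 rotation components).
        self.w_head = make_matrix(6, n_features, rng)

    def encode(self, image):
        """Encode a flattened image with the shared weights."""
        return [math.tanh(a) for a in matvec(self.w_enc, image)]

    def relative_pose(self, desired_image, current_image):
        """Regress a 6-D relative pose from the feature difference."""
        f_desired = self.encode(desired_image)
        f_current = self.encode(current_image)
        diff = [d - c for d, c in zip(f_desired, f_current)]
        return matvec(self.w_head, diff)


if __name__ == "__main__":
    rng = random.Random(1)
    net = ToyDifferenceNet(n_pixels=16)
    desired = [rng.random() for _ in range(16)]
    current = [rng.random() for _ in range(16)]
    print(net.relative_pose(desired, current))
    print(net.relative_pose(desired, desired))
```

Under these assumptions, identical desired and current images produce identical features, so the difference and the estimated relative pose are zero, which is the behavior one wants at the goal configuration. A trained network with a nonlinear head or bias terms would not guarantee this property by construction.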
Keywords