Smart Agricultural Technology (Dec 2023)
Estimating depth from RGB images using deep-learning for robotic applications in apple orchards
Abstract
Vision-enabled robotic approaches for apple orchard management have been widely studied in recent years. The vision system must capture depth information of the canopies to understand the geometric relations between objects in the orchard environment, which is essential for safe and efficient robot operation. Unfortunately, depth-enabled sensors are more expensive and less ubiquitous than standard RGB cameras, which limits the accessibility of depth cues. This study demonstrates that a data-driven approach using a conditional generative adversarial network (cGAN) known as Pix2Pix can estimate depth from RGB images of orchards acquired with a monocular camera. The Pix2Pix network was modified to generate a depth channel from a standard RGB input image. The network was trained and tested for its efficacy using images acquired from two different apple cultivation systems and camera models. The results demonstrated that the model can generate depth estimates comparable to the actual depth channel, with a root-mean-squared error (RMSE) of 1.83 cm (corresponding to a relative error of 3.5%). Moreover, a high structural similarity index measure (SSIM > 0.55) and commensurate textural features were observed between the actual and predicted depth images. The results showed that using the Pix2Pix model to produce reasonable depth maps of fruit orchards with monocular cameras is a viable alternative to relatively more expensive RGB-D sensors for obtaining depth information.
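As an illustration of the error metrics reported above, the following minimal sketch computes an RMSE and a relative error between an actual and a predicted depth map. It is not the authors' evaluation code. The function name `depth_errors`, the toy depth values, and the definition of relative error as RMSE divided by the mean actual depth are all assumptions made for this example; the abstract does not state how the relative error was defined.

```python
import math


def depth_errors(actual, predicted):
    """Return (rmse, relative_error) for two equally sized depth maps.

    Each map is a list of rows of depth values in the same unit (e.g. cm).
    Hypothetical helper for illustration only.
    """
    # Flatten both maps so pixels can be compared one-to-one.
    a = [v for row in actual for v in row]
    p = [v for row in predicted for v in row]
    if not a or len(a) != len(p):
        raise ValueError("depth maps must be non-empty and the same size")

    # Root-mean-squared error over all pixels.
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, p)) / len(a))

    # Assumption: relative error is RMSE normalized by the mean actual depth.
    mean_actual = sum(a) / len(a)
    relative_error = rmse / mean_actual
    return rmse, relative_error


# Toy 2x2 depth maps in centimeters (made-up values, not from the study).
actual = [[50.0, 52.0], [54.0, 56.0]]
predicted = [[51.0, 51.0], [55.0, 55.0]]

rmse, rel = depth_errors(actual, predicted)
print(f"RMSE = {rmse:.2f} cm, relative error = {rel:.1%}")
```

In practice the depth maps would be image arrays rather than nested lists, and invalid pixels (for example, missing sensor readings) would likely need to be masked out before computing the errors.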