Abstract

In recent years, deep learning has been applied to the problem of inferring depth from a single image. Despite numerous practical robotic applications in outdoor field environments such as agricultural or forested regions, relatively little research has been undertaken into deep depth estimation in these natural scenes. This paper investigates how well state-of-the-art monocular depth estimation networks perform in natural field environments, rather than in the structured urban scenes on which most current work focuses. To further understand how depth estimation networks react in novel outdoor environments, network interpretation is employed to identify the specific monocular cues and central challenges involved in this context. The results show that certain depth network architectures are inherently more suitable for the challenges involved in unstructured environments. The core challenges which differentiate natural environments from traditional structured scenes are found to centre on changing environmental conditions and a relative paucity of depth cues. From the perspective of network architecture, it is demonstrated that a reliance on global geometric features is detrimental in natural environments due to the high variability in geometry. Overall, these results suggest that current monocular depth estimation networks require significant adaptation to provide the accuracy necessary for many real-world applications. Network interpretation has yielded a deeper understanding of the monocular cues used by depth networks, and hence of how these networks react to different environments. This study provides a springboard for further investigation into adapting existing networks to novel outdoor environments, which is vital if practical field robotic applications are to benefit from current advances in depth estimation networks.