Abstract
In recent years, deep learning has been applied to the problem of inferring depth from a
single image. Despite numerous practical robotic applications in outdoor field environments
such as agricultural or forested regions, relatively little research into deep depth estimation
in these natural scenes has been undertaken. This paper investigates how well
state-of-the-art monocular depth estimation networks perform in natural field environments,
rather than in the structured urban scenes that dominate current research. To further understand
how depth estimation networks react in novel outdoor environments, network interpretation
is employed to identify the specific monocular cues and central challenges involved in this
context.
The results show that certain depth network architectures are inherently more suitable for
the challenges involved in unstructured environments. The core challenges that differentiate
natural environments from traditional structured scenes are found to centre on changing
environmental conditions and a relative paucity of depth cues. From the perspective of
network architecture, it is demonstrated that a reliance on global geometric features is
detrimental in natural environments due to the high variability in geometry.
Overall, these results suggest that current monocular depth estimation networks require
significant adaptation to provide the accuracy necessary for many real-world applications.
Network interpretation has yielded a deeper understanding of the monocular cues used by
depth networks and of how these networks react to different environments. This study
therefore provides a springboard for further investigation into adapting existing
networks for novel outdoor environments.
This is vital for practical field robotic applications to benefit from current advances in
depth estimation networks.