Embodied navigation systems perform well in clean lab conditions but fail dramatically in real-world scenarios with sensor noise and unclear instructions. This benchmark exposes those gaps and provides mitigation strategies.
NavTrust is a benchmark that tests how well embodied navigation systems handle real-world problems such as blurry images, sensor noise, and unclear instructions. The researchers tested seven state-of-the-art systems, found that all of them struggle significantly when inputs are corrupted, and then demonstrated four strategies to make them more robust.