Abstract
We present a novel autonomous robot navigation
algorithm for outdoor environments that handles
diverse terrain traversability conditions. Our approach, VLM-GroNav, integrates vision-language models (VLMs)
with physical grounding to assess intrinsic terrain properties such as deformability and slipperiness.
We use proprioceptive sensing, which provides direct
measurements of these physical properties and enhances the
overall semantic understanding of the terrain. Our formulation uses in-context learning to ground the VLM's semantic
understanding with proprioceptive data, enabling dynamic updates of traversability estimates based on the robot's real-time
physical interactions with the environment. We use the updated
traversability estimates to inform both the local and global
planners for real-time trajectory replanning. We validate our
method on a legged robot (Ghost Vision 60) and a wheeled robot
(Clearpath Husky) in diverse real-world outdoor environments
with different deformable and slippery terrains. In practice, we
observe significant improvements over state-of-the-art methods,
with up to a 50% increase in navigation success rate.
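To illustrate the in-context grounding step, the sketch below shows one way a prompt could pair the VLM's past semantic terrain labels with the robot's measured physical responses, so the model's traversability estimate for a newly seen terrain is conditioned on real proprioceptive data. This is a minimal, hypothetical illustration, not the authors' released code: the names (ProprioSample, build_grounding_prompt) and the slip/sinkage fields are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProprioSample:
    """One logged interaction: what the VLM saw vs. what the robot felt.
    Field names are illustrative assumptions, not the paper's API."""
    terrain_label: str   # VLM's semantic label for the patch (e.g., "wet grass")
    slip_ratio: float    # measured wheel/leg slip in [0, 1]
    sinkage_cm: float    # measured terrain deformation (sinkage) in cm


def build_grounding_prompt(history: List[ProprioSample], current_label: str) -> str:
    """Assemble an in-context-learning prompt: past (semantic label ->
    measured physical outcome) pairs ground the VLM's traversability
    estimate for the currently observed terrain."""
    lines = [
        "You estimate terrain traversability for a mobile robot.",
        "Past terrains with the robot's measured physical responses:",
    ]
    for s in history:
        lines.append(
            f"- terrain: {s.terrain_label}; slip ratio: {s.slip_ratio:.2f}; "
            f"sinkage: {s.sinkage_cm:.1f} cm"
        )
    lines.append(
        f"Current terrain (from the camera image): {current_label}. "
        "Return a traversability score in [0, 1], where 1 is fully traversable."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical interaction log accumulated during a run.
    history = [
        ProprioSample("dry asphalt", slip_ratio=0.02, sinkage_cm=0.0),
        ProprioSample("wet grass", slip_ratio=0.35, sinkage_cm=1.2),
        ProprioSample("loose sand", slip_ratio=0.20, sinkage_cm=4.5),
    ]
    prompt = build_grounding_prompt(history, current_label="muddy trail")
    print(prompt)  # this prompt would be sent to the VLM together with the image
```

Because the grounding happens through the prompt rather than through fine-tuning, new proprioceptive measurements can update the traversability estimates on the fly, which is what allows the local and global planners to replan in real time.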
Paper
VLM-GroNav: Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments.
Mohamed Elnoor, Kasun Weerakoon, Gershom Seneviratne, Ruiqi Xian, Tianrui Guan, Mohamed Khalid M Jaffar, Vignesh Rajagopal, and Dinesh Manocha