Abstract
Reinforcement learning has gained significant traction in the field of robotic navigation. However, a persistent challenge is its sample inefficiency, primarily due to the inherent difficulty of encouraging exploration: during training, the mobile agent must explore as much as possible to efficiently learn optimal behaviors. We introduce Ada-NAV, a novel adaptive trajectory length scheme designed to enhance the training sample efficiency of reinforcement learning algorithms in robotic navigation tasks. Unlike traditional approaches that treat trajectory length as a fixed hyperparameter, Ada-NAV dynamically adjusts it based on the entropy of the underlying navigation policy. We empirically validate the efficacy of Ada-NAV using two popular policy gradient methods: REINFORCE and Proximal Policy Optimization (PPO). We demonstrate through both simulated and real-world robotic experiments that Ada-NAV outperforms conventional methods that employ constant or randomly sampled trajectory lengths. Specifically, for a fixed sample budget, Ada-NAV achieves an 18% increase in navigation success rate, a 20-38% reduction in navigation path length, and a 9.32% decrease in elevation costs. Furthermore, we showcase the versatility of Ada-NAV by deploying it on the Clearpath Husky robot, illustrating its applicability in complex outdoor environments.
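The key mechanism summarized above is an entropy-driven rollout length: instead of fixing the trajectory length as a hyperparameter, it is recomputed from the current policy's entropy as training proceeds. The abstract does not give the exact schedule, so the sketch below is only illustrative; the adaptive_trajectory_length helper, the min/max lengths, and the choice to lengthen rollouts as entropy falls are assumptions for illustration, not the paper's actual rule.

import numpy as np

def policy_entropy(action_probs: np.ndarray) -> float:
    """Shannon entropy of a discrete action distribution."""
    p = np.clip(action_probs, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def adaptive_trajectory_length(entropy: float, num_actions: int,
                               min_len: int = 16, max_len: int = 512) -> int:
    """Map normalized policy entropy to a rollout length.

    Hypothetical schedule: a high-entropy (exploratory) policy gets short
    rollouts and frequent updates; as entropy falls and the policy commits,
    rollouts grow toward max_len. The paper's exact rule may differ.
    """
    max_entropy = np.log(num_actions)              # entropy of a uniform policy
    frac = 1.0 - min(entropy / max_entropy, 1.0)   # 0 when uniform, 1 when deterministic
    return int(round(min_len + frac * (max_len - min_len)))

# Usage: pick the next rollout length from the policy's action distribution
# at the current state (here a made-up 4-action distribution).
probs = np.array([0.55, 0.25, 0.15, 0.05])
T = adaptive_trajectory_length(policy_entropy(probs), num_actions=4)
print(f"entropy={policy_entropy(probs):.3f}, next trajectory length={T}")

In a REINFORCE or PPO training loop, the returned length would replace the fixed rollout horizon used to collect each batch of transitions before a policy update.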
Paper
Ada-NAV: Adaptive Trajectory Length-Based Sample Efficient Policy Learning for Robotic Navigation.
Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha