BoMuDANet: Unsupervised Adaptation for Visual Scene Understanding in Unstructured Driving Environments


We present an unsupervised adaptation approach for visual scene understanding in unstructured traffic environments. Our method is designed for unstructured real-world scenarios with dense and heterogeneous traffic consisting of cars, trucks, two-and three-wheelers, and pedestrians. We describe a new semantic segmentation technique based on unsupervised domain adaptation (DA), that can identify the class or category of each region in RGB images or videos. We also present a novel self-training algorithm for multi-source DA that improves the accuracy. Our overall approach is a deep learning-based technique and consists of an unsupervised neural network that achieves 87.18% accuracy on the challenging India Driving Dataset. Our method works well on roads that may not be well-marked or may include dirt, unidentifiable debris, potholes, etc. A key aspect of our approach is that it can also identify objects that are encountered by the model for the fist time during the testing phase. We compare our method against the state-of-the art methods and show an improvement of 5.17% - 42.9%. Furthermore, we also conduct user studies that qualitatively validate the improvements in visual scene understanding of unstructured driving environments.

Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021