FAR: Fourier Aerial Video Recognition

FAR: Fourier Object Disentanglement (FO) lets the network intrinsically separate the moving human agent from the background, without any annotated object detection bounding boxes. This enables our network to focus explicitly on the low-resolution human agent performing the action, rather than learning only from background cues. Space-Time Fourier Attention (FA) exploits the mathematical properties of the Fourier transform to emulate self-attention, capturing contextual knowledge and long-range space-time dependencies at a much lower computational cost.
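As a rough illustration of the idea behind FO, the sketch below scores each pixel by how much it changes over time, using an FFT along the temporal axis. It is a minimal toy under stated assumptions, not the authors' implementation: the function name `temporal_change_mask`, the grayscale `(T, H, W)` input layout, and the choice to sum the non-DC amplitudes are all hypothetical choices made for this example.

```python
import numpy as np

def temporal_change_mask(video):
    """Score each pixel by its temporal change (hypothetical helper).

    video: float array of shape (T, H, W), grayscale frames.
    Returns an (H, W) map scaled to [0, 1]; larger means more change over time.
    """
    spectrum = np.fft.fft(video, axis=0)    # 1-D FFT along time, per pixel
    spectrum[0] = 0                         # drop the DC term, i.e. static content
    energy = np.abs(spectrum).sum(axis=0)   # total amplitude of non-zero frequencies
    peak = energy.max()
    return energy / peak if peak > 0 else energy

# Toy clip: a static background with a single pixel that flickers over time.
T, H, W = 8, 4, 4
video = np.full((T, H, W), 0.5)
video[:, 1, 2] = np.arange(T) % 2           # the only pixel that changes
mask = temporal_change_mask(video)
```

In this toy clip only the flickering pixel receives a high score, while the static background scores near zero. No bounding boxes or labels are involved, which is the property the description above emphasizes.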

Abstract: We present an algorithm, Fourier Activity Recognition (FAR), for UAV video activity recognition. Our formulation uses a novel Fourier object disentanglement method to innately separate the human agent (which is typically small) from the background. Our disentanglement technique operates in the frequency domain to characterize the extent of temporal change of spatial pixels, and exploits the convolution-multiplication property of the Fourier transform to map this representation to the corresponding object-background entangled features obtained from the network. To encapsulate contextual information and long-range space-time dependencies, we present a novel Fourier Attention algorithm, which emulates the benefits of self-attention by modeling the weighted outer product in the frequency domain. Our Fourier attention formulation uses far fewer computations than self-attention. We have evaluated our approach on multiple UAV datasets, including UAV Human RGB, UAV Human Night, Drone Action, and NEC Drone. We demonstrate a relative improvement of 8.02% to 38.69% in top-1 accuracy and run up to 3 times faster than prior work.
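The convolution-multiplication property mentioned in the abstract can be checked numerically: circular convolution of two signals equals the inverse FFT of the elementwise product of their FFTs. The sketch below verifies this on random 1-D vectors. It illustrates only the underlying identity, not the Fourier Attention formulation itself; the function names and the vector length are assumptions made for this example.

```python
import numpy as np

def circular_conv_direct(x, k):
    """Circular convolution from its definition; O(n^2) multiply-adds."""
    n = len(x)
    return np.array([
        sum(x[j] * k[(i - j) % n] for j in range(n))
        for i in range(n)
    ])

def circular_conv_fft(x, k):
    """Same result via the convolution theorem; O(n log n) with the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
k = rng.standard_normal(16)

direct = circular_conv_direct(x, k)
via_fft = circular_conv_fft(x, k)
print(np.allclose(direct, via_fft))
```

The direct version touches every pair of positions, while the FFT version replaces that pairwise interaction with an elementwise product in the frequency domain. This is the general kind of saving that frequency-domain formulations rely on when compared with quadratic-cost operations such as self-attention.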

Please cite our work if you find it useful:

  title={FAR: Fourier Aerial Video Recognition},
  author={Kothandaraman, Divya and Guan, Tianrui and Wang, Xijun and Hu, Sean and Lin, Ming and Manocha, Dinesh},
  journal={arXiv preprint arXiv:2203.10694},