DensePeds: Pedestrian Tracking in Dense Crowds Using Front-RVO and Sparse Features.


| Paper | Code | Dataset |
| --- | --- | --- |
| DensePeds, IROS'19 | Coming Soon | India-Walk (more details below) |

We present a pedestrian tracking algorithm, DensePeds, that tracks individuals in highly dense crowds (greater than 2 pedestrians per square meter). Our approach is designed for videos captured from front-facing or elevated cameras. We present a new motion model called Front-RVO (FRVO) for predicting pedestrian movements in dense situations using collision avoidance constraints, and combine it with state-of-the-art Mask R-CNN to compute sparse feature vectors that reduce the loss of pedestrian tracks (false negatives). We evaluate DensePeds on the standard MOT benchmarks as well as a new dense crowd dataset. In practice, our approach is 4.5 times faster than prior tracking algorithms on the MOT benchmark, and it outperforms prior methods on dense crowd videos by over 2.6% (absolute) on average.

Please cite our work if you find it useful:

@misc{ch2019densepeds,
    title={DensePeds: Pedestrian Tracking in Dense Crowds Using Front-RVO and Sparse Features},
    author={Rohan Chandra and Uttaran Bhattacharya and Aniket Bera and Dinesh Manocha},
    year={2019},
    eprint={1906.10313},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}

Dataset License

Available for download here.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The labels corresponding to each video inside the folder videos_original are stored in the corresponding file annotations/*_gt.txt.

Each line in each *_gt.txt has the following information in the specified order:

<frame number>,<#agents in frame>,<bbox top left x>,<bbox top left y>,<bbox bottom right x>,<bbox bottom right y>,<agent1 ID>,...,<bbox top left x>,<bbox top left y>,<bbox bottom right x>,<bbox bottom right y>,<agentN ID>
  • bbox: bounding box of the agent.
  • All x and y values are in pixels from top left of the corresponding image frame.
  • N = number of agents in the frame.
  • Agents belong to one of the following classes: ped, cycle, scooter, bike, rick, car, bus, truck, others. Each agent ID is the class name followed by a zero-based index reflecting the order in which agents of that class are first tracked. For example, the first tracked pedestrian in a video has the ID ped0, the 5th tracked rickshaw has the ID rick4, etc.
  • The number of lines in each file equals the number of frames in the corresponding video.
  • The dataset contains 8 annotated dense pedestrian videos so far.
  • Each video is processed at 20 fps, and the videos roughly range between 200 and 700 frames.
  • The videos are extremely dense, with roughly 70 to 80 pedestrians per frame on average.
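The annotation layout above can be sketched as a small parser. This is a minimal illustration, not part of the released code: it assumes the fields are comma-separated with no extra whitespace and that pixel coordinates are numeric; the function name `parse_gt_line` is hypothetical.

```python
def parse_gt_line(line):
    """Parse one line of a *_gt.txt annotation file.

    Per the format above, each line holds the frame number, the agent
    count N, and then N groups of five fields: top-left x, top-left y,
    bottom-right x, bottom-right y, and the agent ID (e.g. "ped0").
    """
    fields = line.strip().split(",")
    frame, n_agents = int(fields[0]), int(fields[1])
    agents = []
    for i in range(n_agents):
        base = 2 + 5 * i  # each agent occupies 5 consecutive fields
        x1, y1, x2, y2 = (int(float(v)) for v in fields[base:base + 4])
        agents.append({"id": fields[base + 4], "bbox": (x1, y1, x2, y2)})
    return {"frame": frame, "agents": agents}
```

For example, `parse_gt_line("3,2,10,20,50,90,ped0,100,40,160,120,rick4")` yields frame 3 with two agents, `ped0` and `rick4`, each with its pixel bounding box.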