Publications

You can also find my articles on my Google Scholar profile.

Conference Papers


CrowdSurfer: Sampling Optimization Augmented with Vector-Quantized Variational AutoEncoder for Dense Crowd Navigation

Accepted at IEEE ICRA, 2025

Navigation amongst densely packed crowds remains a challenge for mobile robots. The complexity increases further if the environment layout changes, making the previously computed global plan infeasible. In this paper, we show that it is possible to dramatically enhance crowd navigation by improving just the local planner. Our approach combines generative modelling with inference-time optimization to generate sophisticated long-horizon local plans at interactive rates. More specifically, we train a Vector-Quantized Variational AutoEncoder to learn a prior over the expert trajectory distribution conditioned on the perception input. At run-time, this is used as an initialization for a sampling-based optimizer for further refinement. Our approach does not require any sophisticated prediction of dynamic obstacles and yet provides state-of-the-art performance. In particular, we compare against the recent DRL-VO approach and show a 40% improvement in success rate and a 6% improvement in travel time.
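To give a flavour of the pipeline described above, here is a minimal, purely illustrative sketch of seeding a sampling-based optimizer with draws from a learned prior. It is not the paper's code: the real system decodes a perception-conditioned VQ-VAE into trajectory candidates, whereas the `prior_samples` function below is a hypothetical stand-in, the cost is a toy single-obstacle cost, and the refinement loop is a generic cross-entropy-method-style resampler.

```python
import numpy as np

def prior_samples(horizon, n, rng):
    """Stand-in for decoding learned-prior latents into candidate trajectories."""
    base = np.linspace([0.0, 0.0], [5.0, 0.0], horizon)   # straight-line seed
    return base + 0.5 * rng.standard_normal((n, horizon, 2))

def cost(trajs, obstacle=np.array([2.5, 0.0])):
    """Toy cost: path length plus a penalty for passing near one obstacle."""
    seg_len = np.linalg.norm(np.diff(trajs, axis=1), axis=-1).sum(axis=1)
    clearance = np.linalg.norm(trajs - obstacle, axis=-1).min(axis=1)
    return seg_len + 10.0 * np.exp(-4.0 * clearance)

def refine(trajs, iters=20, elite_frac=0.2, rng=None):
    """CEM-style refinement: resample around the elite set each iteration."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, k = len(trajs), max(1, int(elite_frac * len(trajs)))
    for _ in range(iters):
        elites = trajs[np.argsort(cost(trajs))[:k]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
        trajs = mean + std * rng.standard_normal((n, *mean.shape))
    return trajs[np.argmin(cost(trajs))]

rng = np.random.default_rng(0)
best = refine(prior_samples(horizon=30, n=64, rng=rng), rng=rng)
print(cost(best[None])[0])
```

The point of the structure is the division of labour: the prior proposes diverse, roughly feasible long-horizon candidates, and the cheap sampling loop only has to refine them, which is what makes interactive rates plausible.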

Recommended citation: Naman Kumar*, Antareep Singha*, Laksh Nanwani*, Dhruv Potdar, Tarun R, Fatemeh Rastgar, Simon Idoko, Arun Kumar Singh, K. Madhava Krishna. "CrowdSurfer: Sampling Optimization Augmented with Vector-Quantized Variational AutoEncoder for Dense Crowd Navigation." IEEE ICRA, 2025.
Download Paper | Project Website

An FPGA-based Real-Time Video Processing System on Zynq 7010

Published at IEEE ICACIC, 2023

Real-time image processing involves the transformation of incoming signals, primarily from a camera, into a format that can be readily interpreted by a display device. This process is heavily reliant on precise timing constraints, demanding efficient hardware execution. This paper proposes a method for interfacing the OV7670 Complementary Metal Oxide Semiconductor (CMOS) camera with an FPGA-based real-time image processing system on a Zynq 7010 platform, using the open-source Digilent Dynamic Clock Generator. The architecture is characterized by its parallel processing capability: it controls the camera output signals while simultaneously processing them and converting them from RGB to DVI format on the fly. In lieu of the traditional PLL-based clocking wizard, which provides a fixed clock signal, the open-source Dynamic Clock Generator has been incorporated into the architecture to generate the essential pixel clock, meeting the real-time clocking requirements. The RGB to DVI (Digital Visual Interface) block has been coded in VHDL to convert the output of the Xilinx AXI4-Stream to Video Out IP core to TMDS (Transition-Minimized Differential Signaling) data, to be interpreted by an HDMI-compatible monitor.
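As a software illustration of the pixel-format handling that the design above performs in hardware: assuming the OV7670 is configured for RGB565 output, each pixel arrives as two bytes on the camera's 8-bit bus and must be reassembled and widened to 24-bit RGB before the RGB-to-DVI stage. The paper implements this in VHDL; the Python sketch below is not its code, only a bit-level model of the conversion.

```python
def rgb565_from_bytes(hi, lo):
    """Reassemble one 16-bit RGB565 pixel from the two camera bus bytes."""
    return (hi << 8) | lo

def rgb565_to_rgb888(pix):
    """Expand the 5/6/5-bit channels to 8 bits by replicating high bits."""
    r = (pix >> 11) & 0x1F
    g = (pix >> 5) & 0x3F
    b = pix & 0x1F
    return ((r << 3) | (r >> 2),   # 5 -> 8 bits
            (g << 2) | (g >> 4),   # 6 -> 8 bits
            (b << 3) | (b >> 2))   # 5 -> 8 bits

pix = rgb565_from_bytes(0xFF, 0xFF)   # all-ones pixel
print(rgb565_to_rgb888(pix))          # (255, 255, 255)
```

Replicating the high bits into the low bits (rather than zero-padding) maps full-scale 5- and 6-bit values to exactly 255, avoiding a dimmed white point on the display.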

Recommended citation: Antareep Singha. "An FPGA-based Real-Time Video Processing System on Zynq 7010." IEEE ICACIC, 2023.
Download Paper | Project Website