Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Portfolio

Publications

Quo Vadis, Video Understanding with Vision-Language Foundation Models?

Mahmoud Ali, Di Yang, Arkaprava Sinha, Dominick Reilly, Srijan Das, Gianpiero Francesca, Francois Bremond

NeurIPS Workshop, 2024

[Paper]

This study benchmarks vision-language foundation models, both VLMs and VLLMs, on five Activities of Daily Living (ADL) video tasks across 11 datasets, revealing that they struggle with fine-grained action understanding. Despite their success on web-scale data, these models fall short on real-world challenges such as densely labeled and long videos.

MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection

Arkaprava Sinha, Monish Soundar Raj, Pu Wang, Ahmed Helmy, Srijan Das

Preprint, 2025

[Paper] [Code]

Multi-Scale Temporal Mamba (MS-Temba) adapts Mamba for temporal action detection in long, untrimmed videos. It introduces Temporal Mamba (Temba) Blocks, which perform dilated temporal modeling, and a Temporal Mamba Fuser, which aggregates features across multiple temporal scales. It outperforms state-of-the-art methods on long videos while being significantly more efficient.

Talks

Teaching