Publications

You can also find my articles on my Google Scholar profile.

Selected Publications


UniEgo: Proxies as Mediators for Unified Egocentric Video Representation Learning

Wenhao Chi, Arkaprava Sinha, Dominick Reilly, Hieu Le, Srijan Das

Preprint, 2026

[Paper]

UniEgo shows that egocentric video models need not be limited by what a wearable camera directly observes: through proxy-mediated distillation, a single ego-only encoder can absorb complementary knowledge from exocentric views, depth, skeletons, and foundation models. By converting heterogeneous teachers into a unified proxy space and selectively distilling only reliable supervision, UniEgo delivers stronger representations across action recognition, retrieval, and temporal segmentation while requiring only egocentric RGB video at inference.

MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection

Arkaprava Sinha, Monish Soundar Raj, Pu Wang, Ahmed Helmy, Hieu Le, Srijan Das

CVPR, 2026

[Paper] [Code]

MS-Temba adapts Mamba-based state-space modeling to Temporal Action Detection by introducing dilated multi-scale SSMs that capture both fine-grained and long-range temporal dynamics in long, untrimmed videos. Through scale-aware supervision and a dedicated multi-scale fusion module, it delivers precise localization of densely overlapping actions and generalizes effectively to long-form Video Summarization.

Quo Vadis, Video Understanding with Vision-Language Foundation Models?

Mahmoud Ali, Di Yang, Arkaprava Sinha, Dominick Reilly, Srijan Das, Gianpiero Francesca, Francois Bremond

NeurIPSW, 2024

[Paper]

This study benchmarks vision-language models (VLMs and VLLMs) on five Activities of Daily Living (ADL) video tasks across 11 datasets, revealing that they struggle with fine-grained action understanding. Despite their success on web-scale data, these models fall short on real-world, densely labeled, long-video challenges.