MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
Arkaprava Sinha, Monish Soundar Raj, Pu Wang, Ahmed Helmy, Hieu Le, Srijan Das
CVPR, 2026
MS-Temba adapts Mamba-based state-space modeling to Temporal Action Detection by introducing dilated multi-scale SSMs that capture both fine-grained and long-range temporal dynamics in long, untrimmed videos. Through scale-aware supervision and a dedicated multi-scale fusion module, it delivers precise localization of densely overlapping actions and generalizes effectively to long-form Video Summarization.
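
To make the idea of a dilated temporal scan concrete, here is a minimal toy sketch. It is not the MS-Temba implementation. The function names `dilated_scan` and `multi_scale_scan`, the scalar per-frame feature, the fixed `decay` recurrence (a stand-in for a learned state-space model), and the plain averaging used in place of the fusion module are all assumptions made for illustration.

```python
from typing import List, Sequence


def dilated_scan(x: Sequence[float], dilation: int, decay: float = 0.5) -> List[float]:
    """Run a toy linear recurrence over each dilated subsequence of x.

    With dilation d, frames at positions r, r+d, r+2d, ... share one
    hidden state, so a larger d links frames that are farther apart.
    (Hypothetical stand-in for an SSM; not the paper's block.)
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    out = [0.0] * len(x)
    for offset in range(dilation):
        h = 0.0  # hidden state, reset for each dilated subsequence
        for t in range(offset, len(x), dilation):
            h = decay * h + (1.0 - decay) * x[t]
            out[t] = h
    return out


def multi_scale_scan(
    x: Sequence[float],
    dilations: Sequence[int] = (1, 2, 4),
    decay: float = 0.5,
) -> List[float]:
    """Average the per-scale outputs (assumed, simplistic fusion)."""
    per_scale = [dilated_scan(x, d, decay) for d in dilations]
    return [sum(vals) / len(per_scale) for vals in zip(*per_scale)]


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(dilated_scan(impulse, 1))  # response decays at every frame
    print(dilated_scan(impulse, 2))  # response reaches only every second frame
```

In this sketch, a dilation of 1 propagates information frame by frame, while larger dilations skip frames and so cover a longer time span in the same number of recurrence steps. Combining several dilations is one simple way to mix fine-grained and long-range context.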
