Home

Hello! I am Arkaprava Sinha, a Graduate Research Assistant pursuing a Ph.D. in Computer Science at the University of North Carolina, Charlotte. I am advised by Prof. Srijan Das. My research lies at the intersection of Multimodal Vision Language Models, Long Context Video Understanding, Temporal Modeling, and Agentic Systems, with a focus on building scalable and reliable algorithms for Long Video Understanding.

Prior to my Ph.D., I worked as a Data Scientist, where I contributed to projects in Computer Vision, Natural Language Processing, and large-scale Machine Learning systems across industry and research settings.

Research

My research focuses on building efficient multimodal AI systems for long-horizon video understanding, egocentric perception, and visual reasoning. I design scalable temporal architectures and representation learning methods that help models reason over long, untrimmed videos, align video with language, motion, and structured visual cues, and operate efficiently in real-world settings.

I am particularly interested in agentic video understanding, where systems actively search for relevant evidence, reason over temporal context, and verify answers using multimodal foundation models instead of exhaustively processing entire videos. Broadly, my work connects long video understanding, Vision-Language Models, Multimodal LLMs, diffusion-based generation, and embodied AI, with applications in robotics, AR/VR, intelligent assistants, assistive technology, and safety-critical video analytics.

News

Feb 2026 - MS-Temba accepted to CVPR 2026.
Feb 2025 - LLAVIDAL accepted to CVPR 2025.
Dec 2024 - SKI Models accepted to AAAI 2025.
Oct 2024 - Two papers accepted to NeurIPS 2024 workshops. An early version of LLAVIDAL was presented at the NeurIPS 2024 workshop on Video-Language Models and Multimodal Algorithmic Reasoning.

Selected Publications

  • TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living
    Preprint, 2026
    Arkaprava Sinha, Dominick Reilly, Siddharth Krishnan, Hieu Le, Srijan Das
    Paper | Website

  • UniEgo: Proxies as Mediators for Unified Egocentric Video Representation Learning
    Preprint, 2026
    Wenhao Chi, Arkaprava Sinha, Dominick Reilly, Hieu Le, Srijan Das
    Paper

  • MS-Temba: Multi-Scale Temporal Mamba for Understanding Long Untrimmed Videos
    The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
    Arkaprava Sinha, Monish Soundar Raj, Pu Wang, Ahmed Helmy, Hieu Le, Srijan Das
    Paper | Code | Website

  • LLAVIDAL: A Large Language Vision Model for Daily Activities of Living
    The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
    Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha, Manish Kumar Govind, Pu Wang, Francois Bremond, Le Xue, Srijan Das
    Paper | Code | Website

  • SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living
    The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), 2025
    Arkaprava Sinha, Dominick Reilly, Francois Bremond, Pu Wang, Srijan Das
    Paper | Code

  • DiffSwap++: 3D Latent-Controlled Diffusion for Identity-Preserving Face Swapping
    Preprint
    Weston Bondurant, Arkaprava Sinha, Hieu Le, Srijan Das, Stephanie Schuckers
    Paper

  • Quo Vadis, Video Understanding with Vision-Language Foundation Models?
    NeurIPS Workshop on Video-Language Models, 2024
    Mahmoud Ali, Di Yang, Arkaprava Sinha, Dominick Reilly, Srijan Das, Gianpiero Francesca, Francois Bremond
    Paper