[Image: Rohit Girdhar]

Rohit Girdhar

Research Scientist

AMI Labs

I am a Research Scientist at AMI Labs. My current research focuses on multimodal understanding, generation and world modeling. I obtained a PhD from Carnegie Mellon University (here’s a link to my dissertation), where I worked on learning from and understanding videos. I was previously part of the Meta Superintelligence Labs and Facebook AI Research (FAIR) at Meta, and have spent time at DeepMind, Adobe and Facebook as an intern. See here for a formal bio.

Education
  • PhD in Robotics, 2019

    Carnegie Mellon University, Pittsburgh PA

  • MS in Robotics, 2016

    Carnegie Mellon University, Pittsburgh PA

  • B. Tech. in Computer Science, 2014

    IIIT Hyderabad, India

Experience
  • AMI Labs · Research Scientist

    New York · 2026 -- Present

  • Meta · Research Scientist

    New York · 2019 -- 2026

  • DeepMind · Research Scientist Intern

    London · Summer 2018

  • Facebook · Research Scientist Intern

    Menlo Park · Summer 2017

  • Adobe · Research Scientist Intern

    San Francisco · Summer 2016

  • Facebook · Software Engineering Intern

    Menlo Park · Summer 2013

Highlights

Videos powered by MovieGen and Emu Video!

Projects and Publications

Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
May, 2023 In CVPR, 2023 (Highlighted Presentation)
ImageBind: One Embedding Space To Bind Them All

One embedding space for 6 different modalities, enabling zero-shot recognition on all of them!

Mannat Singh, Quentin Duval, Kalyan Vasudev Alwala, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra
March, 2023 In ICCV, 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining

Scaling up MAE pre-pretraining, followed by weakly supervised pretraining, leads to strong representations.

Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, Kristen Grauman
January, 2023 In CVPR, 2023 (Highlighted Presentation)
HierVL: Learning Hierarchical Video-Language Embeddings

Video-language embeddings are a promising avenue for injecting semantics into visual representations, but existing methods capture only short-term associations between seconds-long video clips and their accompanying text. We propose HierVL, a novel hierarchical video-language embedding that simultaneously accounts for both long-term and short-term associations. As training data, we take videos accompanied by timestamped text descriptions of human actions, together with a high-level text summary of the activity throughout the long video (as are available in Ego4D). We introduce a hierarchical contrastive training objective that encourages text-visual alignment at both the clip level and video level. While the clip-level constraints use the step-by-step descriptions to capture what is happening in that instant, the video-level constraints use the summary text to capture why it is happening, i.e., the broader context for the activity and the intent of the actor. Our hierarchical scheme yields a clip representation that outperforms its single-level counterpart as well as a long-term video representation that achieves SotA results on tasks requiring long-term video modeling. HierVL successfully transfers to multiple challenging downstream tasks (in EPIC-KITCHENS-100, Charades-Ego, HowTo100M) in both zero-shot and fine-tuned settings.

Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens Van Der Maaten, Armand Joulin, Ishan Misra
June, 2022 In CVPR, 2022 (Oral Presentation)
Omnivore: A Single Model for Many Visual Modalities

A single model for images, video and single-view 3D.

Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing
December, 2021 In arXiv, 2021
Mask2Former for Video Instance Segmentation

SOTA video segmentation using Mask2Former.

Rohit Girdhar, Laura Gustafson, Aaron Adcock, Laurens Van Der Maaten
June, 2020 In ICML Workshops, 2021
Forward Prediction for Physical Reasoning

Forward prediction for the PHYRE benchmark.
