OpenGVLab

General Vision Team of Shanghai AI Laboratory


Welcome to OpenGVLab! 👋

We are a research group from Shanghai AI Lab focused on vision-centric AI research. The GV in our name, OpenGVLab, stands for general vision: a general understanding of vision that needs little effort to adapt to new vision-based tasks.

We develop model architectures and release pre-trained foundation models to the community to motivate further research in this area. We have made promising progress in general vision AI, with 109 SOTA results 🚀. In 2022, our open-sourced foundation model reached 65.5 mAP on the COCO object detection benchmark and 91.1% Top-1 accuracy on Kinetics 400, landmark results for AI vision 👀 tasks in image 🖼️ and video 📹 understanding. In 2023, we created VideoChat 🦜, LLaMA-Adapter 🦙, the 3D foundation model Ponder V2 🧊, and many more wonderful works! At CVPR 2023, our vision foundation model InternImage was listed as one of the most influential papers, and together with our partner OpenDriveLab we won the Best Paper award 🎉.

In 2024, we released InternVL, the best open-source VLM, and InternVideo2, a video understanding foundation model that won 7 championships in the EgoVis challenges 🥇. To date, our brilliant team has open-sourced more than 70 works; please find them here 😃

Based on solid vision foundations, we have expanded to multi-modality models. We aim to empower individuals and businesses by offering a higher starting point for developing vision-based AI products and lessening the burden of building an AI model from scratch.

Branches: Alpha (explores the latest advances in vision+language research), uni-medical (focuses on medical AI), Vchitect (generative AI)

Follow us: Twitter, 🤗 Hugging Face, Medium, WeChat, Zhihu

Pinned

  1. InternVL Public

    [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

    Python 9.3k 727

  2. InternVideo Public

    [ECCV2024] Video Foundation Models & Data for Multimodal Understanding

    Python 2.1k 128

  3. Ask-Anything Public

    [CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

    Python 3.3k 267

  4. VideoMamba Public

    [ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

    Python 1k 81

  5. OmniQuant Public

    [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

    Python 859 70

  6. LLaMA-Adapter Public

    [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

    Python 5.9k 383

Repositories

Showing 10 of 89 repositories
  • SDLM Public

    Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation length and maintaining KV-cache compatibility, achieving high efficiency and throughput.

    Python 59 MIT 1 0 0 Updated Oct 17, 2025
  • VideoChat-R1 Public

    [NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning

    Python 212 8 17 0 Updated Oct 17, 2025
  • MetaCaptioner Public
    Python 4 0 0 0 Updated Oct 15, 2025
  • ExpVid Public
    5 0 0 0 Updated Oct 15, 2025
  • Vlaser Public

    Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning

    Python 17 MIT 0 1 0 Updated Oct 13, 2025
  • NaViL Public
    Python 80 MIT 7 0 0 Updated Oct 10, 2025
  • GenExam Public

    GenExam: A Multidisciplinary Text-to-Image Exam

    Python 39 MIT 3 0 0 Updated Oct 6, 2025
  • ScaleCUA Public

    ScaleCUA provides open-source computer use agents that can operate in cross-platform environments (Windows, macOS, Ubuntu, Android).

    Python 661 Apache-2.0 37 4 0 Updated Oct 3, 2025
  • SID-VLN Public

    Official implementation of: Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale

    Python 7 MIT 1 0 0 Updated Oct 2, 2025
  • PonderV2 Public

    [T-PAMI 2025] PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

    Python 361 MIT 9 1 0 Updated Sep 30, 2025
