Visual Evaluation with Foundation Models

We are working towards a future in which a single foundation model can serve as a multi-purpose expert for low-level visual perception and visual evaluation.

👁️‍🗨️ Low-level Visual Perception in the Foundation Model Era

🔖 Aiming at next-era cornerstone research

Low-level Visual Perception | Multi-Modality Large Language Models | Visual Quality Assessment

📖 Main Projects

  • Co-Instruct [ECCV 2024 Oral]: Homepage, Repo, Demo. An open-ended visual quality comparator (supporting up to 4 images) and a low-level visual assistant; an improved version of Q-Instruct [CVPR 2024].

  • Q-Align [ICML 2024]: Homepage, Repo, Demo. A unified visual scorer for images and videos, built via text-instructed alignment on multi-modality foundation models; it can be efficiently fine-tuned on further datasets with consistently strong performance, and is state-of-the-art on IQA, VQA, and IAA (a minimal sketch of its scoring idea follows this list).

  • Q-Instruct [CVPR 2024]: Homepage, Repo, 200K Dataset, Technical Report. A large-scale instruction-tuning dataset that improves the low-level perceptual abilities of foundation models.

  • Q-Bench+ [ICLR 2024, Spotlight]: Homepage, Repo, Data-Single, Data-Pair, Preprint. The first benchmark of foundation models on low-level vision.
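
To make the Q-Align recipe concrete, here is a minimal sketch of its scoring idea: the model is instructed to rate an image with one of five text-defined levels ("excellent" to "bad"), and the probabilities of those level tokens are converted into a scalar score by weighted averaging. The logits and function name below are illustrative assumptions, not the official Q-Align API; see its Repo for the real inference code.

```python
# Minimal sketch of Q-Align-style scoring (not the official implementation).
# Assumption: an MLLM was prompted to rate an image, and we can read its
# next-token logits for the five rating-level words.
import torch

LEVELS = ["excellent", "good", "fair", "poor", "bad"]
WEIGHTS = torch.tensor([5.0, 4.0, 3.0, 2.0, 1.0])  # "excellent"=5 ... "bad"=1

def level_logits_to_score(level_logits: torch.Tensor) -> float:
    """Softmax over the five level-token logits, then a weighted average,
    yielding a scalar quality score on a 1-5 scale."""
    probs = torch.softmax(level_logits, dim=-1)
    return float((probs * WEIGHTS).sum())

# Made-up logits standing in for a real model's next-token distribution:
fake_logits = torch.tensor([2.1, 1.3, 0.2, -1.0, -2.5])
print(f"{level_logits_to_score(fake_logits):.2f}")  # ~4.44 -> high quality
```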

🖋️ Extension Projects

  • Q-Boost: Homepage. A discussion on boosting IQA performance for MLLMs that have not been specially aligned for IQA (see the sketch after this list).

  • [Pending] Chinese-Q-Bench / 质衡: Homepage, Repo. The first attempt to test multilingual abilities on low-level vision.
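
For the zero-shot setting Q-Boost addresses, a common baseline (also used in Q-Bench) is a binary softmax probe: complete a quality prompt and compare the next-token probabilities of "good" and "poor". The sketch below is an illustrative assumption with made-up logits, not the official Q-Boost code.

```python
# Minimal sketch of the binary softmax probe for zero-shot IQA with MLLMs
# that are not specially aligned for quality assessment (illustrative only).
import math

def binary_softmax_score(logit_good: float, logit_poor: float) -> float:
    """Quality score in [0, 1]: p("good") / (p("good") + p("poor")),
    where both logits come from the model's next-token distribution."""
    e_good, e_poor = math.exp(logit_good), math.exp(logit_poor)
    return e_good / (e_good + e_poor)

# Made-up logits for the candidate tokens "good" and "poor":
print(f"{binary_softmax_score(1.8, -0.4):.2f}")  # 0.90 -> high predicted quality
```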

Maintained by Teo Wu (Singapore) and Zicheng Zhang (Shanghai).

Pinned Repositories

  1. Q-Align

    [ICML 2024] [IQA, IAA, VQA] An all-in-one foundation model for visual scoring; can be efficiently fine-tuned on downstream datasets.


  2. Q-Bench

    [ICLR 2024 Spotlight] (GPT-4V / Gemini-Pro / Qwen-VL-Plus + 16 open-source MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.


  3. Q-Instruct

    [CVPR 2024] Low-level visual instruction tuning, with a 200K dataset and a model zoo of fine-tuned checkpoints.


  4. Co-Instruct

    [ECCV 2024 Oral; comparison among multiple images] A study on open-ended multi-image quality comparison: a dataset, a model, and a benchmark.


  5. A-Bench

    [ICLR 2025] What do we expect from LMMs as evaluators of AI-generated images (AIGIs), and how do they perform?


  6. Q-Bench-Video

    [CVPR 2025] A benchmark for the video quality understanding of LMMs.

