Clone this repository:

```bash
git clone git@github.com:lerogo/MMGenBench.git
cd MMGenBench
```

Download the dataset:

```bash
huggingface-cli download --repo-type dataset lerogo/MMGenBench --local-dir MMGenBench-data
```
Install the relevant environment, including torch, transformers, diffusers, and unicom (used to extract image representations); the per-step dependencies are listed in generate/requirements.txt and evalimg/requirements.txt.
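Once installed, a quick sanity check can confirm the core packages import correctly (a minimal sketch, not part of the repository):

```python
# Hypothetical environment check; not part of this repository.
import torch
import transformers
import diffusers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("diffusers:", diffusers.__version__)
```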
We use InternVL2-2B as an example. The structure of the code and data is as follows:
```
.
├── MMGenBench-data        # The MMGenBench-Test/Domain dataset we downloaded from huggingface
│   ├── MMGenBench-Domain.json
│   ├── MMGenBench-Domain.tsv
│   ├── MMGenBench-Test-label-count.json
│   ├── MMGenBench-Test-label-index.json
│   ├── MMGenBench-Test.json
│   ├── MMGenBench-Test.tsv
│   ├── README.md
│   └── check.py
├── README.md              # This file
├── evalimg                # For extracting features and calculating metrics using the image representation model
│   ├── metric_fid.py
│   ├── output
│   │   ├── InternVL2-2B_MMGenBench-Domain.json
│   │   └── InternVL2-2B_MMGenBench-Test.json
│   ├── requirements.txt
│   ├── run.py
│   └── run.sh
├── generate               # For processing LMMs' output with the text-to-image models
│   ├── flux.py
│   ├── input
│   │   ├── InternVL2-2B_MMGenBench-Domain.xlsx
│   │   └── InternVL2-2B_MMGenBench-Test.xlsx
│   ├── kolors.py
│   ├── lumina.py
│   ├── output
│   │   ├── InternVL2-2B_MMGenBench-Domain.tsv
│   │   └── InternVL2-2B_MMGenBench-Test.tsv
│   ├── requirements.txt
│   ├── run.py
│   ├── run.sh
│   ├── sd.py
│   └── tools.py
└── visual                 # For visualization
    ├── outputs
    │   ├── InternVL2-2B_MMGenBench-Domain.json
    │   ├── InternVL2-2B_MMGenBench-Domain.xlsx
    │   ├── InternVL2-2B_MMGenBench-Test.json
    │   └── InternVL2-2B_MMGenBench-Test.xlsx
    ├── run.py
    └── run.sh
```
Adapt your model in VLMEvalKit and use MMGenBench for inference.
Run the command:

```bash
torchrun --nproc-per-node=4 run.py --model <YOUR LMM> --data MMGenBench-Test MMGenBench-Domain --mode infer --verbose
```
Continuing with InternVL2-2B as the example, this produces two files: InternVL2-2B_MMGenBench-Test.xlsx and InternVL2-2B_MMGenBench-Domain.xlsx. Put them in the folder ./generate/input.
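Before generation, you can sanity-check these files by inspecting the prompts the LMM wrote (a minimal sketch; the "prediction" column name follows VLMEvalKit's usual .xlsx output format and is an assumption here):

```python
# Hypothetical inspection script; column names are assumptions based on
# VLMEvalKit's typical output and may differ in your version.
import pandas as pd

df = pd.read_excel("generate/input/InternVL2-2B_MMGenBench-Test.xlsx")
print(df.columns.tolist())       # see which columns are actually present
print(df["prediction"].head())   # the image-generation prompts written by the LMM
```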
Modify ./generate/run.sh to select the text-to-image model and the number of GPUs to use.
Then run:

```bash
cd generate
bash run.sh
```

You will then get two files: ./generate/output/InternVL2-2B_MMGenBench-Test.tsv and ./generate/output/InternVL2-2B_MMGenBench-Domain.tsv.
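Conceptually, this step feeds each prompt from the .xlsx files to the chosen text-to-image model. The sketch below illustrates the idea with a Stable Diffusion pipeline from diffusers; the actual implementation lives in generate/run.py and the per-model scripts (sd.py, flux.py, kolors.py, lumina.py), and the file paths, model name, and column names here are assumptions:

```python
# Illustrative sketch only; the real logic is in generate/run.py + sd.py etc.
import pandas as pd
import torch
from diffusers import StableDiffusionPipeline

# Prompts written by the LMM (the "prediction" column name is an assumption).
df = pd.read_excel("input/InternVL2-2B_MMGenBench-Test.xlsx")

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

for _, row in df.iterrows():
    image = pipe(str(row["prediction"])).images[0]
    image.save(f"output/{row['index']}.png")  # the repo stores results as .tsv instead
```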
We use the unicom model to extract features from the original and generated images, so you need to install unicom (https://github.com/deepglint/unicom).
Modify ./evalimg/run.sh to evaluate performance on MMGenBench-Test and MMGenBench-Domain, respectively.
Then run:

```bash
cd evalimg
bash run.sh
```

You will then get two files: ./evalimg/output/InternVL2-2B_MMGenBench-Test.json and ./evalimg/output/InternVL2-2B_MMGenBench-Domain.json.
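The core idea of this step is to compare the unicom embedding of each original image with that of the image generated from the LMM's prompt. The sketch below shows a plain cosine-similarity comparison; treat the model name and the metric choice as assumptions, since the repository's actual metrics live in evalimg/run.py and evalimg/metric_fid.py (which also includes an FID-style score), and the unicom.load call follows the unicom project README:

```python
# Conceptual sketch; see evalimg/run.py and evalimg/metric_fid.py for the
# actual metrics. The unicom.load API follows the unicom project README.
import torch
import unicom
from PIL import Image

model, preprocess = unicom.load("ViT-B/32")  # model name chosen for illustration
model.eval()

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    feat = model(x)
    return feat / feat.norm(dim=-1, keepdim=True)  # L2-normalize the feature

# Cosine similarity between an original image and the one generated from the
# LMM's prompt: higher means the prompt preserved more of the image content.
sim = (embed("original.png") * embed("generated.png")).sum().item()
print(f"similarity: {sim:.4f}")
```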
Run the command:

```bash
cd visual
bash run.sh
```

You can then see the relevant results in the output folder, including metrics and visualization results.
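If you want to consume the numbers programmatically rather than through the visualization, the metrics JSON can be loaded directly (a minimal sketch; the JSON structure is produced by evalimg/run.py and is not documented here):

```python
# Hypothetical quick look at a metrics file; keys depend on evalimg/run.py.
import json

with open("evalimg/output/InternVL2-2B_MMGenBench-Test.json") as f:
    metrics = json.load(f)

print(json.dumps(metrics, indent=2)[:800])  # preview the structure
```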
If you have any questions, please submit an issue or contact lerogohl<AT>gmail.com.
If you find MMGenBench or this code useful, please cite:
```bibtex
@misc{huang2024MMGenBench,
      title={MMGenBench: Fully Automatically Evaluating LMMs from the Text-to-Image Generation Perspective},
      author={Hailang Huang and Yong Wang and Zixuan Huang and Huaqiu Li and Tongwen Huang and Xiangxiang Chu and Richong Zhang},
      year={2024},
      eprint={2411.14062},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.14062},
}
```