Moondream2
moondream2 is a small vision language model designed to run efficiently on edge devices. Check out the GitHub repository for details, or try it out on the Hugging Face Space!
Benchmarks
| Release | VQAv2 | GQA | TextVQA | DocVQA | TallyQA (simple/full) | POPE (rand/pop/adv) |
|---|---|---|---|---|---|---|
| 2024-07-23 (latest) | 79.4 | 64.9 | 60.2 | 61.9 | 82.0 / 76.8 | 91.3 / 89.7 / 86.9 |
| 2024-05-20 | 79.4 | 63.1 | 57.2 | 30.5 | 82.1 / 76.6 | 91.5 / 89.6 / 86.2 |
| 2024-05-08 | 79.0 | 62.7 | 53.1 | 30.5 | 81.6 / 76.1 | 90.6 / 88.3 / 85.0 |
| 2024-04-02 | 77.7 | 61.7 | 49.7 | 24.3 | 80.1 / 74.2 | - |
| 2024-03-13 | 76.8 | 60.6 | 46.4 | 22.2 | 79.6 / 73.3 | - |
| 2024-03-06 | 75.4 | 59.8 | 43.1 | 20.9 | 79.5 / 73.2 | - |
| 2024-03-04 | 74.2 | 58.5 | 36.4 | - | - | - |
Usage
The model is updated regularly, so we recommend pinning the model version to a specific release from the benchmarks table above.
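A minimal sketch of pinning a release when loading through the Hugging Face `transformers` library. It assumes the model is published on the Hub as `vikhyatk/moondream2` and that release dates from the table double as revision tags; neither is stated in this README, so verify both before relying on them. The `encode_image` and `answer_question` methods are assumed to come from the model's own remote code and may differ between revisions. The helper names `load_moondream` and `describe_image` are hypothetical.

```python
MODEL_ID = "vikhyatk/moondream2"  # assumption: Hugging Face Hub repository name
REVISION = "2024-07-23"           # assumption: release date used as the revision tag


def load_moondream(model_id=MODEL_ID, revision=REVISION):
    """Load the model and tokenizer at one fixed revision (hypothetical helper)."""
    # Imported lazily so the heavy dependency is only needed when loading.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,  # the model ships custom code on the Hub
        revision=revision,       # pin weights and code to one release
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    return model, tokenizer


def describe_image(image_path, question="Describe this image."):
    """Answer a question about a local image (hypothetical helper)."""
    from PIL import Image

    model, tokenizer = load_moondream()
    image = Image.open(image_path)
    # Assumption: these two methods are provided by the model's remote code.
    enc_image = model.encode_image(image)
    return model.answer_question(enc_image, question, tokenizer)
```

Calling `describe_image("example.jpg")` with a local image path should download the pinned revision on first use. Passing the same `revision` to both `from_pretrained` calls keeps the tokenizer and model code in step with each other when a newer release is published.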