lvlm
Here are 26 public repositories matching this topic...
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Updated Apr 4, 2025 - HTML
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
Updated Jun 1, 2025 - Jupyter Notebook
An up-to-date curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models.
Updated Oct 3, 2025
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Updated Sep 26, 2024 - Python
[ICCV'25] The official code for the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
Updated Oct 1, 2025 - Python
📜 Paper list on decoding methods for LLMs and LVLMs
Updated Jun 30, 2025
[ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets
Updated Aug 6, 2025
CLIP-MoE: Mixture of Experts for CLIP
Updated Oct 10, 2024 - Python
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Vision-Language Models (e.g., LLaVA-Next) under a fixed token budget.
Updated Apr 18, 2025 - Python
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models
Updated Oct 24, 2025
The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".
Updated Aug 20, 2025
LEMMA: An effective and explainable way to detect multimodal misinformation with LVLM and external knowledge augmentation, incorporating the intuition and reasoning capability inside LVLM.
Updated Jun 4, 2025 - Jupyter Notebook
Code for ICLR 2025 Paper: Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
Updated May 7, 2025 - Python
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.
Updated Dec 27, 2024 - Python
[CVPR'25] Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Updated Oct 11, 2025 - Python
📖 Curated list about the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.
Updated Feb 7, 2025
Code for USENIX Security 2024 paper: Moderating Illicit Online Image Promotion for Unsafe User Generated Content Games Using Large Vision-Language Models.
Updated Apr 30, 2025 - Python