Dolphin-v2 is an enhanced universal document parsing model that substantially improves upon the original Dolphin. It seamlessly handles any document type—whether digital-born or photographed—through a document-type-aware two-stage architecture with scalable anchor prompting.
Document image parsing is challenging due to diverse document types and complexly intertwined elements such as text paragraphs, figures, formulas, tables, and code blocks. Dolphin-v2 addresses these challenges through a document-type-aware two-stage approach:
- 🔍 Stage 1: Document type classification (digital vs. photographed) + layout analysis with reading order prediction
- 🧩 Stage 2: Hybrid parsing strategy - holistic parsing for photographed documents, parallel element-wise parsing for digital documents
Dolphin-v2 achieves strong performance across diverse page-level and element-level parsing tasks while remaining efficient through its parallel element-wise parsing mechanism.
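The two-stage flow described above can be sketched as plain control logic. This is a minimal, illustrative sketch only: the method names (`classify_and_layout`, `parse_holistic`, `parse_element`) and the element schema are placeholders of our own, not the actual repo API.

```python
# Illustrative sketch of the document-type-aware two-stage flow.
# All model method names and the element dict schema are assumptions,
# not the real Dolphin-v2 interface.

def parse_page(page_image, model):
    # Stage 1: classify the document type and predict layout elements
    # in reading order, e.g. [{"type": "text", "box": (x1, y1, x2, y2)}, ...]
    doc_type, elements = model.classify_and_layout(page_image)

    if doc_type == "photographed":
        # Photographed pages are parsed holistically in a single pass
        return model.parse_holistic(page_image)

    # Digital-born pages: crop each element and parse element-wise
    results = []
    for elem in elements:
        crop = page_image.crop(elem["box"])
        results.append(model.parse_element(crop, elem["type"]))
    return results
```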
- 🔥 2025.12.12 Released the Dolphin-v2 model. Upgraded to 3B parameters with 21-element detection, attribute field extraction, dedicated formula/code parsing, and robust photographed-document parsing. (Dolphin-1.5 moved to the v1.5 branch)
- 🔥 2025.10.16 Released the Dolphin-1.5 model. While keeping the lightweight 0.3B architecture, this version achieves significant parsing improvements. (Dolphin-1.0 moved to the v1.0 branch)
- 🔥 2025.07.10 Released the Fox-Page Benchmark, a manually refined subset of the original Fox dataset. Download via: Baidu Yun | Google Drive.
- 🔥 2025.06.30 Added TensorRT-LLM support for accelerated inference!
- 🔥 2025.06.27 Added vLLM support for accelerated inference!
- 🔥 2025.06.13 Added multi-page PDF document parsing capability.
- 🔥 2025.05.21 Our demo is released at link. Check it out!
- 🔥 2025.05.20 The pretrained model and inference code of Dolphin are released.
- 🔥 2025.05.16 Our paper has been accepted by ACL 2025. Paper link: arXiv.
| Model | Size | Overall↑ | Text Edit↓ | Formula CDM↑ | Table TEDS↑ | Table TEDS-S↑ | Reading Order Edit↓ |
|---|---|---|---|---|---|---|---|
| Dolphin | 0.3B | 74.67 | 0.125 | 67.85 | 68.70 | 77.77 | 0.124 |
| Dolphin-1.5 | 0.3B | 85.06 | 0.085 | 79.44 | 84.25 | 88.06 | 0.071 |
| Dolphin-v2 | 3B | 89.78 | 0.054 | 87.63 | 87.02 | 90.48 | 0.054 |
- Clone the repository:

  ```bash
  git clone https://github.com/ByteDance/Dolphin.git
  cd Dolphin
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the pre-trained model of Dolphin-v2:

  Visit our Huggingface model card, or download the model by:

  ```bash
  # Download the model from Hugging Face Hub
  git lfs install
  git clone https://huggingface.co/ByteDance/Dolphin-v2 ./hf_model

  # Or use the Hugging Face CLI
  pip install huggingface_hub
  huggingface-cli download ByteDance/Dolphin-v2 --local-dir ./hf_model
  ```
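The download can also be scripted with the `huggingface_hub` Python API. The sketch below uses the real `snapshot_download` function; the `download_dolphin` wrapper name is our own, and the package must be installed first (`pip install huggingface_hub`).

```python
# Programmatic alternative to the CLI download, using huggingface_hub.
# `download_dolphin` is a wrapper of our own, not part of the Dolphin repo.

def download_dolphin(local_dir: str = "./hf_model") -> str:
    # Import inside the function so the sketch can be read/loaded
    # even when huggingface_hub is not installed
    from huggingface_hub import snapshot_download

    # Downloads (or resumes) the full model snapshot and returns its path
    return snapshot_download(repo_id="ByteDance/Dolphin-v2", local_dir=local_dir)

if __name__ == "__main__":
    print(download_dolphin())
```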
Dolphin provides two inference frameworks with support for two parsing granularities:
- Page-level Parsing: Parse the entire document page into a structured JSON and Markdown format
- Element-level Parsing: Parse individual document elements (text, table, formula)
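Since page-level parsing emits elements in reading order, assembling the final Markdown is essentially a concatenation pass. The sketch below illustrates the idea with a hypothetical `{"type", "content"}` element schema; it is not the exact JSON produced by `demo_page.py`.

```python
# Hypothetical: assemble Markdown from a reading-ordered element list.
# The {"type", "content"} schema is an illustrative assumption, not the
# exact output format of demo_page.py.

def elements_to_markdown(elements):
    fence = "`" * 3  # markdown code fence
    parts = []
    for elem in elements:
        if elem["type"] == "formula":
            parts.append(f"$${elem['content']}$$")  # display math
        elif elem["type"] == "code":
            parts.append(f"{fence}\n{elem['content']}\n{fence}")  # fenced code
        else:
            # text/tables assumed to already be Markdown or HTML
            parts.append(elem["content"])
    return "\n\n".join(parts)
```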
# Process a single document image
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_1.png

# Process a single PDF document
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_6.pdf

# Process all documents in a directory
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs

# Process with a custom batch size for parallel element decoding
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs \
    --max_batch_size 8
# Process element images (specify element_type: table, formula, text, or code)
python demo_element.py --model_path ./hf_model --save_dir ./results \
    --input_path \
    --element_type [table|formula|text|code]
# Process a single document image
python demo_layout.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_1.png

# Process a single PDF document
python demo_layout.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_6.pdf

# Process all documents in a directory
python demo_layout.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs
- 🔄 Two-stage analyze-then-parse approach based on a single VLM
- 📊 Promising performance on document parsing tasks
- 🔍 Natural reading order element sequence generation
- 🧩 Heterogeneous anchor prompting for different document elements
- ⏱️ Efficient parallel parsing mechanism
- 🤗 Support for Hugging Face Transformers for easier integration
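The parallel parsing mechanism listed above amounts to grouping cropped elements into batches and decoding each batch in one forward pass, which is what the `--max_batch_size` option controls. A minimal sketch of that batching logic, where `decode_batch` stands in for the actual model call:

```python
# Sketch of batched parallel element decoding. `decode_batch` is a
# placeholder for the model's batched forward pass, not a real API.

def parallel_parse(crops, decode_batch, max_batch_size=8):
    results = []
    # Group element crops into chunks of at most max_batch_size
    for i in range(0, len(crops), max_batch_size):
        batch = crops[i:i + max_batch_size]
        results.extend(decode_batch(batch))  # one forward pass per batch
    return results
```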
Call for Bad Cases: If you encounter any cases where the model performs poorly, we would greatly appreciate it if you could share them in an issue. We are continuously working to optimize and improve the model.
We would like to acknowledge the following open-source projects that provided inspiration and reference for this work:
If you find this code useful for your research, please use the following BibTeX entry.
@article{feng2025dolphin,
  title={Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting},
  author={Feng, Hao and Wei, Shu and Fei, Xiang and Shi, Wei and Han, Yingdong and Liao, Lei and Lu, Jinghui and Wu, Binghong and Liu, Qi and Lin, Chunhui and others},
  journal={arXiv preprint arXiv:2505.14059},
  year={2025}
}