
[EMNLP 2025 main] C3 Benchmark: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations


step-out/C3


πŸ‘‰πŸ» C3 Benchmark πŸ‘ˆπŸ»

🌐 Website β€’ πŸ€— Hugging Face β€’ πŸ“ƒ arXiv Paper

πŸ“° News

  • August 21, 2025: C3 Benchmark has been accepted to EMNLP 2025 main conference.

✨ Key Features

  • 🌍 Bilingual Coverage: Comprehensive evaluation in both English and Chinese.

  • 🎯 Real-world Complexity: Based on empirical analysis of actual spoken dialogues, covering 1,079 instances with 1,586 audio-text paired samples.

  • 💪 LLM-based Automatic Evaluation: Reliable evaluation with >0.87 correlation to human judgments using GPT-4o and DeepSeek-R1. See prompts.md for evaluation prompts.

  • 🎡 End-to-End Focus: Specifically designed for end-to-end spoken dialogue models, considering crucial phonological features.

  • 📊 Challenging Benchmark (as of 29 July 2025): Comprehensive evaluation of 10 leading SDMs reveals the benchmark's difficulty. Top scores reach only 40.08% (Chinese) and 55.68% (English).
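
To make the ">0.87 correlation to human judgments" claim concrete, here is a minimal sketch of how agreement between an LLM judge and human annotators could be measured. It assumes Pearson correlation over binary correct/incorrect labels; the README does not say which coefficient was used, and the function name `pearson` and the label lists are hypothetical, not taken from the C3 repository.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up labels for illustration only (1 = judged correct, 0 = judged incorrect).
llm_judgments = [1, 0, 1, 1, 0, 1]
human_judgments = [1, 0, 1, 0, 0, 1]
print(f"judge/human correlation: {pearson(llm_judgments, human_judgments):.2f}")
```

A rank-based coefficient such as Spearman's would be an equally plausible choice; the sketch only illustrates the general idea of comparing automatic and human judgments.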

🚀 Usage

  1. Prepare Data:

  2. Run Evaluation:

  3. Calculate Accuracy:

    • Calculate: Use process_results.py to generate the final accuracy metrics automatically; see CalculationUsage.md for the detailed workflow.
  4. Submit Results:
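
As a rough illustration of step 3 (Calculate Accuracy), the sketch below shows one way per-language accuracy could be derived from judged outputs. It is not the repository's process_results.py: the record layout (`language` and `judgment` keys) and the function name `accuracy_by_language` are assumptions made for this example. Refer to CalculationUsage.md for the actual workflow.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Return {language: percent of records judged correct}.

    Each record is assumed (hypothetically) to be a dict with a
    'language' key and a boolean 'judgment' key.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        lang = rec["language"]
        totals[lang] += 1
        if rec["judgment"]:
            correct[lang] += 1
    return {lang: 100.0 * correct[lang] / totals[lang] for lang in totals}


# Made-up judgments for illustration only; these are not C3 results.
sample = [
    {"language": "en", "judgment": True},
    {"language": "en", "judgment": False},
    {"language": "zh", "judgment": True},
    {"language": "zh", "judgment": True},
]
print(accuracy_by_language(sample))
```

Reporting English and Chinese separately mirrors how the README quotes its top scores, one figure per language.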

📖 Citation

If you find C3Bench useful for your research and applications, feel free to give us a star ⭐ or cite us using:

@inproceedings{ma2025c3,
 title={C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations},
 author={Ma, Chengqian and Tao, Wei and Guo, Yiwen},
 booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
 year={2025},
 publisher={Association for Computational Linguistics},
 doi={10.18653/v1/2025.emnlp-main.1160},
 pages={22789--22807},
 ISBN={979-8-89176-332-6},
}
