Pull requests: EleutherAI/lm-evaluation-harness

161 Open 1,654 Closed

Pull requests list

Add support for configurable chrF metric parameters in task YAML, fix...

#3363 opened Oct 23, 2025 by augustlakia

fix trust_remote_code=True for longbench

#3361 opened Oct 22, 2025 by jannalulu

Longbench group fix

#3359 opened Oct 22, 2025 by jannalulu

Fix issue 3355 assertion error

#3356 opened Oct 20, 2025 by marksverdhei

Add gsm_symbolic and gsm_symbolic_cot tasks

#3354 opened Oct 19, 2025 by MengAiDev

[AIME24 | AIME25] Enable Multiple Generation Repeats with Pass@k and Majority@k Metrics

#3351 opened Oct 17, 2025 by ihebchaa

fix(tasks):pin correct MMLUSR version

#3350 opened Oct 16, 2025 by christinaexyou

added azure openai support

#3349 opened Oct 16, 2025 by zinccat

Delegate BOS to the tokenizer; add_bos_token defaults to None

#3347 opened Oct 15, 2025 by baberabb

Added ULQA benchmark

#3340 opened Oct 13, 2025 by keramjan

Add support for LLMSQL

#3334 opened Oct 9, 2025 by DzmitryPihulski

Fix PIL image hashing to use actual bytes instead of object repr

#3331 opened Oct 7, 2025 by tboerstad

feat: Add support for accelerate-wrapped models in simple_evaluate()

#3313 opened Sep 26, 2025 by DhruvaKashyap

Add MATH500

#3311 opened Sep 26, 2025 by jannalulu

Support empty response for Completions and ChatCompletions API

#3309 opened Sep 22, 2025 by tboerstad

Adding New Task SLR-Bench : Scalable Logical Reasoning Benchmark

#3305 opened Sep 20, 2025 by Ahmad21Omar

Support torchrun vllm DP

#3304 opened Sep 19, 2025 by luccafong

Gemini evaluation support

#3300 opened Sep 15, 2025 by IsraelAbebe

Fix lambada_multilingual_stablelm

#3294 opened Sep 11, 2025 by jmichaelov

Adding SPaRC to lm eval harness

#3262 opened Aug 25, 2025 by lkaesberg

Add long-context evaluation benchmarks (LongBench v2, Babilong, InfiniteBench, Phonebook)

#3256 opened Aug 21, 2025 by Mariani-code

fix gsm8k normalization

#3254 opened Aug 20, 2025 by huaanrui

Main

#3250 opened Aug 20, 2025 by seongtaehong

Adding 3LM to lm eval harness

#3241 opened Aug 14, 2025 by GeorgeSherif

Trim thinking content from model output in IFEval

#3240 opened Aug 14, 2025 by davideguidobene