Releases: facebookresearch/ProgramBench
v1.0.2
@klieret
This minor release ignores ~30 tests that caused hangs when evaluating incorrect solutions.
Full Changelog: v1.0.1...v1.0.2
v1.0.1
@klieret
What's Changed
- Fix: stderr messages could corrupt the XML coverage report (#5). Thanks to @darshanmakwana412 for the report.
New Contributors
- @eltociear made their first contribution in #8
Full Changelog: v1.0.0...v1.0.1
ProgramBench 🦊
@klieret
How much of SQLite, FFmpeg, or the PHP compiler can Opus 4.7 rebuild from scratch, given just an executable and no starter code or internet access?
Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end.
Read more: https://programbench.com/