Releases: facebookresearch/ProgramBench

v1.0.2

11 May 16:58 · @klieret (Kilian Lieret) · commit b33e660, signed with a verified GPG signature (key ID C0C28B29F138F6BA)

This minor release ignores ~30 tests that caused hangs when evaluating incorrect solutions.

Full Changelog: v1.0.1...v1.0.2

v1.0.1

07 May 12:45 · @klieret (Kilian Lieret) · commit 1fe64c8, signed with a verified GPG signature (key ID C0C28B29F138F6BA)

What's Changed

  • Fix: stderr messages could corrupt the XML coverage report (#5). Thanks to @darshanmakwana412 for the report.

New Contributors

Full Changelog: v1.0.0...v1.0.1

Contributors

eltociear and darshanmakwana412

ProgramBench 🦊

05 May 14:31 · @klieret (Kilian Lieret) · commit 2803dcc, signed with a verified GPG signature (key ID C0C28B29F138F6BA)

How much of SQLite, FFmpeg, or the PHP compiler can Opus 4.7 rebuild from scratch, given just an executable and no starter code or internet access?

Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end.

Read more: https://programbench.com/
