Can the agent distinguish stdout and stderr? #37

Open
Labels: bug (Something isn't working)

Description

Hi!

I found that some test functions fail against a generated executable because they expect a certain string on stderr, while the executable writes that string to stdout.

I also noticed that ProgramBench’s container execution helper appears to merge stdout/stderr streams here.

Is stdout/stderr separation intentionally unavailable to agents during exploration? (I think it is still possible for an agent to manually test stream placement with shell redirection, e.g. cmd >/tmp/out 2>/tmp/err, but the default observation format seems to make the distinction easy to miss.)
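
To illustrate the concern, here is a minimal Python sketch of how a merged observation hides stream placement. It is not ProgramBench's actual helper: the child command and the names `run_merged` and `run_separate` are hypothetical, chosen only to contrast the two capture modes of `subprocess.run`.

```python
import subprocess
import sys

# Hypothetical child program: prints one line to stdout and one to stderr.
CHILD = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]


def run_merged(cmd):
    """Capture both streams through one pipe, as a merging helper might."""
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)


def run_separate(cmd):
    """Capture stdout and stderr through separate pipes."""
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)


merged = run_merged(CHILD)
separate = run_separate(CHILD)

# Merged: both lines land in .stdout and .stderr is None, so the origin is lost.
print(merged.stdout, merged.stderr)
# Separate: each line stays attributable to its own stream.
print(separate.stdout, separate.stderr)
```

In the merged case an agent would see both messages in a single observation and could not tell which stream produced which, unless it redirected the streams itself as in the shell example above.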

Thank you!

The commands I ran

uv run --with mini-swe-agent mini-extra programbench \
 --filter "abishekvashok__cmatrix.5c082c6" \
 --output output \
 --model openai/gpt-5.4
uv run programbench eval output

Score log

Evaluation Summary
Instance                          Score  Comment
abishekvashok__cmatrix.5c082c6    94     507 tests
Average                           94     1 instances

A failed test in eval.json

  • task: abishekvashok/cmatrix
  • test function name: eval.tests.test_cmatrix.TestColorOptions.test_color_invalid_shows_error_message
  • branch: 1b991a57d4e9
self = <test_cmatrix.TestColorOptions object at 0x7732071f2bf0>

    def test_color_invalid_shows_error_message(self):
        """Test invalid color name produces error message."""
        result = run("-C", "purple")
        assert result.returncode == 0  # Exits with 0 but shows error
>       assert b"Invalid color" in result.stderr
E       AssertionError: assert b'Invalid color' in b''
E        +  where b'' = CompletedProcess(args=['./executable', '-C', 'purple'], returncode=0, stdout=b' Invalid color selection\n Valid colors are green, red, blue, white, yellow, cyan, magenta and black.\n', stderr=b'').stderr

eval/tests/test_cmatrix.py:120: AssertionError
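
The failure pattern above can be diagnosed with a small check on the captured result. The following is a sketch only: `stream_of` is a hypothetical helper, not part of ProgramBench or the test suite, and the `CompletedProcess` is built by hand with a shortened stdout to mirror the one in the log.

```python
import subprocess


def stream_of(result, needle):
    """Report which captured stream(s) contain `needle` (bytes)."""
    in_out = needle in (result.stdout or b"")
    in_err = needle in (result.stderr or b"")
    if in_out and in_err:
        return "both"
    if in_out:
        return "stdout"
    if in_err:
        return "stderr"
    return "neither"


# Hand-built result mirroring the CompletedProcess in the failure log.
observed = subprocess.CompletedProcess(
    args=["./executable", "-C", "purple"],
    returncode=0,
    stdout=b" Invalid color selection\n",
    stderr=b"",
)
print(stream_of(observed, b"Invalid color"))  # prints "stdout"
```

Here the message is present but on the wrong stream, which is exactly the case the test's `result.stderr` assertion rejects.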
