-
Notifications
You must be signed in to change notification settings - Fork 2
Comments
feat(cli): improve logging system with trace file support and fixes#84
feat(cli): improve logging system with trace file support and fixes #84
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8c00cb07fd
i️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1 Badge Trace flag no-ops on markform fill
The new global --trace/MARKFORM_TRACE option is parsed into CommandContext.traceFile here, but the primary fill command never feeds that value into any logging callback—it still uses createCliToolCallbacks, which only write to the console. That means the example command in the commit message (markform fill ... --trace /tmp/fill-trace.log) never produces a trace file; only run/research benefit from tracing while fill silently ignores the flag.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this still --wire-log ? We said it would be --trace everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly we should name the variables consistently so this would be tracePathOption
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to do this at all? If we're writing to a trace, we should be writing clean data to the trace not colorized or ANSI codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More names calling it WireLog. Let's call it Trace Everywhere. Search the codebase for WireLog and replace it with Trace. Do a case-insensitive search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again wrong name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already implemented this elsewhere. If you do need these pull them out into a common shared library and only implement them once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. This is a common shared library. You should only ever implement it once and it should be in a general common utilities location so you never implement it again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this is a common utility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again same here
package.json
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why did we need this? It's fine if we do but I didn't see where it was necessary
Coverage Report for packages/markform
File Coverage
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Address PR review comments: - Fix fill command --trace flag which was silently ignored (2660027464) - Add trace file output to createCliToolCallbacks for tool/LLM/reasoning logs - Move safeStringify() to shared formatUtils.ts library (2660068669) - Re-export safeStringify from traceUtils.ts for convenience All 11 PR #84 review comments are now fully addressed.
Comprehensive feature plan covering: - Three logging levels (default, verbose, debug) - Wire format capture via --wire-log flag - Unified callback system across fill/research/run commands
...ements - Add library-first callback design with structured tool info - Define enhanced console progress display with search queries/results - Add emoji indicators and timing information - Update implementation phases and open questions
...e summaries - Default mode shows tool calls with queries, timing, result counts, and sources - Verbose adds top result titles, token counts, tool summary - Updated FillCallbacks with sources and topResults fields - Updated examples to reflect new logging levels
Design decisions finalized: - Wire log format: unify with golden test transcript format - Debug truncation: 500 chars, configurable in settings.ts - LOG_LEVEL env var: equivalent to --debug flag - Web search parsing: show first 5-8 result titles/domains - Emoji usage: limited set per CLI best practices (✓ ❌ → [tool]) - Callback versioning: clean break, no backward compat - Non-TTY progress: log lines instead of spinner
...onal details Default mode now includes: - Model and provider info at start - Token counts per turn - First 5-8 result titles from web search - Tool summary at end of turn - Patch validation warnings/errors Verbose mode now focuses on operational details: - Harness configuration - Full web search result listings - Patch accept/reject with reasons - Validator execution details - Form progress stats
Extend the plan spec with support for capturing AI SDK reasoning fields: - Add WireResponseStep.reasoning field for step-level reasoning - Add reasoningTokens to usage tracking - Add onReasoningGenerated callback for library users - Display reasoning in verbose/debug modes - New Phase 5 for reasoning capture implementation Also adds tsx as root dev dependency for running TypeScript scripts. Based on AI SDK documentation research: - reasoningText: final step reasoning - steps[].reasoning: per-step reasoning array - usage.reasoningTokens: token count for reasoning - providerMetadata: provider-specific data
Updates to the CLI logging plan: 1. Clarify TTY detection uses existing infrastructure: - createSpinnerIfTty() already handles non-TTY - shouldUseColors() respects NO_COLOR and TTY - picocolors auto-detects - no new implementation needed 2. Add wire format content selection decision (#9): - Capture: toolCalls, toolResults, text, reasoning, usage, response.id/modelId - Skip: providerMetadata, response.messages (redundant), finishReason per step - Keeps wire logs focused and diffable Based on AI SDK documentation research. Live API testing was blocked by quota limits on both OpenAI and Anthropic accounts.
Complete Stage 5 validation section with: Automated Test Coverage: - Unit tests for logging utils, web search parsing, fill logging - Callback interface tests for structured tool info - Wire format tests for AI SDK response capture - Integration tests for CLI output at each log level - Cross-command consistency tests (fill, research, run) - Golden tests for wire format and log level outputs Manual Validation Checklist: - Visual console output review at each log level - TTY vs non-TTY behavior (colors, spinner, NO_COLOR) - Wire log YAML review for completeness and diffability - Environment variable behavior (LOG_LEVEL, MARKFORM_WIRE_LOG) - Error handling scenarios - Library API validation with TypeScript callbacks - Cross-command visual comparison - Documentation accuracy verification Acceptance and Regression Checks: - All 7 acceptance criteria from Stage 1 - 5 regression checks for existing behavior
Expands validation plan Stage 5 with senior-engineer-level test coverage: - Edge cases: empty forms, large forms, Unicode, truncation boundaries - Error paths: network failures, auth errors, file system errors, interrupts - Security: API key redaction, PII handling, file permissions - Performance: memory usage, I/O benchmarks, scalability targets - Compatibility matrix: Node versions, OS, terminals, CI environments - Graceful degradation: partial failures, missing features, backward compat
This PR implements the CLI logging improvements outlined in the plan spec: - Added `LogLevel` type: quiet, default, verbose, debug - Added `--debug` CLI flag for full diagnostic output - Added `logDebug()` function for debug-level messages - Added `MARKFORM_LOG_LEVEL` environment variable support - Added `--wire-log <path>` flag to fill and research commands - Added `MARKFORM_WIRE_LOG` environment variable support - Captures full LLM request/response in YAML format - Extended `FillCallbacks` with `toolType`, `query`, `resultCount`, `sources`, `topResults` - Added `toolParsing.ts` with web search result extraction - Shows search queries in yellow, results summary in default output - Full result listings available in verbose mode - Updated `fillLogging.ts` to respect all log levels - Updated research command to use `createFillLoggingCallbacks` - Consistent output format across fill and research commands - New: `src/harness/toolParsing.ts` - New: `docs/project/specs/active/valid-2026年01月04日-agent-cli-logging-improvements.md` - Modified: CLI, harness, and test files - All 1432 unit tests pass - TypeScript strict mode passes - ESLint with --max-warnings 0 passes - Build succeeds
- Add WireReasoningContent type and reasoning field to WireResponseStep - Add reasoningTokens to WireResponseFormat usage - Extract reasoning from AI SDK responses in liveAgent - Call onReasoningGenerated callback when reasoning present - Include reasoning in wire format YAML output - Trim plan spec validation section from 1158 to 764 lines - Move detailed test checklists to validation spec reference - Close markform-545 and markform-546 beads
- Add --wire-log flag to run command for consistency with fill/research - Add transcript field to FillResult type for wire format capture - Update programmaticFill to build transcript when captureWireFormat is enabled - Add Log Levels and Wire Format Capture sections to development.md - Update validation spec with run command testing and complete file list
- Add --trace flag for incremental file logging during execution - Add MARKFORM_TRACE env var support for trace file - Increase DEBUG_OUTPUT_TRUNCATION_LIMIT from 500 to 2000 chars - Make truncation limit configurable via MARKFORM_DEBUG_TRUNCATION_LIMIT - Fix run.ts to pass model info to createFillLoggingCallbacks - Add traceFile support to research.ts callbacks - Document logging system review findings Trace file support allows monitoring long-running fills by writing log output incrementally to a file (without ANSI colors), useful for debugging and post-hoc analysis.
- Add trace file support to fill.ts command (addresses Codex review) - Update tryscript tests to include new --debug and --trace options - Update README test to use more flexible badge matching The --trace flag was previously only working in run/research commands. This fix adds the same incremental file logging support to the fill command.
- Add comprehensive manual validation steps for trace file feature - Update test counts (1455 unit tests, 18 tryscript tests) - Add edge cases and error handling verification steps - Document combined flags testing scenarios - Add potential issues to watch for section
...traceUtils Per PR review feedback: - Remove --wire-log flag from fill, research, and run commands - Use only global --trace flag for all trace file output - Create shared traceUtils.ts library with common utilities: - stripAnsi: Remove ANSI codes from strings for file output - createTracer: Create trace function for file logging - truncate: Truncate strings with ellipsis for debug output - formatDuration: Format milliseconds as human-readable - formatBytes: Format file sizes as human-readable - Update fillLogging.ts to use shared utilities - Remove duplicate code and unused imports This reduces code duplication and provides a cleaner API where trace output is controlled via the global --trace flag.
Add trace file output to the run command for consistency with fill and research commands. Now all form-filling commands log workflow configuration, completion status, and timing to the trace file when --trace is specified.
Create reusable formatUtils.ts with general-purpose formatting utilities: - stripAnsi: Remove ANSI escape codes from strings - safeTruncate: Truncate strings with ellipsis (renamed from truncate) - formatDuration: Format milliseconds as human-readable - humanReadableSize: Format bytes as human-readable (renamed from formatBytes) traceUtils.ts now imports from formatUtils.ts and re-exports for backward compatibility. This allows these utilities to be reused across the codebase, not just in CLI trace code.
Scripts use 'npx tsx' which works without a local dependency. This was added unnecessarily in a previous commit.
- Support both 'text' and 'content' property names for reasoning items since different AI providers may use different property names - Add fallback message when reasoning content is not available - Add tests for reasoning callback in fillLogging
Add end-to-end tests for CLI logging at different verbosity levels: - Default mode: shows turn and patch info - Verbose mode: shows config details (max turns, patches, roles) - Quiet mode: suppresses turn output - Trace file: verifies file creation, header format, content - Output verification: file creation and content checks - User role fill with --roles flag These tests run actual fill commands with mock agents and verify the logging output matches expected patterns.
- Add test results table showing pass/fail status for all logging features - Mark --quiet flag bug (markform-8): session transcript still printed - Document completed manual testing for default, verbose, debug, trace modes - Note live agent testing blocked by network issues - Add bead markform-8 for quiet mode bug tracking
Address PR review comments: - Fix fill command --trace flag which was silently ignored (2660027464) - Add trace file output to createCliToolCallbacks for tool/LLM/reasoning logs - Move safeStringify() to shared formatUtils.ts library (2660068669) - Re-export safeStringify from traceUtils.ts for convenience All 11 PR #84 review comments are now fully addressed.
Update validation plan with detailed test results from systematic testing: - Mock mode at all log levels (default, quiet, verbose, debug) - Trace file output with ANSI stripping verification - Session recording (--record flag) - Document known bug markform-8 (quiet mode doesn't suppress transcript) - Add reviewer testing checklist for live agent tests - Remove outdated --wire-log references (flag removed) - Document all 11 PR review comments addressed
- Live agent testing now passing with GPT-4.1-mini - Token counts, LLM call logging, tool tracking all verified - Trace file captures all LLM/tool activity - Required undici ProxyAgent for containerized environment - All core logging features confirmed working
6af6350 to
569bf03
Compare
OpenAI's web_search tool executes server-side (no local execute function), so the callback wrapping was skipping it. This fix: 1. Modified wrapToolsWithCallbacks to return both wrapped tools and set of wrapped tool names for tracking 2. Added code in step processing loop to fire onToolStart/onToolEnd callbacks for non-wrapped tools using step results 3. Fixed AI SDK type property names: input not args, output not result This ensures [web_search] "query" and tool results are logged to console during live agent execution.
Uh oh!
There was an error while loading. Please reload this page.
Summary
This PR builds on the logging improvements from PR #73, adding additional enhancements and fixes based on a senior engineering review:
--trace <file>flag for incremental file logging during execution (all fill-related commands)MARKFORM_TRACEenv var support for trace fileDEBUG_OUTPUT_TRUNCATION_LIMITfrom 500 to 2000 chars (configurable viaMARKFORM_DEBUG_TRUNCATION_LIMIT)createFillLoggingCallbacksfor consistent outputdocs/project/specs/active/review-2026年01月04日-cli-logging-system.mdTrace File Feature
The trace file feature allows monitoring long-running fills by writing log output incrementally to a file (without ANSI colors). This is useful for:
Usage:
Or via environment variable:
Files Changed
src/cli/cli.ts- Added --debug and --trace global flagssrc/cli/lib/cliTypes.ts- Added LogLevel type, debug property, traceFile to CommandContextsrc/cli/lib/shared.ts- Added logDebug function, computeLogLevel helper, traceFile extractionsrc/cli/lib/fillLogging.ts- Enhanced with LogLevel support, structured tool info, trace file supportsrc/cli/commands/fill.ts- Added trace file support with createTracer helpersrc/cli/commands/research.ts- Added traceFile support to callbackssrc/cli/commands/run.ts- Added traceFile support to callbackssrc/settings.ts- Increased DEBUG_OUTPUT_TRUNCATION_LIMIT to 2000tests/cli/commands.tryscript.md- Updated to include --debug and --trace in help outputRelated Documents
Manual Validation
1. Verify --trace Flag for Fill Command
Run with
--traceflag to capture incremental output to file:Verify:
/tmp/fill-trace.logis created# Markform Fill Trace LogTurn 1: ...Form completed in N turn(s)2. Verify --trace Flag for Run Command
Verify:
3. Verify --trace Flag for Research Command
Verify:
4. Verify MARKFORM_TRACE Environment Variable
Verify:
--traceflag takes precedence over env var5. Verify --debug Flag
Run with
--debugflag to see enhanced output:Verify:
[tool_name]line6. Verify --wire-log Flag
Run with
--wire-logto capture wire format:Verify:
/tmp/wire.yamlis createdsessionVersion,mode,modelId,formPathturnsarray withturnnumber andwiredatarequestwith system/prompt andresponsewith steps7. Verify Combined Flags
Test multiple flags together:
Verify:
Edge Cases and Error Handling
/nonexistent/dir/trace.log) shows warning but doesn't crashAutomated Testing