Pipeline Plan 206
Plan written to .claude/pipeline-artifacts/plan.md.
Summary of the approach:
- Enhance sw-otel.sh with a new cmd_trace_export() function that builds proper OTLP JSON with parent/child span hierarchy, nanosecond timestamps, and typed attributes (string/int/double)
- Add a pipeline export subcommand in sw-pipeline.sh that delegates to sw-otel.sh trace-export
- Auto-export hook in the pipeline completion path, fired when OTEL_EXPORTER_OTLP_ENDPOINT is set
- 5 files touched: sw-otel.sh (core logic), sw-pipeline.sh (CLI + auto-export), event-schema.json (new event type), sw-otel-test.sh (10 test cases), docs/observability.md (Jaeger guide)
Key design decisions:
- Reuse sw-otel.sh rather than creating a new script; it already has trace-building infrastructure
- Run-id matching supports both job_id and issue number for flexibility
- Deterministic span IDs from sha256 of run-id + stage name for reproducibility
- Non-blocking auto-export: failures are logged but never fail the pipeline
- Pure bash + jq, no new dependencies
- Auto-export on pipeline completion if OTEL_EXPORTER_ENDPOINT is set
- Integration example with Jaeger in docs/observability.md
Approach A: Enhance existing sw-otel.sh cmd_trace()
- Modify the existing cmd_trace() function to accept run-id, build proper span hierarchy with attributes
- Add pipeline export subcommand in sw-pipeline.sh that delegates to sw-otel.sh
- Pros: Reuses existing infrastructure, minimal new files, consistent with codebase patterns
- Cons: Makes sw-otel.sh larger
Approach B: New dedicated sw-pipeline-export.sh script
- Create a new standalone script for OTLP export
- Pros: Clean separation
- Cons: Duplicates event-reading logic, more files to maintain, inconsistent with the existing sw-otel.sh, which already has trace building
Chosen: Approach A — enhancing sw-otel.sh minimizes blast radius and builds on existing code. The pipeline export subcommand in sw-pipeline.sh will delegate to sw-otel.sh trace-export.
| Risk | Likelihood | Mitigation |
|---|---|---|
| OTLP JSON format doesn't match spec | Medium | Use exact field names from OTLP protobuf spec; validate against Jaeger |
| Large events.jsonl causes slow export | Low | Filter by run-id early with grep before jq parsing |
| Auto-export fails silently on completion | Medium | Log export errors but don't fail the pipeline; emit event |
| Bash 3.2 compatibility issues | Low | Avoid associative arrays; use indexed arrays and jq |
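The grep-before-jq mitigation for large events.jsonl files could look roughly like the sketch below. The helper name prefilter_events is hypothetical, and the sketch assumes the run-id appears verbatim on each matching line. A fixed-string match is only a coarse filter (run-4 also matches lines for run-42), so an exact comparison on job_id or issue in jq would still need to follow.

```shell
# Hypothetical helper (not part of sw-otel.sh): cheap fixed-string prefilter.
# $1 = run-id, $2 = path to events.jsonl
prefilter_events() {
  # -F treats the run-id as a literal string rather than a regex.
  # grep exits non-zero when nothing matches; "|| true" keeps callers
  # running under "set -e" from aborting on an empty result.
  grep -F -- "$1" "$2" || true
}
```

The output would then be piped into jq, so that jq only parses lines that could belong to the run.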
Depends on:
- scripts/lib/helpers.sh: emit_event(), output helpers, now_iso()
- scripts/sw-otel.sh: existing trace/metrics infrastructure
- scripts/sw-pipeline.sh: pipeline completion hook, subcommand dispatch
- config/event-schema.json: event type definitions
- ~/.shipwright/events.jsonl: event data source
Depended on by: Nothing (new feature, additive only)
Pipeline executions generate rich event data in events.jsonl but there's no way to visualize execution flow in standard observability tools. The existing cmd_trace() in sw-otel.sh builds a rudimentary OTLP structure but lacks run-id filtering, proper span IDs, attributes, and parent/child relationships.
Enhance sw-otel.sh with a new trace-export subcommand that builds proper OTLP JSON from events. Add pipeline export as a thin delegation. Hook auto-export into the pipeline completion path.
- Users can export any pipeline run as OTLP traces viewable in Jaeger/Honeycomb
- Auto-export on completion enables continuous observability without manual steps
- No new dependencies; pure bash + jq implementation
┌─────────────────────┐
│  sw-pipeline.sh     │
│  (export subcommand)│
└────────┬────────────┘
         │ delegates
         ▼
┌─────────────────────┐     ┌──────────────────┐
│  sw-otel.sh         │────▶│  events.jsonl    │
│  (trace-export)     │     │  (event source)  │
└────────┬────────────┘     └──────────────────┘
         │ outputs
         ▼
┌─────────────────────┐     ┌──────────────────┐
│  OTLP JSON          │────▶│  OTLP Collector  │
│  (stdout or file)   │     │  Jaeger/Honeycomb│
└─────────────────────┘     └──────────────────┘
Auto-export path:
┌─────────────────────┐
│  sw-pipeline.sh     │
│  pipeline_cleanup() │──── if OTEL_EXPORTER_ENDPOINT set ──▶ sw-otel.sh trace-export --send
└─────────────────────┘
// OTLP JSON output structure (OTLP/HTTP JSON encoding)
interface OTLPTraceExport {
  resourceSpans: [{
    resource: {
      attributes: Array<{key: string, value: {stringValue: string}}>
    },
    scopeSpans: [{
      scope: { name: string, version: string },
      spans: Span[]
    }]
  }]
}

interface Span {
  traceId: string            // 32 hex chars
  spanId: string             // 16 hex chars
  parentSpanId: string       // 16 hex chars (empty for root)
  name: string               // "pipeline" or stage name
  kind: number               // 1 = SPAN_KIND_INTERNAL
  startTimeUnixNano: string  // nanosecond epoch as string
  endTimeUnixNano: string    // nanosecond epoch as string
  status: {
    code: number             // 0=UNSET, 1=OK, 2=ERROR
    message?: string
  }
  attributes: Array<{
    key: string,
    value: {stringValue?: string, intValue?: string, doubleValue?: number}
  }>
}

// CLI interface
// shipwright pipeline export --format otel <run-id>
// shipwright otel trace-export <run-id> [--output <file>]
1. User runs: shipwright pipeline export --format otel <run-id>
2. sw-pipeline.sh delegates to: sw-otel.sh trace-export <run-id>
3. sw-otel.sh reads ~/.shipwright/events.jsonl
4. Filters events by job_id or issue number matching <run-id>
5. Builds root span from pipeline.started → pipeline.completed events
6. Builds child spans from stage.started → stage.completed/failed events
7. Attaches attributes (cost, template, outcome, etc.) from event fields
8. Outputs OTLP JSON to stdout (or file with --output)
9. If --send flag or auto-export: POST to OTEL_EXPORTER_ENDPOINT/v1/traces
- Event parsing errors: Skip malformed lines, continue processing (warn to stderr)
- Missing run-id: Error with "No events found for run-id: X", exit 1
- OTLP export failure: Log error, emit otel.export_failed event, exit 1 (non-fatal in auto-export path)
- Missing jq: Error with install instructions, exit 1
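Two of the error paths listed above (missing jq and no matching events) could be guarded along these lines. The helper names require_cmd and require_events are hypothetical, and the message wording for the missing-command case is an assumption; only the "No events found for run-id" text comes from the plan.

```shell
# Hypothetical helpers; names are illustrative only.

# Fail with an install hint when a required command is missing.
# $1 = command name, $2 = install hint
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "Error: $1 is required. $2" >&2
    return 1
  fi
}

# Fail when filtering produced no events for the run.
# $1 = run-id, $2 = filtered events (may be empty)
require_events() {
  if [ -z "$2" ]; then
    echo "No events found for run-id: $1" >&2
    return 1
  fi
}
```

A caller would typically write `require_cmd jq "Install it with your package manager." || exit 1` before touching events.jsonl.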
| File | Action | Purpose |
|---|---|---|
| scripts/sw-otel.sh | Modify | Add trace-export subcommand with proper OTLP span building |
| scripts/sw-pipeline.sh | Modify | Add export subcommand, add auto-export hook at completion |
| config/event-schema.json | Modify | Add otel.trace_exported event type |
| scripts/sw-otel-test.sh | Modify | Add tests for trace-export |
| docs/observability.md | Create | Jaeger integration guide |
Add a new cmd_trace_export() function that:
- Accepts <run-id> argument (matches against job_id or issue fields)
- Reads events.jsonl and filters events for the given run
- Generates deterministic trace ID from run-id (md5/sha256 truncated to 32 hex)
- Generates deterministic span IDs from stage names (sha256 truncated to 16 hex)
- Builds root span from pipeline.started / pipeline.completed events
- Builds child spans from stage.started / stage.completed / stage.failed / stage.skipped events
- Converts ISO timestamps to nanosecond epoch (OTLP requirement)
- Attaches proper OTLP-format attributes using {key, value: {stringValue/intValue}} encoding
- Outputs valid OTLP JSON
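The deterministic IDs and the timestamp conversion from the list above can be sketched as below. All helper names are hypothetical. The sketch assumes GNU coreutils (sha256sum and date -d); on macOS, shasum -a 256 and a different date invocation would be needed, or the conversion could be done with jq's fromdateiso8601 instead. The ":" separator between run-id and stage name is an arbitrary choice.

```shell
# Hypothetical helpers. Assumes GNU coreutils (sha256sum, date -d).

# Print the first $2 hex characters of the SHA-256 digest of string $1.
hex_digest() {
  printf '%s' "$1" | sha256sum | cut -c1-"$2"
}

# Trace ID: 32 hex chars derived from the run-id.
trace_id_for() { hex_digest "$1" 32; }

# Span ID: 16 hex chars derived from run-id plus stage name.
span_id_for() { hex_digest "$1:$2" 16; }

# Convert an ISO 8601 UTC timestamp to nanoseconds since the epoch, as a string.
# Assumes second precision, so the nanosecond part is always nine zeros.
iso_to_nanos() {
  local secs
  secs=$(date -u -d "$1" +%s) || return 1
  printf '%s000000000\n' "$secs"
}
```

Because the IDs depend only on the run-id and stage name, exporting the same run twice yields the same trace and span IDs.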
Root span attributes:
- pipeline.issue, pipeline.template, pipeline.result, pipeline.total_cost, pipeline.iterations, pipeline.agent_id
Stage span attributes:
- stage.name, stage.outcome, stage.duration_s, stage.error_class
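Typed attribute encoding for a few of the root span attributes might look like the following sketch. The helper name root_span_attributes and its argument order are hypothetical; which attribute gets which type is also an assumption. In the OTLP JSON encoding, intValue is carried as a string while doubleValue is a JSON number.

```shell
# Hypothetical helper: emit a typed OTLP attribute list for a root span.
# $1 = issue number, $2 = template name, $3 = total cost
root_span_attributes() {
  jq -cn --arg issue "$1" --arg template "$2" --arg cost "$3" '[
    {key: "pipeline.issue",      value: {intValue: $issue}},
    {key: "pipeline.template",   value: {stringValue: $template}},
    {key: "pipeline.total_cost", value: {doubleValue: ($cost | tonumber)}}
  ]'
}
```

Passing values through --arg rather than interpolating them into the jq program keeps quoting safe when a template name contains special characters.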
Add --output <file> flag to write to file instead of stdout.
Add --send flag to POST to OTEL_EXPORTER_OTLP_ENDPOINT/v1/traces.
Update the help text and main dispatch.
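Argument handling for the two flags could be sketched as below. The function name parse_trace_export_args and the variable names run_id, output_file, and send are hypothetical; the function deliberately sets them as globals so the caller can read them afterwards.

```shell
# Hypothetical parser for: trace-export <run-id> [--output <file>] [--send]
# Sets the globals run_id, output_file, and send.
parse_trace_export_args() {
  run_id=""
  output_file=""
  send="false"
  while [ $# -gt 0 ]; do
    case "$1" in
      --output)
        [ $# -ge 2 ] || { echo "Error: --output requires a file path" >&2; return 1; }
        output_file="$2"
        shift 2
        ;;
      --send)
        send="true"
        shift
        ;;
      -*)
        echo "Error: unknown flag: $1" >&2
        return 1
        ;;
      *)
        run_id="$1"
        shift
        ;;
    esac
  done
  [ -n "$run_id" ] || { echo "Error: missing <run-id> argument" >&2; return 1; }
}
```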
Add a pipeline_export() function and dispatch case:
- Parse --format otel flag (only format for now, default to otel)
- Accept positional <run-id> argument
- Delegate to bash "$SCRIPT_DIR/sw-otel.sh" trace-export "$run_id" "$@"
- Add to show_help() output
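A minimal sketch of the delegation described above follows. It assumes --format, when given, comes before the run-id (as in shipwright pipeline export --format otel <run-id>) and that any remaining arguments are passed through untouched. The error messages are illustrative.

```shell
# SCRIPT_DIR is normally set by sw-pipeline.sh; default to "." for this sketch.
SCRIPT_DIR="${SCRIPT_DIR:-.}"

pipeline_export() {
  local format run_id
  format="otel"   # only supported format for now
  if [ "${1:-}" = "--format" ]; then
    if [ $# -lt 2 ]; then
      echo "Error: --format requires a value" >&2
      return 1
    fi
    format="$2"
    shift 2
  fi
  if [ "$format" != "otel" ]; then
    echo "Error: unsupported format: $format (only otel is supported)" >&2
    return 1
  fi
  if [ $# -lt 1 ]; then
    echo "Error: missing <run-id> argument" >&2
    return 1
  fi
  run_id="$1"
  shift
  # Remaining arguments (for example --output or --send) are forwarded as-is.
  bash "$SCRIPT_DIR/sw-otel.sh" trace-export "$run_id" "$@"
}
```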
In sw-pipeline.sh pipeline_cleanup() function (around line 2727, after successful completion event):
- Check if OTEL_EXPORTER_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT is set
- If set, run bash "$SCRIPT_DIR/sw-otel.sh" trace-export "$SHIPWRIGHT_PIPELINE_ID" --send 2>/dev/null || true
- Emit otel.trace_exported event on success
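The hook described above could be sketched as a small non-blocking function. The name maybe_auto_export is hypothetical, and emit_event here is a stand-in stub: the real emit_event() lives in scripts/lib/helpers.sh and its signature may differ. Unlike the one-liner above, this sketch prints a warning on failure rather than discarding it, which is one possible way to satisfy the "failures logged" requirement.

```shell
# SCRIPT_DIR is normally set by sw-pipeline.sh; default to "." for this sketch.
SCRIPT_DIR="${SCRIPT_DIR:-.}"

# Stand-in for emit_event() from scripts/lib/helpers.sh; the real signature may differ.
emit_event() { printf 'event: %s\n' "$*"; }

# Hypothetical hook: export the trace if an endpoint is configured.
# Always returns 0 so it can never fail the pipeline.
# $1 = pipeline run-id
maybe_auto_export() {
  local run_id endpoint
  run_id="$1"
  # Prefer the OTLP-specific variable, fall back to the shorter one.
  endpoint="${OTEL_EXPORTER_OTLP_ENDPOINT:-${OTEL_EXPORTER_ENDPOINT:-}}"
  [ -n "$endpoint" ] || return 0
  if bash "$SCRIPT_DIR/sw-otel.sh" trace-export "$run_id" --send >/dev/null 2>&1; then
    emit_event "otel.trace_exported" "run_id=$run_id" "endpoint=$endpoint"
  else
    echo "Warning: OTLP auto-export failed for $run_id" >&2
  fi
  return 0
}
```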
Add to config/event-schema.json:
"otel.trace_exported": { "required": ["run_id"], "optional": ["endpoint", "spans_count"] }
Add to scripts/sw-otel-test.sh:
- Test trace-export with synthetic events (pipeline.started + stage events + pipeline.completed)
- Validate OTLP JSON structure (resourceSpans, scopeSpans, spans array)
- Validate span parent/child relationships (stage spans reference root span)
- Validate span attributes are present
- Validate timestamps are nanosecond epoch strings
- Test with issue number as run-id
- Test with job_id as run-id
- Test with no matching events (error case)
- Test --output flag writes to file
- Test --send flag behavior (mock curl)
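Several of the structural checks in this list could share one jq-based assertion, sketched below. The helper name assert_otlp_structure is hypothetical, and the sketch assumes a single root span per export (it treats the first span with an empty parentSpanId as the root).

```shell
# Hypothetical test helper: succeed only if the file looks like a valid export.
# $1 = path to an OTLP JSON file
assert_otlp_structure() {
  jq -e '
    [.resourceSpans[]?.scopeSpans[]?.spans[]?] as $spans
    | ($spans | map(select((.parentSpanId // "") == "")) | .[0].spanId) as $root
    | ($spans | length) > 0
      and ($root != null)
      and all($spans[];
        ((.traceId // "") | test("^[0-9a-f]{32}$"))
        and ((.spanId // "") | test("^[0-9a-f]{16}$"))
        and ((.startTimeUnixNano | type) == "string")
        and ((.parentSpanId // "") == "" or .parentSpanId == $root))
  ' "$1" >/dev/null
}
```

With jq -e, the exit status reflects the final boolean, so the helper can be used directly in an if statement inside sw-otel-test.sh.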
Write integration guide with:
- Prerequisites (Jaeger Docker setup)
- Running a pipeline and exporting traces
- Auto-export configuration
- Viewing traces in Jaeger UI
- Honeycomb setup alternative
- Task 1: Implement cmd_trace_export() in sw-otel.sh with OTLP span building, run-id filtering, proper attributes
- Task 2: Add trace-export subcommand dispatch and help text in sw-otel.sh
- Task 3: Add export subcommand to sw-pipeline.sh delegating to sw-otel.sh
- Task 4: Add auto-export hook in sw-pipeline.sh pipeline_cleanup() on completion
- Task 5: Add otel.trace_exported event type to config/event-schema.json
- Task 6: Write tests for trace-export in sw-otel-test.sh
- Task 7: Create docs/observability.md with Jaeger integration example
- Task 8: Run npm test (specifically sw-otel-test.sh) and fix any failures
- Unit tests (8): OTLP JSON structure validation, span relationships, attribute encoding, timestamp conversion, run-id filtering, error handling (in sw-otel-test.sh)
- Integration tests (2): Full pipeline export flow via sw-pipeline.sh export, auto-export mock (in sw-otel-test.sh)
- E2E tests (0): Skipped, since this requires an actual OTLP collector; covered by the Jaeger docs example
- All span types: root pipeline span, completed stage span, failed stage span, skipped stage span
- All attribute types: string, integer, double values
- Error paths: missing run-id, no events, malformed events
- Happy path: Export a complete pipeline run with 3+ stages as OTLP JSON, validate Jaeger-compatible structure
- Error case 1: Run-id with no matching events returns error
- Error case 2: Events with missing fields (graceful degradation)
- Edge case 1: Pipeline with only started event (no completion) — spans have empty endTime
- Edge case 2: Multiple pipeline runs for same issue — exports all as separate traces
CLI endpoint: shipwright pipeline export --format otel <run-id>
- Input: run-id (string — job_id or issue number)
- Output: OTLP JSON to stdout (exit 0) or error message to stderr (exit 1)
- Flags: --output <file>, --send (POST to OTLP endpoint)
CLI endpoint: shipwright otel trace-export <run-id>
- Same as above (pipeline export delegates here)
Error codes:
- Exit 1: Missing run-id argument
- Exit 1: No events found for run-id
- Exit 1: jq not available
- Exit 1: --send failed (OTLP endpoint unreachable)
Rate Limiting: Not applicable (CLI tool, not a server)
Versioning: Not applicable (internal CLI, no breaking API surface)
Not applicable — This is a CLI export tool, not a deployed service. However:
- The otel.trace_exported event enables monitoring export frequency via existing shipwright otel metrics
- Auto-export failures are logged to events.jsonl for post-mortem analysis
- Current cmd_trace() scans entire events.jsonl (~1-10K lines typical) in <1s
- No performance regression expected for typical event volumes
- Export should complete in <5s for events.jsonl files up to 100K lines
- Memory usage bounded by jq streaming (not loading full file into bash variables)
- Not needed for initial implementation — bash+jq is well-understood
- If events.jsonl grows very large, could add date-range pre-filtering with grep
- Test with synthetic 10K-line events.jsonl to validate <5s target
- No formal benchmarking needed for this feature scope
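The synthetic 10K-line check mentioned above could be set up roughly as follows. Both helper names are hypothetical, the event fields are made up for illustration, and timing uses whole seconds from date +%s, which is coarse but enough for a 5-second budget.

```shell
# Hypothetical helper: write $1 synthetic JSONL events to file $2,
# spread across 100 fake run-ids (run-0 .. run-99).
make_synthetic_events() {
  awk -v n="$1" 'BEGIN {
    for (i = 1; i <= n; i++)
      printf "{\"type\":\"stage.completed\",\"job_id\":\"run-%d\",\"stage\":\"build\"}\n", i % 100
  }' > "$2"
}

# Hypothetical helper: run a command and succeed only if it finished
# within $1 whole seconds. The command's own exit status is ignored.
within_budget() {
  local budget start end
  budget="$1"
  shift
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  [ $((end - start)) -le "$budget" ]
}
```

In a test this might be used as `within_budget 5 bash scripts/sw-otel.sh trace-export run-42` after pointing the script at the synthetic file.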
- shipwright pipeline export --format otel <run-id> outputs valid OTLP JSON
- Each pipeline stage is a child span of the root pipeline span
- Spans have correct timestamps (nanosecond epoch), status codes, and attributes
- Span attributes include: stage name, template, outcome, cost, iteration count
- Trace attributes include: issue number, repo, total cost, success/failure
- Auto-export triggers on pipeline completion when OTEL_EXPORTER_ENDPOINT is set
- docs/observability.md documents Jaeger integration
- All existing tests pass (npm test)
- New tests cover happy path, error cases, and edge cases