Pipeline Design 22

Seth Ford edited this page Feb 13, 2026 · 2 revisions

Design: Add multi-repo fleet auto-discovery from GitHub org

Context

Shipwright's fleet orchestrator (scripts/sw-fleet.sh) currently requires manual configuration of repos in .claude/fleet-config.json. Users managing GitHub organizations with many repos must hand-edit this config for each repo, which is error-prone and doesn't adapt as repos are created, archived, or change activity levels.

Constraints from the codebase:

  • All scripts must be Bash 3.2 compatible (no associative arrays, no readarray, no ${var,,})
  • Scripts use set -euo pipefail, atomic file writes (tmp + mv), and jq --arg for JSON
  • GitHub API calls must respect $NO_GITHUB env var (existing pattern across all modules)
  • The fleet already has a background loop pattern: fleet_rebalance() runs on an interval via sleep in a backgrounded subshell — the rediscovery loop should follow the same pattern
  • gh api is the standard GitHub client (not raw curl), used throughout sw-github-graphql.sh and sw-fleet.sh
  • Fleet config lives at .claude/fleet-config.json with a known schema (repos[], worker_pool, etc.)
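The Bash 3.2 constraint rules out a few common idioms. The following is a minimal sketch of portable substitutes, assuming the surrounding script already runs under set -euo pipefail. The helper names to_lower, lookup, and number_lines are hypothetical and are not taken from sw-fleet.sh.

```shell
# Lowercase a string without ${var,,} (which needs Bash 4+).
to_lower() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Emulate an associative-array lookup with "key=value" arguments.
# Usage: lookup KEY pair [pair...]
# Prints the value of the first matching key, or returns 1 if absent.
lookup() {
    local key="$1"
    shift
    local pair
    for pair in "$@"; do
        if [ "${pair%%=*}" = "$key" ]; then
            printf '%s\n' "${pair#*=}"
            return 0
        fi
    done
    return 1
}

# Process input line by line without readarray/mapfile.
# Prints each line prefixed with its 1-based index.
number_lines() {
    local n=0
    local line
    while IFS= read -r line; do
        n=$((n + 1))
        printf '%s:%s\n' "$n" "$line"
    done
}
```

For example, lookup api "web=2" "api=4" prints 4, which covers the typical repo-to-worker-count mapping without declare -A.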

Decision

Extend sw-fleet.sh with a discover subcommand and fleet_rediscover_loop() background process.

Data Flow

gh api /orgs/{org}/repos --paginate
 → JSON array of repos
 → Filter: language, pushed_at within the last activity_days days, topics, open issues (if --has-issues), !archived, !disabled, !fork (unless --include-forks)
 → Opt-out check: skip repos with "shipwright-ignore" topic
 → Opt-out check: skip repos where gh api /repos/{owner}/{repo}/contents/.shipwright-ignore returns 200
 → Generate fleet-config.json entries (or merge with existing)
 → Output summary / dry-run report
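The filter stage of this flow could be sketched as a single jq pass over the repo listing. This is an illustration only: filter_repos is a hypothetical helper, it covers just a subset of the filters (language, activity cutoff, forks, archived/disabled, and the shipwright-ignore topic), and it assumes the input is one JSON array of repo objects with the standard REST fields (full_name, archived, disabled, fork, language, pushed_at, topics).

```shell
# filter_repos LANGUAGE CUTOFF INCLUDE_FORKS
# Reads a JSON array of repo objects on stdin and prints the full_name of
# each repo that passes the filters, one per line.
#   LANGUAGE       primary language to require (case-insensitive), "" for any
#   CUTOFF         ISO 8601 UTC timestamp; repos pushed before it are dropped
#   INCLUDE_FORKS  "true" to keep forks, anything else to drop them
filter_repos() {
    jq -r \
        --arg lang "$1" \
        --arg cutoff "$2" \
        --arg forks "$3" '
        .[]
        | select(.archived | not)
        | select(.disabled | not)
        | select($forks == "true" or (.fork | not))
        | select($lang == "" or ((.language // "") | ascii_downcase) == ($lang | ascii_downcase))
        # ISO 8601 UTC timestamps in the same format compare correctly as strings.
        | select(.pushed_at >= $cutoff)
        # Topic-based opt-out needs no extra API call.
        | select((.topics // []) | index("shipwright-ignore") | not)
        | .full_name
    '
}
```

The cutoff is passed in rather than computed here because date arithmetic differs between BSD date (date -u -v-90d) and GNU date (date -u -d '90 days ago'), and a Bash 3.2 target usually implies macOS.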

CLI Interface

shipwright fleet discover --org <org> [flags]

Flags:

  • --org <name> — GitHub org (required)
  • --language <lang> — filter by primary language
  • --activity-days <N> — only repos pushed within N days (default: 90)
  • --topic <topic> — require this topic (pass a comma-separated list to require several)
  • --has-issues — only repos with open issues
  • --include-forks — include forked repos (excluded by default)
  • --merge — merge discovered repos into existing config rather than overwriting
  • --dry-run — print what would be added, don't write config
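One way the flags above might be parsed in Bash 3.2 is a plain while/case loop. This is a sketch under assumptions: parse_discover_args and the DISCOVER_* variable names are hypothetical, and the real sw-fleet.sh may structure its CLI parsing differently.

```shell
# parse_discover_args "$@"
# Sets DISCOVER_* globals from the discover flags. Returns 1 on bad input.
parse_discover_args() {
    DISCOVER_ORG=""
    DISCOVER_LANGUAGE=""
    DISCOVER_ACTIVITY_DAYS=90
    DISCOVER_TOPICS=""
    DISCOVER_HAS_ISSUES=false
    DISCOVER_INCLUDE_FORKS=false
    DISCOVER_MERGE=false
    DISCOVER_DRY_RUN=false

    while [ $# -gt 0 ]; do
        case "$1" in
            --org|--language|--activity-days|--topic)
                # These flags all take a value.
                if [ $# -lt 2 ]; then
                    echo "error: $1 requires a value" >&2
                    return 1
                fi
                case "$1" in
                    --org)           DISCOVER_ORG="$2" ;;
                    --language)      DISCOVER_LANGUAGE="$2" ;;
                    --activity-days) DISCOVER_ACTIVITY_DAYS="$2" ;;
                    --topic)         DISCOVER_TOPICS="$2" ;;
                esac
                shift 2
                ;;
            --has-issues)    DISCOVER_HAS_ISSUES=true; shift ;;
            --include-forks) DISCOVER_INCLUDE_FORKS=true; shift ;;
            --merge)         DISCOVER_MERGE=true; shift ;;
            --dry-run)       DISCOVER_DRY_RUN=true; shift ;;
            *)
                echo "error: unknown flag: $1" >&2
                return 1
                ;;
        esac
    done

    if [ -z "$DISCOVER_ORG" ]; then
        echo "error: --org is required" >&2
        return 1
    fi
    # Reject empty or non-numeric day counts.
    case "$DISCOVER_ACTIVITY_DAYS" in
        ''|*[!0-9]*)
            echo "error: --activity-days must be a non-negative integer" >&2
            return 1
            ;;
    esac
}
```

Validating --activity-days up front keeps a malformed value from reaching the date arithmetic or the jq filter later in the flow.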

Config Schema Addition

{
  "repos": [...],
  "worker_pool": {...},
  "auto_discover": {
    "enabled": false,
    "org": "my-org",
    "interval_seconds": 3600,
    "filters": {
      "language": null,
      "activity_days": 90,
      "has_issues": false,
      "topics": [],
      "include_forks": false
    }
  }
}
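A sketch of how load_fleet_config() might read this block, falling back to the defaults shown above when a key or the whole block is missing. The helper name read_auto_discover and its key=value output format are assumptions made for illustration.

```shell
# read_auto_discover CONFIG_PATH
# Prints selected auto_discover settings as key=value lines, substituting
# the schema defaults for absent keys.
read_auto_discover() {
    jq -r '
        (.auto_discover // {}) as $ad
        | "enabled=\($ad.enabled // false)",
          "org=\($ad.org // "")",
          "interval_seconds=\($ad.interval_seconds // 3600)",
          "activity_days=\($ad.filters.activity_days // 90)"
    ' "$1"
}
```

A caller can consume the output with a while IFS='=' read -r key value loop, which works in Bash 3.2 and avoids one jq invocation per key. Note that jq's // operator treats false like null, which is harmless here only because the default for enabled is also false.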

Background Re-discovery

fleet_rediscover_loop() follows the same pattern as fleet_rebalance():

  1. Spawned as a backgrounded subshell from fleet_start()
  2. Sleeps for interval_seconds, then calls fleet_discover --org "$org" --merge
  3. On new repos found, writes a fleet-rediscover.flag file
  4. Main fleet loop checks for this flag file and calls fleet_add_repo() to hot-add repos to running daemons
  5. Flag file removed after processing
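The producer side of these steps can be sketched as follows. Two things here are assumptions rather than facts about sw-fleet.sh: that the discovery command prints the names of newly found repos one per line, and the helper rediscover_once, which exists only to make a single iteration testable.

```shell
# rediscover_once FLAG_FILE DISCOVER_CMD [ARGS...]
# Runs one discovery pass. If DISCOVER_CMD prints any repo names, they are
# appended to FLAG_FILE using a temp file plus mv, so the reader never sees
# a half-written flag. A failed pass only warns; the next interval retries.
rediscover_once() {
    local flag_file="$1"
    shift
    local found
    if ! found="$("$@")"; then
        echo "warn: discovery failed, retrying next interval" >&2
        return 0
    fi
    if [ -n "$found" ]; then
        {
            # Keep entries from an earlier pass that were not processed yet.
            if [ -f "$flag_file" ]; then cat "$flag_file"; fi
            printf '%s\n' "$found"
        } > "$flag_file.tmp"
        mv "$flag_file.tmp" "$flag_file"
    fi
}

# fleet_rediscover_loop INTERVAL FLAG_FILE DISCOVER_CMD [ARGS...]
# Sleeps first, then discovers, so startup is not delayed by a scan.
fleet_rediscover_loop() {
    local interval="$1"
    shift
    while true; do
        sleep "$interval"
        rediscover_once "$@"
    done
}
```

From fleet_start() the loop would be backgrounded in the same way as the rebalancer, for example fleet_rediscover_loop 3600 fleet-rediscover.flag fleet_discover --org "$org" --merge &, with the PID recorded so that shutdown can kill it.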

Hot-Add Mechanism

fleet_add_repo() adds a repo entry to the in-memory config, starts a daemon for the new repo (following existing fleet_start_repo() patterns), and updates the fleet status file. This avoids restarting the entire fleet for newly discovered repos.
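The consumer side, where the main fleet loop picks up the flag file and hot-adds each repo, might look like the sketch below. process_rediscover_flag is a hypothetical helper, and it assumes the flag file holds one repo name per line; the command passed in would be fleet_add_repo in the real script.

```shell
# process_rediscover_flag FLAG_FILE ADD_CMD [ARGS...]
# For each repo listed in FLAG_FILE, runs ADD_CMD [ARGS...] <repo>, then
# removes the flag. A missing flag file is a no-op.
process_rediscover_flag() {
    local flag_file="$1"
    shift
    [ -f "$flag_file" ] || return 0

    # Claim the flag by renaming it, so a discovery pass that finishes while
    # we are working starts a fresh flag file instead of being lost.
    local claimed="$flag_file.processing"
    mv "$flag_file" "$claimed"

    local repo
    while IFS= read -r repo; do
        [ -n "$repo" ] || continue
        # </dev/null keeps ADD_CMD from consuming the rest of the list.
        "$@" "$repo" </dev/null || echo "warn: hot-add failed for $repo" >&2
    done < "$claimed"

    rm -f "$claimed"
}
```

A failed hot-add is logged and skipped rather than aborting the loop, on the assumption that the next rediscovery pass will report the repo again if it is still missing from the config.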

Topology in fleet_status()

Extend the existing status output with a topology section:

  • Repos grouped by machine (local vs. each remote)
  • Workers allocated per repo
  • Active/queued job counts
  • Auto-discover: enabled/disabled, last scan timestamp, next scan ETA

Error Handling

  • $NO_GITHUB set → fleet_discover() prints warning and exits 0 (no-op, consistent with other modules)
  • gh api failures → logged via warn(), discovery aborted for that run, next interval retries
  • Invalid org / 404 → error() + exit 1 for CLI, warn() + continue for background loop
  • Rate limiting → gh api handles retry headers natively; if pagination fails mid-stream, partial results are discarded (no partial writes)
  • .shipwright-ignore file check failure (network error) → repo is included (fail-open, user can always add shipwright-ignore topic as the reliable opt-out)
  • Atomic config writes: write to fleet-config.json.tmp, then mv into place
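Two of the rules above, the NO_GITHUB guard and the atomic config write, are small enough to sketch directly. The helper names github_enabled and write_atomic are hypothetical; the existing modules may already provide equivalents.

```shell
# github_enabled
# Returns 0 when GitHub calls are allowed. When NO_GITHUB is set to a
# non-empty value, prints a warning and returns 1.
github_enabled() {
    if [ -n "${NO_GITHUB:-}" ]; then
        echo "warn: NO_GITHUB is set, skipping GitHub discovery" >&2
        return 1
    fi
}

# write_atomic TARGET CONTENT
# Writes CONTENT to TARGET.tmp, then renames it over TARGET. The content is
# passed as an argument (not piped) so that a failed upstream command cannot
# cause an empty file to be moved into place.
write_atomic() {
    local target="$1"
    local content="$2"
    local tmp="$target.tmp"
    if ! printf '%s\n' "$content" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    mv "$tmp" "$target"
}
```

In fleet_discover() this would read roughly as github_enabled || exit 0 at the top, then new_config="$(jq ...)" followed by write_atomic .claude/fleet-config.json "$new_config". Under set -e, a failing jq stops the script before the write is attempted.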

Pagination

gh api --paginate handles GitHub's Link-header pagination automatically. For orgs with 1000+ repos, this produces a single concatenated JSON array. We pipe through jq filters in a single pass.
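Whether the paginated output arrives as one merged array or as one array per page may depend on the gh version and flags in use, so it is worth verifying locally. A defensive sketch that produces a single array in either case, using a hypothetical helper named merge_pages:

```shell
# merge_pages
# Reads one or more JSON arrays from stdin and prints a single merged
# array on one line. Empty input yields [].
merge_pages() {
    # -s slurps every top-level value into an outer array; add concatenates
    # the inner arrays. With no input, add yields null, hence the fallback.
    jq -c -s 'add // []'
}
```

Usage would be along the lines of repos="$(gh api "/orgs/$org/repos" --paginate | merge_pages)". With pipefail enabled, a pagination failure makes the whole pipeline fail, so the partial result is never assigned and nothing is written.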

Alternatives Considered

  1. GitHub GraphQL API via sw-github-graphql.sh — Pros: single request for all data including topics, richer filtering server-side, lower API call count. Cons: sw-github-graphql.sh is designed for per-repo queries within a known repo context, not org-wide scans; would require new query templates and caching logic; REST gh api --paginate is simpler and already used in fleet for health checks; GraphQL org queries require different auth scopes. Rejected: unnecessary complexity for the use case.

  2. Separate discovery script (sw-fleet-discover.sh) — Pros: smaller files, clear separation. Cons: discovery is tightly coupled to fleet config schema and hot-add; a separate file would need to import fleet internals or duplicate them; the existing fleet script already handles config loading, status, and rebalancing — discovery is a natural extension. Rejected: would create coupling issues without meaningful separation benefit.

  3. GitHub App / webhook-based discovery — Pros: real-time repo creation events, no polling. Cons: requires a running server or Lambda, GitHub App setup, dramatically increases infrastructure complexity; polling at 1-hour intervals is sufficient for fleet management where daemon startup itself takes seconds. Rejected: over-engineered for the use case.

Implementation Plan

  • Files to create: None
  • Files to modify:
    • scripts/sw-fleet.sh — Add fleet_discover(), fleet_rediscover_loop(), fleet_add_repo(), topology in fleet_status(), CLI parsing for discover subcommand, load_fleet_config() updates for auto_discover block
    • scripts/sw-fleet-test.sh — 13 new test cases covering discover, filters, opt-out, merge, dry-run, rediscovery loop, hot-add, topology display, NO_GITHUB handling
    • .claude/CLAUDE.md — Document fleet discover command, auto_discover config keys, topology status output
  • Dependencies: None new. Uses existing gh, jq, standard POSIX tools.
  • Risk areas:
    • Pagination memory for large orgs: gh api --paginate concatenates all pages into memory. For orgs with 5000+ repos, this could be several MB of JSON. Acceptable for bash fleet management; not a realistic bottleneck.
    • .shipwright-ignore file checks: One API call per discovered repo to check for the file. For 100 repos, that's 100 sequential API calls. Mitigate by checking the shipwright-ignore topic first (free, already in the repo listing response) and only checking the file for repos that pass all other filters. Consider caching results in the rediscovery loop.
    • Race condition on hot-add: rediscovery and the rebalancer could both modify the config at the same time. Mitigate with atomic writes and flag-file signaling (the rebalancer processes the flag after its current cycle).
    • --merge correctness: Must match repos by path field (local repos) or remote URL. Repos already in config should not be duplicated. Use jq to deduplicate by a canonical key.
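The --merge deduplication could be sketched with jq as below. This assumes the canonical key is the path field alone; matching on remote URL as well would need a normalized key built from both fields. merge_repos is a hypothetical helper name.

```shell
# merge_repos CONFIG_PATH DISCOVERED_JSON
# Prints the config with discovered entries appended to .repos, skipping any
# whose .path already exists. Existing entries keep their order and content.
merge_repos() {
    jq --argjson found "$2" '
        (.repos // []) as $existing
        | ($existing | map(.path)) as $known
        | .repos = $existing + [
            # unique_by also removes duplicates within the discovered set.
            ($found | unique_by(.path))[]
            | select(.path as $p | $known | index($p) | not)
          ]
    ' "$1"
}
```

Appending to the existing array, rather than running unique_by over the combined list, keeps hand-edited entries in their original order and guarantees that a discovered entry never replaces one the user already configured.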

Validation Criteria

  • shipwright fleet discover --org test-org --dry-run lists repos without modifying config
  • shipwright fleet discover --org test-org generates valid fleet-config.json with discovered repos
  • --language, --activity-days, --topic, --has-issues, --include-forks filters reduce the repo list correctly
  • Repos with shipwright-ignore topic are excluded from discovery
  • Repos with .shipwright-ignore file are excluded from discovery
  • --merge adds new repos to existing config without duplicating or removing existing entries
  • auto_discover config block is parsed by load_fleet_config() and drives fleet_rediscover_loop()
  • Background rediscovery loop fires at configured interval and hot-adds new repos via flag file
  • fleet_status() displays topology with repos grouped by machine
  • NO_GITHUB=1 causes discover to no-op with a warning
  • All 13 new test cases pass in sw-fleet-test.sh
  • All 22 existing test suites continue to pass (npm test)
  • No Bash 3.2 incompatibilities (no associative arrays, no readarray, no ${var,,})
