fix(scheduler): derive K8s service_url default at runtime (closes #179)#181

Open
initializ-mk wants to merge 1 commit into
main from
fix/issue-179-k8s-scheduler-default-service-url
@initializ-mk initializ-mk commented Jun 16, 2026

Contributor

Summary

Closes #179. The runtime K8s scheduler backend was hard-erroring when `scheduler.kubernetes.service_url` was unset, even though the build-time `schedule_manifest_stage` already knew how to default the same field. This PR mirrors the build-time default in the runtime so an in-cluster agent without an explicit `service_url` comes up cleanly.

```

Before (in-cluster, no service_url in forge.yaml)

Error: kubernetes scheduler backend: scheduler.kubernetes.service_url is required

After

ServiceURL auto-derived to http://<agent_id>.<namespace>.svc:<port>/
```

Changes

Code

  • `forge-cli/runtime/scheduler_k8s_backend.go`
    • `K8sBackendConfig` gains a `Port int` field (defaults to 8080 to match the runner's listen-port default).
    • New helper `defaultK8sServiceURL(agentID, namespace, port)` mirrors `forge-cli/build/schedule_manifest_stage.go:70-82`.
    • `NewKubernetesBackend` and `NewKubernetesBackendWithClient` both derive the default after namespace resolution; explicit `ServiceURL` still wins.
    • The `service_url is required` hard-error path is gone.
  • `forge-cli/runtime/runner.go` — `selectScheduleBackend` plumbs `r.cfg.Port` into the new `K8sBackendConfig.Port` field.
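
As a rough illustration of the helper described in the list above, the derivation could look like the following. This is a minimal sketch, not the PR's actual code: only the name `defaultK8sServiceURL`, its three parameters, the URL shape `http://<agent_id>.<namespace>.svc:<port>/`, and the 8080 fallback come from this description. The parameter types, the `defaultPort` constant, and the choice to apply the fallback inside the helper are assumptions.

```go
package main

import "fmt"

// defaultPort is an assumed constant name. The PR states only that the
// fallback is 8080, matching the runner's listen-port default.
const defaultPort = 8080

// defaultK8sServiceURL builds the in-cluster Service URL for an agent.
// Assumption: the zero-port fallback lives in this helper; the real code
// may apply it in the constructor instead.
func defaultK8sServiceURL(agentID, namespace string, port int) string {
	if port == 0 {
		port = defaultPort
	}
	return fmt.Sprintf("http://%s.%s.svc:%d/", agentID, namespace, port)
}

func main() {
	fmt.Println(defaultK8sServiceURL("my-agent", "ns-a", 9090)) // http://my-agent.ns-a.svc:9090/
	fmt.Println(defaultK8sServiceURL("my-agent", "ns-a", 0))    // http://my-agent.ns-a.svc:8080/
}
```

The two calls in `main` correspond to the derivation and port-fallback cases that the new tests pin.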

Tests

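`forge-cli/runtime/scheduler_k8s_backend_test.go`:

  • `TestKubernetesBackend_ServiceURLDefaultDerivation`: the #179 pin; an empty `ServiceURL` with `Port=9090` yields `http://my-agent.ns-a.svc:9090/`.
  • `TestKubernetesBackend_ServiceURLDefaultPortFallback`: `Port=0` falls back to 8080.
  • `TestKubernetesBackend_ServiceURLExplicitOverride`: an explicit `ServiceURL` passes through untouched.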

Docs

  • `docs/deployment/scheduler-kubernetes.md` — new "service_url defaulting" subsection; YAML comment updated.
  • `docs/core-concepts/scheduling.md` — YAML comment updated with cross-link.
  • `CHANGELOG.md` — Unreleased / Fixed entry.

Test plan

  • `go test ./forge-cli/runtime/ -count=1` — full forge-cli/runtime suite passes (15.9s).
  • `go test ./forge-cli/build/ -run TestSchedule -count=1` — build-stage schedule tests still pass.
  • `go vet ./forge-cli/runtime/ ./forge-cli/build/` — clean.
  • `gofmt -l` on touched files — clean.
  • `golangci-lint run` on touched files — 0 issues.
  • Manual: deploy a sidecar agent to a cluster with no `scheduler.kubernetes.service_url` set, then confirm it boots and `forge.<agent_id>.<namespace>.svc:<port>/` is logged.

Risks

Low. The change only adds a fallback for a previously fatal path. Operators with an explicit `service_url` (the supported configuration today) see no change in behavior; this is pinned by `TestKubernetesBackend_ServiceURLExplicitOverride`. The derived defaults match the build stage's behavior 1:1, so newly derived URLs are identical to what `forge package` would have written.
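
The override semantics described above can be sketched as follows. This is an illustrative assumption about how the resolution might be structured, not the code in this PR: `K8sBackendConfig`, `ServiceURL`, and `Port` are named in the description, while the `AgentID` and `Namespace` field names, the `resolveServiceURL` helper, and the example gateway URL are hypothetical.

```go
package main

import "fmt"

// K8sBackendConfig mirrors only the fields relevant to this sketch.
// ServiceURL and Port are named in the PR; AgentID and Namespace are
// assumed field names.
type K8sBackendConfig struct {
	AgentID    string
	Namespace  string
	ServiceURL string
	Port       int
}

// resolveServiceURL is a hypothetical helper: an explicit ServiceURL always
// wins, otherwise the in-cluster default is derived (Port=0 becomes 8080).
func resolveServiceURL(cfg K8sBackendConfig) string {
	if cfg.ServiceURL != "" {
		return cfg.ServiceURL
	}
	port := cfg.Port
	if port == 0 {
		port = 8080
	}
	return fmt.Sprintf("http://%s.%s.svc:%d/", cfg.AgentID, cfg.Namespace, port)
}

func main() {
	explicit := K8sBackendConfig{
		AgentID:    "my-agent",
		Namespace:  "ns-a",
		ServiceURL: "https://gateway.example.com/", // illustrative value only
		Port:       9090,
	}
	unset := K8sBackendConfig{AgentID: "my-agent", Namespace: "ns-a", Port: 9090}

	fmt.Println(resolveServiceURL(explicit)) // https://gateway.example.com/
	fmt.Println(resolveServiceURL(unset))    // http://my-agent.ns-a.svc:9090/
}
```

Checking the explicit value first keeps existing deployments on exactly the URL they configured, which is why the change is limited to the previously fatal empty case.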

Pre-fix, the K8s scheduler backend's runtime constructor hard-errored
when scheduler.kubernetes.service_url was empty:
 Error: kubernetes scheduler backend: scheduler.kubernetes.service_url is required
...even though the build-time schedule-manifest stage at
forge-cli/build/schedule_manifest_stage.go already knew how to default
the same field to http://<agent_id>.<namespace>.svc:<port>/ when
unset. Two adjacent code paths reached opposite conclusions for the
same missing field, so operators who deployed in-cluster without an
explicit service_url couldn't start the agent.
Fix: mirror the build-time default in the runtime constructor.
 - K8sBackendConfig gains a Port int field; selectScheduleBackend
 plumbs r.cfg.Port into it.
 - NewKubernetesBackend (and the -WithClient test seam) derives
 http://<agent_id>.<namespace>.svc:<port>/ when cfg.ServiceURL is
 empty. defaultK8sServiceURL is the shared helper.
 - Port=0 falls back to 8080 (matches the runner's listen-port
 default at forge-cli/runtime/runner.go:152-153).
 - The hard-error branch is gone — there's no scenario in-cluster
 where we can't derive a sensible default.
 - Operator override semantics unchanged: an explicit service_url
 always wins. Pinned by TestKubernetesBackend_ServiceURLExplicitOverride.
Tests added:
 - TestKubernetesBackend_ServiceURLDefaultDerivation — the #179 pin:
 empty ServiceURL + Port=9090 → http://my-agent.ns-a.svc:9090/
 - TestKubernetesBackend_ServiceURLDefaultPortFallback — Port=0
 falls back to 8080
 - TestKubernetesBackend_ServiceURLExplicitOverride — explicit
 ServiceURL (e.g. https://gateway.example.com/...) passes through
 untouched
Docs:
 - docs/deployment/scheduler-kubernetes.md — new "service_url
 defaulting" subsection; YAML comment updated to note the
 auto-derivation
 - docs/core-concepts/scheduling.md — YAML comment updated with
 cross-link
 - CHANGELOG.md — Unreleased / Fixed entry
Development

Successfully merging this pull request may close these issues.

Runtime K8s scheduler backend hard-errors when scheduler.kubernetes.service_url is unset (build stage auto-derives the same value)
