Releases: initializ/forge
v0.15.0
Forge v0.15.0 — Kubernetes-native scheduling, end-to-end guardrails audit, multi-tenant stamping
Forge v0.15.0 ships three substantial additions to the open-source AI agent runtime: a hybrid scheduler that deploys agent cron jobs as native Kubernetes CronJob resources, a complete guardrails audit pipeline covering all five library gates with optional OpenTelemetry span parity, and tenant + entity identifiers stamped on every audit event so SIEM consumers can filter the export stream by org, workspace, and agent without joining against authentication rows.
Forge agents now deploy to Kubernetes with one command: forge build && forge package && kubectl apply -k. The scheduler's CronJob controller takes over timing, the cluster's etcd persists state, and the manifests forge package emits are byte-identical to what the runtime backend reconciles — no drift between build-time and runtime.
This release also publishes forge-core, forge-skills, and forge-plugins as importable Go modules under their own path-prefixed tags so platform teams can embed Forge's runtime, registry, and channel adapters directly.
Highlights
- scheduler.Backend interface with two implementations: a file-backed scheduler (the existing 30s ticker + markdown persistence, unchanged behavior) and a Kubernetes backend using k8s.io/client-go for native CronJob CRUD. Selected at startup via scheduler.backend: auto | file | kubernetes in forge.yaml. In-cluster detection probes the projected ServiceAccount token at /var/run/secrets/kubernetes.io/serviceaccount/token.
- forge package Kubernetes manifest generation. For every forge.yaml schedules[] entry, the build pipeline emits one cronjob-<id>.yaml, a credential-less Secret template, and a Role/RoleBinding scoped to the agent's namespace. The Secret manifest deliberately ships without a data field, so the credential never lands in version control; operators populate it via forge auth secret-yaml | kubectl apply -f -, ExternalSecrets, Sealed Secrets, SOPS, or Vault Agent Injector.
- forge auth subcommand (show-token, mint-token, secret-yaml) for the internal bearer token the runtime mints at startup. This is the same token channel adapters use for A2A callbacks, now reused by Kubernetes scheduler CronJob trigger pods.
- guardrail_check audit event emitted at all five library gates (input, context, tool_call, output, stream) with opt-in evidence capture controlled by FORGE_GUARDRAIL_CAPTURE_EVIDENCE. The PII/secret content the library produces is captured post-mask by default; redact-then-truncate uses the same regex set as the OpenTelemetry content-capture pipeline.
- OpenTelemetry guardrail spans (guardrail.<gate>) shipped symmetrically to the audit emission. Block decisions stamp OTel Error status with the violation summary, so operators see blocked invocations as red bars in the trace UI without writing custom attribute queries.
- Tenancy stamping (org_id, workspace_id) and entity stamping (entity_id, entity_type) added as top-level audit fields. Sourced from FORGE_ORG_ID / FORGE_WORKSPACE_ID / FORGE_AGENT_ID (or forge.yaml's agent_id) with per-request X-Forge-Org-ID / X-Forge-Workspace-ID headers overriding the tenancy stamp. Field names match the guardrails library's MongoDB GuardrailAuditEvent columns 1:1.
- OpenTelemetry span content capture opt-in (issue #130). Prompts and completions stamp gen_ai.input.messages + gen_ai.output.messages (current OTel GenAI semantic conventions, not the deprecated flat-string keys); tool calls stamp forge.tool.args + forge.tool.result. Capture is off by default; redaction is on by default when capture is enabled.
- Go library modules published: forge-core, forge-skills, forge-plugins are now importable via path-prefixed tags (forge-core/v0.15.0, etc.). The Initializ platform and other embedders can pull just the runtime, registry, or channel layers without the CLI.
What's new
Kubernetes-native scheduler (#162)
The default auto backend resolves to file when running on a laptop or CI and to Kubernetes when the agent is running as a pod. The K8s backend persists schedules as CronJob resources keyed on forge.agent.id / forge.schedule.id labels; the cluster's controller handles timing, etcd provides durability, and concurrencyPolicy: Forbid enforces overlap prevention natively.
    # forge.yaml
    schedules:
      - id: daily-summary
        cron: "0 9 * * *"
        task: "Generate the daily ops summary and post to #ops"
        channel: slack
        channel_target: "C12345"
    scheduler:
      backend: auto
      kubernetes:
        namespace: ""          # defaults to the agent pod's own namespace
        service_url: ""        # CronJob trigger pods POST here (in-cluster Service DNS)
        allow_dynamic: false   # whether the LLM (schedule_set) can create CronJobs
        trigger_image: ""      # default: curlimages/curl:8.10.1
        auth_secret_name: ""   # default: <agent_id>-internal-token
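The auto resolution described above can be sketched in a few lines of Go. This is a minimal illustration, not Forge's actual code: resolveBackend is a hypothetical helper, and treating an empty value the same as auto is an assumption. Only the token path and the three backend names come from the release notes.

```go
package main

import (
	"fmt"
	"os"
)

// Path of the projected ServiceAccount token inside a pod (from the notes above).
const serviceAccountTokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// resolveBackend maps the configured scheduler.backend value to a concrete
// backend name. "auto" (and, by assumption, an empty value) resolves to
// "kubernetes" when the token file exists and to "file" otherwise.
// Hypothetical helper for illustration only.
func resolveBackend(configured, tokenPath string) (string, error) {
	switch configured {
	case "file", "kubernetes":
		return configured, nil // explicit choice is honored as-is
	case "", "auto":
		if info, err := os.Stat(tokenPath); err == nil && !info.IsDir() {
			return "kubernetes", nil
		}
		return "file", nil
	default:
		return "", fmt.Errorf("unknown scheduler.backend %q", configured)
	}
}

func main() {
	backend, err := resolveBackend("auto", serviceAccountTokenPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("scheduler backend:", backend)
}
```

On a laptop or CI runner the token file is absent, so the sketch prints the file backend; inside a pod with a mounted ServiceAccount it would pick kubernetes.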
End-to-end deploy:
    forge build && forge package                 # emits k8s/cronjob-*.yaml + Secret template + RBAC
    forge auth mint-token > /dev/null            # first-deploy bootstrap
    forge auth secret-yaml | kubectl apply -f -  # populates the Secret out-of-band
    kubectl apply -k .forge-output/k8s/          # brings up the agent + CronJobs
Sync() reconciles cluster CronJob resources against the declared yaml entries: creates missing, updates on spec drift, prunes dropped yaml entries, and preserves LLM-sourced (forge.schedule.source=llm) entries unconditionally. AllowDynamic: false (the default) keeps the LLM from creating CronJob resources at runtime; forge package materializes the yaml entries at build time instead. The RBAC Role forge package emits grants get / list / watch by default, with create / update / delete only when allow_dynamic: true.
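As a rough illustration of the reconcile rules above (create missing, update on drift, prune dropped entries, keep LLM-sourced ones), here is a sketch that computes the plan as a pure function. The cronJob and plan types and the buildPlan name are hypothetical, the spec comparison is reduced to the cron string, and the "yaml" source value is an assumption; only the llm source label appears in the notes.

```go
package main

import (
	"fmt"
	"sort"
)

// cronJob is a simplified stand-in for a cluster CronJob. Source mirrors the
// forge.schedule.source label ("llm" is from the notes; "yaml" is assumed).
type cronJob struct {
	Cron   string
	Source string
}

// plan lists schedule IDs by the action a reconcile pass would take.
type plan struct {
	Create, Update, Delete []string
}

// buildPlan compares declared yaml schedules (id -> cron expression) with
// the CronJobs currently in the cluster.
func buildPlan(declared map[string]string, cluster map[string]cronJob) plan {
	var p plan
	for id, cron := range declared {
		existing, ok := cluster[id]
		switch {
		case !ok:
			p.Create = append(p.Create, id) // missing in cluster
		case existing.Cron != cron:
			p.Update = append(p.Update, id) // spec drift
		}
	}
	for id, job := range cluster {
		if _, ok := declared[id]; ok {
			continue
		}
		if job.Source == "llm" {
			continue // LLM-sourced entries are preserved unconditionally
		}
		p.Delete = append(p.Delete, id) // dropped from yaml
	}
	// Map iteration order is random; sort for stable output.
	sort.Strings(p.Create)
	sort.Strings(p.Update)
	sort.Strings(p.Delete)
	return p
}

func main() {
	declared := map[string]string{"daily-summary": "0 9 * * *", "weekly": "0 8 * * 1"}
	cluster := map[string]cronJob{
		"daily-summary": {Cron: "0 10 * * *", Source: "yaml"},
		"old":           {Cron: "* * * * *", Source: "yaml"},
		"adhoc":         {Cron: "*/5 * * * *", Source: "llm"},
	}
	fmt.Printf("%+v\n", buildPlan(declared, cluster))
}
```

Keeping the plan separate from the API calls makes the pruning rule easy to unit-test without a cluster.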
Reference: docs/deployment/scheduler-kubernetes.md, docs/core-concepts/scheduling.md.
Guardrails audit emission and OTel span parity (#155, #159, #161)
Every mask / block / warn decision now emits a guardrail_check audit event on the configured sink (stderr, Unix socket, HTTP fallback) carrying the library gate, decision, guardrail, category, violation_count, and optional tool. With FORGE_GUARDRAIL_CAPTURE_EVIDENCE=true, the post-mask content lands in fields.evidence (or the pre-mask original on block / warn), scrubbed through a vendor-secret redact pass with a 4 KiB byte cap.
OpenTelemetry receives the same information as guardrail.<gate> child spans with forge.guardrail.gate / decision / type / category / violation_count attributes. Block decisions stamp OTel Error status with the violation summary — operators get red bars in the trace UI without custom attribute queries.
The direction field shipped in earlier drafts is replaced by gate (sourced from Result.Gate), matching the library vocabulary 1:1.
    {
      "ts": "2026-06-15T10:00:00Z",
      "event": "guardrail_check",
      "schema_version": "1.0",
      "correlation_id": "fd111edd27c20101",
      "task_id": "slack-...",
      "fields": {
        "gate": "input",
        "decision": "masked",
        "guardrail": "pii",
        "category": "ssn",
        "violation_count": 1
      }
    }

Reference: docs/security/guardrails.md, docs/security/audit-logging.md.
Tenancy and entity stamping (#157, #164)
Audit events now carry top-level org_id, workspace_id, entity_id, and entity_type fields. SIEM consumers filter the audit stream by (org_id, workspace_id, entity_id) as a complete deploy identifier without joining against auth_verify rows. Field names match the guardrails library's MongoDB GuardrailAuditEvent columns 1:1, so consumers reading both streams join on the same column names.
| Field | Source | Per-request header override |
|---|---|---|
| org_id | FORGE_ORG_ID | X-Forge-Org-ID |
| workspace_id | FORGE_WORKSPACE_ID | X-Forge-Workspace-ID |
| entity_id | FORGE_AGENT_ID or forge.yaml agent_id | (none) |
| entity_type | hardcoded "agent" (future-extensible to workflow / assistant) | (none) |
Reference: docs/security/tenancy.md.
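The stamping rules in the table can be sketched as a small resolver. This is an illustration under assumptions: the stamp type and resolveStamp function are hypothetical, and preferring FORGE_AGENT_ID over the forge.yaml agent_id when both are set is a guess at the precedence. The env var and header names are taken from the table.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// stamp holds the four top-level audit fields described above.
type stamp struct {
	OrgID, WorkspaceID, EntityID, EntityType string
}

// resolveStamp builds the stamp from process-level env values, then lets the
// per-request tenancy headers override org and workspace. Hypothetical helper.
func resolveStamp(getenv func(string) string, yamlAgentID string, h http.Header) stamp {
	s := stamp{
		OrgID:       getenv("FORGE_ORG_ID"),
		WorkspaceID: getenv("FORGE_WORKSPACE_ID"),
		EntityID:    getenv("FORGE_AGENT_ID"),
		EntityType:  "agent", // hardcoded per the table
	}
	if s.EntityID == "" {
		s.EntityID = yamlAgentID // assumed fallback order
	}
	// Only the tenancy fields have a header override; entity fields do not.
	if v := h.Get("X-Forge-Org-ID"); v != "" {
		s.OrgID = v
	}
	if v := h.Get("X-Forge-Workspace-ID"); v != "" {
		s.WorkspaceID = v
	}
	return s
}

func main() {
	h := http.Header{}
	h.Set("X-Forge-Org-ID", "org-from-header")
	fmt.Printf("%+v\n", resolveStamp(os.Getenv, "my-agent", h))
}
```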
OpenTelemetry content capture and current GenAI semantic conventions (#130)
The observability.tracing.capture_content and observability.tracing.redact config knobs are now wired through Phase 3 span instrumentation. With capture on, the executor stamps gen_ai.input.messages (structured JSON array per current OTel GenAI semconv, superseding the deprecated gen_ai.prompt) and gen_ai.output.messages on llm.completion spans, plus forge.tool.args and forge.tool.result on tool.<name> spans.
Captured values pass through a vendor-secret redactor (Anthropic, OpenAI, GitHub, AWS, Slack, RSA/EC/OpenSSH/Private keys, Telegram bot tokens) and a 4 KiB byte cap with a ...[truncated:N] marker byte-identical to the audit payload-capture marker — operators grepping [truncated: across spans and audit rows see aligned output.
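A minimal sketch of the byte cap and marker, leaving the redaction pass out. The function name truncateCapture is hypothetical, and two details are assumptions not stated in the notes: that N counts the bytes dropped, and that the marker is appended after the cap rather than counted inside it.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// maxCaptureBytes is the 4 KiB cap mentioned above.
const maxCaptureBytes = 4 * 1024

// truncateCapture returns s unchanged when it fits within limit bytes.
// Otherwise it cuts at or before limit on a UTF-8 boundary and appends a
// "...[truncated:N]" marker, where N is assumed to be the dropped byte count.
func truncateCapture(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	cut := limit
	// Back up so the cut never splits a multi-byte UTF-8 sequence.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return fmt.Sprintf("%s...[truncated:%d]", s[:cut], len(s)-cut)
}

func main() {
	out := truncateCapture(strings.Repeat("a", 5000), maxCaptureBytes)
	fmt.Println(out[maxCaptureBytes:]) // the marker suffix only
}
```

Sharing one helper like this between span capture and audit capture is one way to keep the two markers byte-identical.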
Library modules
forge-core, forge-skills, and forge-plugins are now importable Go modules:
    import (
        "github.com/initializ/forge/forge-core/runtime"
        "github.com/initializ/forge/forge-skills/parser"
        "github.com/initializ/forge/forge-plugins/channels/slack"
    )
Tagged independently via path-prefixed tags: forge-core/v0.15.0, `forge-skill...
v0.14.1
Forge v0.14.1 — Bug fixes for OpenAI-compatible providers (Together.ai, OpenRouter, Groq) and session persistence
Published: 2026-06-09 · Tag: v0.14.1 · Previous release: v0.14.0 (2026-06-09, OpenTelemetry Tracing v1)
Forge v0.14.1 is a patch release focused on operators running agents against OpenAI-compatible LLM providers — Together.ai, OpenRouter, Groq, Fireworks, Anyscale, vLLM, llama.cpp's server, and others — plus a class of session-persistence bugs that broke multi-turn channel conversations on OpenAI reasoning models (gpt-5-nano, o1, o3) and strict OpenAI-compatible gateways.
Seven bug fixes, no new features, no breaking changes. Backward-compatible upgrade from v0.14.0. 18 files changed, +1,683 / −38 lines, 7 issues closed, 7 PRs merged.
If you saw "something went wrong while processing your request, please try again" on Slack thread followups, Anthropic API returned status 401 despite configuring OpenAI, Unsupported parameter: 'max_tokens', or NetworkPolicy blocking calls to api.together.ai after forge package — this release fixes all of them.
The shipped fixes
1. Session persistence no longer poisons followup turns
Long-running Slack threads (and any session-persistent channel) succeeded on the first message and then failed every followup with "something went wrong while processing your request, please try again". The executor unconditionally wrote the LLM's assistant message to memory before checking content. When the provider hit finish_reason: length (output token cap) and returned an assistant turn with empty content AND no tool_calls, that invalid-per-OpenAI-spec shape landed in memory. The in-loop empty-response recovery papered over it for the current task, but persistSession wrote the polluted memory to .forge/sessions/<task_id>.json. The next request recovered the bad shape and strict OpenAI-spec providers (Moonshot, hosted OpenRouter, OpenAI strict mode) returned HTTP 400.
Fix:
- Substitute a placeholder content string ("(continuing — previous response was truncated by output token limit)") when the LLM returns the bad shape, so newly-written sessions never contain it.
- Extend sanitizeMessages on LoadFromStore with a new stripEmptyAssistantTurns pass that rescues sessions already on disk, so no manual rm <agent-dir>/.forge/sessions/<task-id>.json is required after upgrade.
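The load-time rescue pass might look roughly like the following. The message type here is a simplified stand-in invented for the example, and the body is a guess at the behavior the pass name implies, not the shipped implementation.

```go
package main

import "fmt"

// message is a simplified chat turn; ToolCalls holds tool-call IDs.
// Hypothetical type for illustration.
type message struct {
	Role      string
	Content   string
	ToolCalls []string
}

// stripEmptyAssistantTurns drops assistant turns that have neither content
// nor tool calls, the shape strict OpenAI-spec providers reject.
func stripEmptyAssistantTurns(msgs []message) []message {
	kept := make([]message, 0, len(msgs))
	for _, m := range msgs {
		if m.Role == "assistant" && m.Content == "" && len(m.ToolCalls) == 0 {
			continue // invalid turn left behind by a truncated response
		}
		kept = append(kept, m)
	}
	return kept
}

func main() {
	session := []message{
		{Role: "user", Content: "summarize the thread"},
		{Role: "assistant"}, // polluted turn from finish_reason: length
		{Role: "user", Content: "any update?"},
	}
	fmt.Println(len(stripEmptyAssistantTurns(session)), "turns kept")
}
```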
2. Duplicate user message at session start no longer trips strict-mode providers
Same surface symptom as #131 — "something went wrong" on Slack thread followups — different root cause. The runner pre-appended params.Message to task.History before calling Execute so SSE clients could observe the inbound message in the in-flight task. The executor's !recovered first-interaction path then iterated task.History AND appended *msg separately, producing two consecutive identical user turns at the start of every fresh conversation. OpenAI reasoning models (gpt-5-nano, o1, o3) and strict OpenAI-compatible gateways (Together's Kimi) reject consecutive same-role messages with HTTP 400.
Fix:
- Strip the trailing task.History entry when it equals *msg before iterating (new a2aMessagesEqual helper).
- Extend sanitizeMessages with a collapseConsecutiveDuplicates pass on LoadFromStore. Surgical by design: only EXACT same-role + same-content + tool-call-free pairs collapse; workflow nudges ("Your response was empty...") and tool-bearing assistant turns survive.
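The collapse rule stated above (exact same role, same content, no tool calls on either side) can be sketched as follows. The message type is a hypothetical simplification and the function body is an illustration of the rule, not the code that shipped.

```go
package main

import "fmt"

// message is a simplified chat turn; ToolCalls counts attached tool calls.
// Hypothetical type for illustration.
type message struct {
	Role      string
	Content   string
	ToolCalls int
}

// collapseConsecutiveDuplicates keeps the first of any run of adjacent turns
// that match exactly on role and content and carry no tool calls.
func collapseConsecutiveDuplicates(msgs []message) []message {
	out := make([]message, 0, len(msgs))
	for _, m := range msgs {
		if n := len(out); n > 0 {
			prev := out[n-1]
			if prev.Role == m.Role && prev.Content == m.Content &&
				prev.ToolCalls == 0 && m.ToolCalls == 0 {
				continue // exact duplicate of the previous turn
			}
		}
		out = append(out, m)
	}
	return out
}

func main() {
	history := []message{
		{Role: "user", Content: "review this diff"},
		{Role: "user", Content: "review this diff"}, // pre-appended duplicate
		{Role: "assistant", Content: "Looking now."},
	}
	fmt.Println(len(collapseConsecutiveDuplicates(history)), "turns after collapse")
}
```

Because the comparison is exact, a nudge turn with different text from its neighbor is never collapsed.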
3. code-review skill no longer routes Anthropic-first when both API keys are set
The skill's code-review-diff.sh and code-review-file.sh picked Anthropic whenever ANTHROPIC_API_KEY was non-empty, even when the operator's forge.yaml pointed at an OpenAI-compatible provider (Together.ai, OpenRouter, Groq, Fireworks, Anyscale, vLLM) via OPENAI_BASE_URL and REVIEW_MODEL was clearly a non-Anthropic model. Operators with a stale ANTHROPIC_API_KEY co-resident with a live OPENAI_API_KEY in .forge/secrets.enc got Anthropic API returned status 401 and assumed the skill was broken.
Fix: a new REVIEW_PROVIDER env var (values anthropic or openai) that always wins. When it is unset, the provider is auto-detected from the REVIEW_MODEL prefix (claude-* or anthropic/* → Anthropic; anything else → OpenAI), then by sole API key, and finally defaults to OpenAI when both keys exist with no other signal.
The same PR also fixes a separate bug: the OPENAI_BASE_URL-as-Responses-API-toggle was exactly backwards — Together, OpenRouter, Groq, Fireworks, Anyscale all implement /chat/completions only, not OpenAI's proprietary /responses. Setting OPENAI_BASE_URL=https://api.together.ai/v1 silently POSTed to https://api.together.ai/v1/responses → 404. Decoupled into a separate OPENAI_USE_RESPONSES_API=1 opt-in for the Codex/OAuth flow.
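The provider selection order from this fix is easier to follow as code. The skill itself is implemented as shell scripts; the Go function below is only a hypothetical restatement of the precedence. Falling through on an unrecognized REVIEW_PROVIDER value, skipping the model check when REVIEW_MODEL is empty, and returning openai when neither key is set are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// detectProvider restates the selection order: explicit REVIEW_PROVIDER,
// then REVIEW_MODEL prefix, then sole API key, then OpenAI as the default.
// Hypothetical helper for illustration.
func detectProvider(reviewProvider, reviewModel string, hasAnthropicKey, hasOpenAIKey bool) string {
	// 1. An explicit, valid REVIEW_PROVIDER always wins.
	if reviewProvider == "anthropic" || reviewProvider == "openai" {
		return reviewProvider
	}
	// 2. Model prefix: claude-* or anthropic/* means Anthropic, else OpenAI.
	if reviewModel != "" {
		if strings.HasPrefix(reviewModel, "claude-") || strings.HasPrefix(reviewModel, "anthropic/") {
			return "anthropic"
		}
		return "openai"
	}
	// 3. Sole API key.
	if hasAnthropicKey && !hasOpenAIKey {
		return "anthropic"
	}
	// 4. Both keys (or, by assumption, neither): default to OpenAI.
	return "openai"
}

func main() {
	// Stale Anthropic key next to a live OpenAI-compatible setup.
	fmt.Println(detectProvider("", "moonshotai/Kimi-K2.6", true, true))
}
```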
4. code-review skill uses max_completion_tokens (not deprecated max_tokens)
OpenAI deprecated max_tokens in favor of max_completion_tokens for Chat Completions. Reasoning models (o1, o1-preview, o3, gpt-5) and strict OpenAI-compatible providers (Together.ai's Kimi-K2.6 series, Moonshot) reject the legacy field with HTTP 400:
"Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead."
Fix: one-token edit per script. The Anthropic branch keeps max_tokens (correct field for Anthropic's API). The Responses API branch uses no max-tokens field (auto-sizes).
5. Skill subprocesses inherit OPENAI_BASE_URL and the other standard SDK base-URL env vars
SkillCommandExecutor built a whitelist-only env for skill subprocesses where OPENAI_ORG_ID was always-passed but the standard SDK base-URL pointers (OPENAI_BASE_URL, ANTHROPIC_BASE_URL, OLLAMA_BASE_URL, GEMINI_BASE_URL) were not — unless each SKILL.md author remembered to declare each variable in env.optional. Every LLM-calling skill that forgot silently broke for OpenAI-compatible deployments. The operator's main agent loop hit Together.ai correctly; the same agent's skill subprocess hit api.openai.com with the Together.ai key and 401d.
Fix: four standard SDK variables special-cased alongside OPENAI_ORG_ID. Every skill that uses industry-standard env conventions just works — present, future, in-tree, and third-party.
6. CLI wizard recognizes encrypted one_of keys and stops writing first-key placeholders
forge skills add did not validate one_of groups at all — operators got no confirmation that their encrypted key (e.g. OPENAI_API_KEY in .forge/secrets.enc) was detected. forge init's fallback wrote opts.EnvVars[OneOfEnv[0]] = "" (the first key in the list — ANTHROPIC_API_KEY for code-review), producing an empty .env line that misled operators into thinking they needed an Anthropic key when they had configured OpenAI.
Fix:
- forge skills add now mirrors RequiredEnv's three-source check for one_of groups: os.Getenv / .env / loadSecretPlaceholders. Output now shows OPENAI_API_KEY (secrets) — ok regardless of list order.
- forge init no longer pre-writes a placeholder. The runtime resolver already surfaces missing one_of groups clearly at forge run if neither key is set.
- .env writer no longer emits empty placeholder lines for every one_of member; only keys with non-empty values get written.
7. forge package and forge run auto-add the LLM provider's custom base URL to the egress allowlist
Operators configuring an OpenAI-compatible provider via OPENAI_BASE_URL=https://api.together.ai/v1 shipped a NetworkPolicy that blocked the provider's hostname — deployed agents 401d or timed out depending on which side noticed first. Same trap Phase 6 of OTel Tracing v1 fixed for the OTLP collector (#107), but for the LLM provider.
Fix: added ModelRef.BaseURL and ModelFallback.BaseURL fields to the forge.yaml schema, plus two new helpers:
- security.LLMProviderDomains(cfg): extracts hostnames from cfg.Model.BaseURL + cfg.Model.Fallbacks[].BaseURL. Cfg-driven; used by both build and runtime.
- security.LLMProviderEnvDomains(envVars): extracts hostnames from the four canonical SDK base-URL env vars in the resolved env. Runtime safety net for deployments that haven't migrated to the schema field.
Both wired into egress_stage.go and runner.go alongside the existing AuthDomains / MCPDomains / OTelDomain merges. Operators going forward declare the URL once in forge.yaml:
    model:
      provider: openai
      name: moonshotai/Kimi-K2.6
      base_url: https://api.together.ai/v1
...and forge build && forge package && kubectl apply produces a NetworkPolicy that admits api.together.ai automatically.
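The hostname extraction both helpers perform can be approximated with the standard library. The function below is a hypothetical sketch (its name, signature, and the choice to skip unparseable values silently are assumptions); it does not reproduce the real security.LLMProviderDomains signature.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// providerDomains returns the unique hostnames found in the given base URLs,
// sorted for stable allowlist output. Empty or unparseable values are skipped.
// Hypothetical helper for illustration.
func providerDomains(baseURLs ...string) []string {
	seen := map[string]bool{}
	var hosts []string
	for _, raw := range baseURLs {
		if raw == "" {
			continue
		}
		u, err := url.Parse(raw)
		if err != nil || u.Hostname() == "" {
			continue
		}
		// Hostname() strips any port, which an allowlist entry should not carry.
		if h := u.Hostname(); !seen[h] {
			seen[h] = true
			hosts = append(hosts, h)
		}
	}
	sort.Strings(hosts)
	return hosts
}

func main() {
	fmt.Println(providerDomains(
		"https://api.together.ai/v1", // model.base_url
		"https://openrouter.ai/api/v1", // a fallback's base_url
	))
}
```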
Compatibility-restored deployment matrix
| Setup | Pre-v0.14.1 | Post-v0.14.1 |
|---|---|---|
Agent on Together.ai / OpenRouter / Groq with OPENAI_BASE_URL set |
S... |
v0.14.0
Forge v0.14.0 — End-to-end OpenTelemetry distributed tracing for AI agents
Published: 2026-06-09 · Tag: v0.14.0 · Previous release: v0.13.0 (2026-06-07)
Forge v0.14.0 ships OpenTelemetry Tracing v1 — end-to-end distributed tracing for AI agents covering A2A dispatch, the executor loop, every LLM completion, every tool call, and every outbound HTTP request. Traces export over OTLP to Tempo, Jaeger, Honeycomb, Datadog, Grafana Cloud, or any other OTLP-compatible backend. Multi-hop A2A flows display as one connected trace. Audit events carry the active span's trace_id and span_id so operators pivot from compliance row to performance trace with a single copy-paste. The OTLP collector hostname auto-injects into the build's egress allowlist so Kubernetes deployments need no second NetworkPolicy edit.
This release closes initiative #108 across seven phases (one PR per phase, #122–#128), plus a bug-2 fix for A2A tasks/send message validation and the docs sync across the operator-facing surface.
45 files changed, +5,494 / −42 lines, 8 issues closed, 10 PRs merged since v0.13.0. Full per-PR detail in CHANGELOG.md.
OpenTelemetry tracing — what it covers
When observability.tracing.enabled: true is set in forge.yaml, Forge produces one OTLP span tree per inbound A2A request:
    a2a.<method>                          [SpanKindServer; dispatcher]
    └── agent.execute                     [outer loop; root for the task]
        ├── llm.completion (× N turns)    [per LLM provider call: Anthropic / OpenAI / Ollama]
        │   └── http.client (× outbound)  [auto via otelhttp on egress transport]
        └── tool.<tool_name> (× M calls)  [per tool invocation]
            └── http.client (if HTTP)
Every llm.completion carries OpenTelemetry GenAI semantic-convention attributes: gen_ai.system, gen_ai.request.model, gen_ai.response.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.response.finish_reasons. Cost-attribution dashboards in any OTLP backend (Tempo + Grafana, Jaeger, Honeycomb, Datadog APM, Grafana Cloud) light up immediately.
Forge-specific attributes use the forge.* namespace: forge.task.id, forge.task.final_state (completed / failed / canceled), forge.tool.name, forge.workflow.id / .stage.id / .step.id (from FWS-2 X-Workflow-* headers), forge.correlation_id, forge.loop.iteration.
How to enable
In forge.yaml:
    observability:
      tracing:
        enabled: true
        endpoint: https://otel-collector.monitoring.svc.cluster.local:4318/v1/traces
        sampler: parentbased_always_on
Or via standard OTEL_* environment variables:
    export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4318/v1/traces
    export OTEL_TRACES_SAMPLER=always_on
    forge run --otel-enabled
Or per-command via nine new --otel-* CLI flags on forge run and forge serve start:
    forge run --otel-enabled \
      --otel-endpoint http://localhost:4318/v1/traces \
      --otel-sampler always_on \
      --otel-service-name my-agent-staging
Resolution precedence: CLI flags > OTEL_* env vars > observability.tracing in forge.yaml > defaults. A set-but-empty env var does not wipe a non-empty yaml value (absence-of-value is "no override," not "unset").
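One simple way to get this behavior is to treat every empty value as "no override" and take the first non-empty candidate in precedence order. The helper below is a hypothetical sketch of that idea, not the actual CLI resolver; the yaml value is passed in as a plain string for brevity.

```go
package main

import (
	"fmt"
	"os"
)

// firstNonEmpty returns the first non-empty candidate, or def when all are
// empty. Passing candidates in precedence order (CLI, env, yaml) means a
// set-but-empty env var cannot wipe a non-empty yaml value.
// Hypothetical helper for illustration.
func firstNonEmpty(def string, candidates ...string) string {
	for _, v := range candidates {
		if v != "" {
			return v
		}
	}
	return def
}

func main() {
	cliFlag := "" // --otel-endpoint not passed
	yamlValue := "https://otel-collector.monitoring.svc.cluster.local:4318/v1/traces"
	endpoint := firstNonEmpty("", cliFlag, os.Getenv("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"), yamlValue)
	fmt.Println("endpoint:", endpoint)
}
```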
All 10 standard OTEL_* SDK env vars are honored: OTEL_SDK_DISABLED, OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_PROTOCOL, OTEL_EXPORTER_OTLP_HEADERS, OTEL_EXPORTER_OTLP_TIMEOUT, OTEL_SERVICE_NAME, OTEL_RESOURCE_ATTRIBUTES, OTEL_TRACES_SAMPLER, OTEL_TRACES_SAMPLER_ARG.
End-to-end propagation across multi-hop A2A flows
Forge installs the W3C tracecontext + baggage composite propagator on the OpenTelemetry global at startup. The JSON-RPC dispatcher extracts inbound traceparent headers before opening its own span, and outbound HTTP through the egress-enforced transport re-injects traceparent automatically via otelhttp. Multi-hop flows — orchestrator → agent A → agent B → external API — show as one connected trace in the backend:
orchestrator
│ traceparent: 00-T-S1-01
▼
┌───────────────┐
│ forge agent │ span_id=S2, parent=S1 (a2a.tasks/send)
│ ▼ │ span_id=S3, parent=S2 (agent.execute)
│ ▼ │ span_id=S4, parent=S3 (llm.completion)
└──────│────────┘
▼ traceparent: 00-T-S2-01 ← otelhttp re-injects on outbound
via the egress-enforced transport
┌───────────────┐
│ downstream │ span_id=S5, parent=S2 (a2a.tasks/send)
│ forge agent │
└───────────────┘
All spans share trace_id = T and chain by parent_span_id. The operator sees one flame graph across every hop.
baggage (the other half of the composite propagator) flows through to the handler context so application-level identifiers — tenant_id, user_id, A/B-test bucket — travel with the trace.
Audit ↔ trace cross-link
Forge has shipped a structured audit pipeline since FWS-1 (NDJSON to stderr + opt-in Unix Domain Socket sink). As of v0.14.0, every audit event emitted from a request-scoped context carries the active span's trace_id and span_id:
    {
      "event": "llm_call",
      "task_id": "t-1234",
      "trace_id": "4a8f95a0e1bedda42c9dd5350fb3b33a",
      "span_id": "ad8b2c91e44f0a72",
      "model": "claude-sonnet-4",
      "input_tokens": 1240,
      "output_tokens": 187,
      "duration_ms": 2310
    }

Two new pivot directions for operators:
- Audit row → trace: paste the trace_id into Tempo / Jaeger / Honeycomb to land on the matching trace; paste the span_id to jump directly to the llm.completion child carrying the matching gen_ai.usage.* tokens.
- Trace → audit row: copy the trace_id from a trace browser, grep the audit log for the corresponding row, get the FWS-8 payload metadata the trace doesn't carry.
Format: lowercase hex matching W3C traceparent semantics — 32-char (128-bit) trace_id, 16-char (64-bit) span_id.
Backward compatible by construction: both fields use omitempty. When tracing is disabled (the default), audit JSON is byte-identical to pre-v0.14.0. The audit schema version ("1.0") is not bumped — the documented policy treats adding optional fields as schema-compatible.
Egress-enforced OTLP transport + build-time auto-merge
The OTLP HTTP exporter rides through the same egress enforcer every other in-process Forge HTTP client uses. The operator's egress allowlist bounds where Forge can send spans — a misconfigured collector URL cannot exfiltrate span content to an unapproved destination. The post-DNS IP guard applies to OTLP too.
No second egress edit needed. forge package and forge run both extract the OTLP endpoint hostname and auto-inject it into egress_allowlist.json. The Kubernetes NetworkPolicy generated by forge package therefore admits OTLP traffic automatically:
    # forge.yaml: this is sufficient. The collector is added to the allowlist automatically.
    egress:
      mode: allowlist
      allowed_domains:
        - api.anthropic.com   # operator-declared
    observability:
      tracing:
        enabled: true
        endpoint: https://otel-collector.monitoring.svc.cluster.local:4318/v1/traces

    # generated egress_allowlist.json:
    #   all_domains = [api.anthropic.com, otel-collector.monitoring.svc.cluster.local]
Disabled tracing produces no allowlist entry — turning tracing off in yaml does not leave a stale entry punched through the NetworkPolicy.
Tracing-off safety: nothing crashes the agent
Off by default per the initiative ruling. When tracing is off, enabled: false, or endpoint is empty:
- coreruntime.Tracer() returns a non-recording no-op tracer; spans cost near zero (one interface dispatch).
- EmitFromContext does not stamp trace_id / span_id; audit JSON is byte-identical to pre-Phase-4.
- OTelDomain returns nil; no entry in egress_allowlist.json.
- observability.WrapHTTPTransport is a near pass-through.
Telemetry failures never crash the agent. A misconfigured endpoint, a malformed traceparent header, an unreachable collector — every failure mode falls through to the no-op tracer with a warning in the ops log. The CLI resolver is the single place that fails loudly on bad config at startup.
Also in this release
A2A message validation at tasks/send entry points (#119 / PR #121). The most common failure was a client sending the pre-A2A-0.3.0 "type": "text" discriminator instead of the spec-correct "kind": "text" — encoding/json silently dropped the unknown field, the executor produced a confused reply ("It looks like your message didn't come through"). Now rejected at the dispatcher with a clear InvalidParams error naming the spec divergence: "parts[0]: part kind is required (A2A 0.3.0); got empty kind with non-empty content — did you send \"type\" instead of \"kind\"? \"type\" is from the pre-0.3.0 dialect and is silently ignored by the decoder." Sentinel errors (ErrPartKindMissing, ErrPartKindUnknown, ErrMessageRoleMissing, ErrMessagePartsEmpty) exposed for callers that branch on the cause.
FWS-3 X-Forge-* response headers stamped on JSON-RPC tasks/send (PR #120). Closes a bug where the workflow correlation and usage headers were present on the REST endpoint but missing from the JSON-RPC tasks/send path.
**Documentation sync across the operator-facing ...
v0.13.0
Forge v0.13.0 — A2A 0.3.0 conformance, audit pipeline hardening, and configurable platform policy
Published: 2026-06-07
Forge v0.13.0 is a feature release covering every commit merged to main
since v0.12.0
on 2026-05-25. 119 files changed, +16,324 / −566 lines,
13 issues closed, 10 numbered workstreams (FWS-1 through FWS-10)
plus the new Claude knowledge skills.
If you operate Forge in production, this release lands the
verifiable version of the security and observability claims the
project has made since v0.10: an A2A 0.3.0 spec-conformant Agent Card,
a documented schema-versioned audit contract with monotonic sequence
numbers and a metadata-only invariant pinned by regression tests, a
dedicated Unix Domain Socket sink for audit export, a three-layer
platform policy (system / user / workspace) with first-match-wins
attribution, configurable per-IP rate limiting with sensible
orchestration-friendly defaults, and a one-line stream split so SIEM
pipelines can route ops logs and audit NDJSON without parsing
payloads.
The full per-PR detail lives in CHANGELOG.md.
Highlights
- A2A 0.3.0 spec-conformant Agent Card at the canonical /.well-known/agent-card.json path. Every required field populated from forge.yaml and SKILL.md. Legacy /.well-known/agent.json alias preserved with a Deprecation: true header per RFC 8594.
- Hardened audit emission: every event carries schema_version: "1.0" and a monotonic per-invocation seq so consumers detect gaps and reordering by grouping (correlation_id, task_id). The default audit posture is metadata-only (token counts, sizes, durations; never raw prompt or completion text or tool args/results), and a TestNoPayloadByDefault_LLMCall regression test pins the invariant.
- Audit event export via Unix Domain Socket sink (preferred) or localhost HTTP fallback. Fire-and-forget, 50ms per-event timeout, exponential reconnect backoff, periodic audit_export_status event every 60s carrying per-sink health.
- Three-layer platform policy. The single-layer FORGE_PLATFORM_POLICY env from v0.12 is now joined by a system layer (/etc/forge/policy.yaml, sysadmin) and a user layer (~/.forge/policy.yaml, developer). Deny lists union; max-bounds use smallest non-zero; audit attribution credits the first layer to deny.
- Channel deny is first-class. A new denied_channels field in the platform policy schema skips named adapters at startup with a channel_denied_by_policy audit event. forge channel disable / enable edit the user layer; --system retargets to the system layer.
- Per-IP rate limit is configurable end-to-end via server.rate_limit: in forge.yaml, matching --rate-limit-* CLI flags, and FORGE_RATE_LIMIT_* env vars. Resolution order: CLI > env > yaml > defaults. Defaults bumped for orchestrated workloads: write side now 60/min sustained with burst 20 (was 10/min, burst 3). tasks/cancel is exempt from the write bucket by default so a cost-ceiling cancel burst never throttles the recovery mechanism.
- Workflow correlation: X-Workflow-ID, X-Workflow-Stage-ID, X-Workflow-Step-ID, and X-Invocation-Caller headers are extracted on every request and stamped on every audit event for multi-agent flow tracing.
- Token usage in audit and response headers. Every llm_call event carries input_tokens / output_tokens / duration_ms / model / provider; every A2A response carries X-Forge-Tokens-In/Out, X-Forge-Duration-Ms, X-Forge-Model, X-Forge-Provider for cost-enforcement pipelines.
- tasks/cancel actually cancels. The JSON-RPC tasks/cancel method now signals an in-flight invocation via context.CancelCauseFunc with a typed reason. The agent loop honors cancellation at iteration and tool-call boundaries. An invocation_cancelled audit event closes the cancelled invocation with the reason + partial token totals.
- Ops logs to stdout, audit to stderr. Container log collectors and SIEM pipelines can now split the two at the stream level, with no payload parsing.
- New Claude-side reference: .claude/skills/forge.md ships a single self-contained knowledge file for Forge, and .claude/skills/forge-skill-builder.md ports the Web UI Skill Builder's prompt for use in any Claude chat. The sync-docs workflow keeps both current with the code.
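The three-layer merge rules from the highlights (deny lists union, max-bounds take the smallest non-zero value, attribution goes to the first layer that denies) can be sketched as below. The policyLayer and mergedPolicy types are hypothetical, MaxTokens is an invented example of a max-bound, and evaluating layers in system, user, workspace order is an assumption; only denied_channels is a field named in the notes.

```go
package main

import "fmt"

// policyLayer is a simplified policy document. MaxTokens is a made-up
// example of a max-bound; 0 means the layer does not set it.
type policyLayer struct {
	Name           string
	DeniedChannels []string
	MaxTokens      int
}

// mergedPolicy is the effective policy after combining all layers.
type mergedPolicy struct {
	DeniedBy  map[string]string // channel -> first layer that denied it
	MaxTokens int
}

// mergePolicy combines layers in the order given (assumed: system, user,
// workspace). Deny lists union, with attribution to the first denying layer;
// max-bounds keep the smallest non-zero value.
func mergePolicy(layers []policyLayer) mergedPolicy {
	m := mergedPolicy{DeniedBy: map[string]string{}}
	for _, l := range layers {
		for _, ch := range l.DeniedChannels {
			if _, seen := m.DeniedBy[ch]; !seen {
				m.DeniedBy[ch] = l.Name
			}
		}
		if l.MaxTokens > 0 && (m.MaxTokens == 0 || l.MaxTokens < m.MaxTokens) {
			m.MaxTokens = l.MaxTokens
		}
	}
	return m
}

func main() {
	m := mergePolicy([]policyLayer{
		{Name: "system", DeniedChannels: []string{"telegram"}},
		{Name: "user", DeniedChannels: []string{"telegram", "slack"}, MaxTokens: 8000},
		{Name: "workspace", MaxTokens: 4000},
	})
	fmt.Printf("%+v\n", m)
}
```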
What's new
1. A2A 0.3.0 Agent Card conformance (FWS-1, #85)
Forge's Agent Card now conforms to the A2A 0.3.0 spec end-to-end:
canonical path, required-field shape, security schemes derived from
the configured auth chain, and SKILL.md skills bridged into the
published card at both forge build and forge run time.
| What | Where |
|---|---|
| Canonical path | GET /.well-known/agent-card.json |
| Legacy alias | GET /.well-known/agent.json — same body + Deprecation: true + Link: rel=successor-version per RFC 8594. Removable one release after this ships. |
| Required fields | name, description, url, version (from forge.yaml), protocolVersion: "0.3.0", defaultInputModes, defaultOutputModes, skills[], capabilities, securitySchemes, security |
| Skills mapping | SKILL.md frontmatter (name, description, category, tags) → A2A AgentSkill objects with deterministic ID-sorted output so card bytes are stable across restarts |
| Security schemes | Derived from auth.providers — oidc/azure_ad → openIdConnect with discovery URL; static_token/http_verifier → http+bearer; gcp_iap → apiKey in header; aws_sigv4 → http+bearer with bearerFormat: "forge-aws-v1" |
| New audit event | agent_card_published at startup + hot-reload, carrying name, version, protocol_version, url, skill_count, capabilities, security_schemes, card_size_bytes, card_sha256 — consumers detect config drift across deploys |
See docs/reference/a2a-agent-card.md.
2. Workflow correlation ID threading (FWS-2, #86)
Orchestrators (initializ Command, custom controllers, GitOps tooling)
that fire multi-agent workflows now get end-to-end audit traceability
without any per-agent code change. Inbound requests carrying
X-Workflow-ID / X-Workflow-Stage-ID / X-Workflow-Step-ID /
X-Invocation-Caller headers have those values extracted into the
request context, threaded through the agent loop, and stamped on
every audit event the invocation produces — invocation_complete,
llm_call, tool_exec, etc.
The emitted JSON is byte-for-byte identical to the pre-FWS-2 shape
for direct (non-orchestrated) calls, so consumers reading the v0.12
audit stream continue to work unchanged.
See docs/security/workflow-correlation.md.
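As a rough illustration of the header-to-audit threading described above, the following Python sketch extracts the four headers and stamps them on an event only when present, which is one way to keep direct-call output unchanged. The header names come from these notes; the audit field names (workflow_id and so on) and the helper names are hypothetical.

```python
import json

# Header names are from the release notes; the audit field names are hypothetical.
WORKFLOW_HEADERS = {
    "X-Workflow-ID": "workflow_id",
    "X-Workflow-Stage-ID": "workflow_stage_id",
    "X-Workflow-Step-ID": "workflow_step_id",
    "X-Invocation-Caller": "invocation_caller",
}


def extract_correlation(headers):
    """Return only the correlation fields that are actually present."""
    lowered = {k.lower(): v for k, v in headers.items()}  # HTTP headers are case-insensitive
    found = {}
    for header, field in WORKFLOW_HEADERS.items():
        value = lowered.get(header.lower())
        if value:
            found[field] = value
    return found


def stamp(event, correlation):
    """Copy correlation fields onto an audit event without mutating the input."""
    stamped = dict(event)
    stamped.update(correlation)
    return stamped


base_event = {"event": "llm_call", "model": "example-model"}
direct = stamp(base_event, extract_correlation({}))
orchestrated = stamp(base_event, extract_correlation({"x-workflow-id": "wf-42"}))
```

Absent headers add no keys at all, so the serialized event for a direct call is identical to the unstamped one.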
3. Token usage + execution duration in audit (FWS-3, #87)
Every llm_call audit event now carries input_tokens,
output_tokens, model, provider, duration_ms, and request_id
captured directly from provider response metadata (Anthropic, OpenAI,
Ollama via the OpenAI-compatible path, OpenAI Responses). Field naming
aligns with OTel GenAI semantic conventions
(gen_ai.usage.input_tokens / gen_ai.usage.output_tokens) so audit
consumers correlate to OTel traces without a translation table.
When a provider returns no usage (some self-hosted Ollama setups),
the event flags tokens_unavailable: true rather than silent zeros.
Each tool_exec event gains duration_ms plus structured arg-shape
metadata (args_size, result_size). Raw arg values are not emitted
(that's FWS-8's payload-stripping invariant, not FWS-3's).
A new invocation_complete audit event closes every A2A invocation
with the wall-clock duration and aggregated input_tokens_total /
output_tokens_total / llm_call_count. Response headers
X-Forge-Tokens-In, X-Forge-Tokens-Out, X-Forge-Duration-Ms,
X-Forge-Model, X-Forge-Provider carry the same totals so
cost-enforcement layers act without parsing the body.
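The aggregation into invocation_complete and the matching response headers can be sketched as follows. The field and header names are taken from the text above; the helper names and the choice to skip (rather than zero-fill) events flagged tokens_unavailable are assumptions of this sketch.

```python
def summarize_invocation(llm_events, duration_ms):
    """Fold llm_call events into the totals an invocation_complete event carries."""
    summary = {
        "event": "invocation_complete",
        "duration_ms": duration_ms,
        "input_tokens_total": 0,
        "output_tokens_total": 0,
        "llm_call_count": len(llm_events),
    }
    for ev in llm_events:
        if ev.get("tokens_unavailable"):
            # Provider reported no usage: surface the gap instead of adding zeros.
            summary["tokens_unavailable"] = True
            continue
        summary["input_tokens_total"] += ev["input_tokens"]
        summary["output_tokens_total"] += ev["output_tokens"]
    return summary


def response_headers(summary, model, provider):
    """Mirror the totals into headers so callers need not parse the body."""
    return {
        "X-Forge-Tokens-In": str(summary["input_tokens_total"]),
        "X-Forge-Tokens-Out": str(summary["output_tokens_total"]),
        "X-Forge-Duration-Ms": str(summary["duration_ms"]),
        "X-Forge-Model": model,
        "X-Forge-Provider": provider,
    }


events = [
    {"input_tokens": 120, "output_tokens": 30},
    {"input_tokens": 80, "output_tokens": 20},
    {"tokens_unavailable": True},
]
summary = summarize_invocation(events, duration_ms=1500)
headers = response_headers(summary, model="example-model", provider="example-provider")
```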
4. Cancellation signal handling (FWS-4, #88)
The A2A tasks/cancel JSON-RPC method now actually cancels in-flight
invocations instead of merely flipping the stored task state. A
per-Runner CancellationRegistry tracks every active invocation by
task ID; the cancel handler signals the registered
context.CancelCauseFunc with a typed reason
(workflow_failure / cost_limit_exceeded / timeout /
external_signal).
The agent loop honors cancellation at the iteration boundary and
between tool calls within an iteration, so cancellation latency is
bounded by the current LLM call or tool exec.
A new invocation_cancelled audit event closes every cancelled
invocation with the classified reason, duration_ms up to
cancellation, and partial token totals consumed before the signal.
The A2A response carries state canceled plus a cancelled: <reason>
message so the orchestrator can react. Cancel-after-complete is
idempotent.
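A small Python analogue may help picture the registry-plus-boundary-check design. Forge's version is Go and uses context.CancelCauseFunc; here a threading.Event stands in for the context, and every name other than the four reason strings is hypothetical.

```python
import threading

VALID_REASONS = {"workflow_failure", "cost_limit_exceeded", "timeout", "external_signal"}


class CancellationRegistry:
    """Tracks active invocations by task ID (illustrative stand-in for the Go type)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}  # task_id -> {"event": threading.Event, "reason": str or None}

    def register(self, task_id):
        entry = {"event": threading.Event(), "reason": None}
        with self._lock:
            self._active[task_id] = entry
        return entry

    def complete(self, task_id):
        with self._lock:
            self._active.pop(task_id, None)

    def cancel(self, task_id, reason):
        if reason not in VALID_REASONS:
            raise ValueError(f"unknown cancellation reason: {reason}")
        with self._lock:
            entry = self._active.get(task_id)
        if entry is None:
            return False  # already finished or unknown: cancelling is a no-op
        if not entry["event"].is_set():
            entry["reason"] = reason  # the first reason wins
            entry["event"].set()
        return True


def run_loop(entry, steps):
    """Check the signal at each iteration boundary before starting more work."""
    done = 0
    for step in steps:
        if entry["event"].is_set():
            return {"state": "canceled", "reason": entry["reason"], "steps_done": done}
        step()
        done += 1
    return {"state": "completed", "reason": None, "steps_done": done}


registry = CancellationRegistry()
entry = registry.register("task-1")
steps = [
    lambda: None,
    lambda: registry.cancel("task-1", "cost_limit_exceeded"),  # signal arrives mid-run
    lambda: None,  # never runs: the boundary check fires first
]
result = run_loop(entry, steps)
registry.complete("task-1")
late_cancel = registry.cancel("task-1", "timeout")  # cancel-after-complete
```

The step that is already running finishes before the loop notices the signal, which matches the latency bound described above.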
5. Three-layer platform policy + channel scope (FWS-5 / FWS-6, #89 + #90)
The single-layer FORGE_PLATFORM_POLICY env from v0.12.0 is now
joined by two more layers so different operators with different
trust contexts can each express their bounds independently.
| Layer | Path | Set by |
|---|---|---|
| system | /etc/forge/policy.yaml (override FORGE_SYSTEM_POLICY) | Sysadmin (MDM, corporate image, Ansible) |
| user | ~/.forge/policy.yaml | Developer (via forge channel disable or Web UI chip) |
| workspace | path at FORGE_PLATFORM_POLICY | Op... |
v0.12.0
Forge v0.12.0 — Cloud-native auth providers, Microsoft Teams channel, and Model Context Protocol (MCP) HTTP client
This is a feature release covering work merged to main since v0.10.1. It lands three substantial capability tracks: a pluggable authentication chain with cloud-native providers for AWS IAM, Microsoft Entra ID, and Google Cloud IAP; a Microsoft Teams channel adapter; and Phase 1 of Model Context Protocol (MCP) client support.
If you operate Forge on Kubernetes, this release removes the need to stand up a parallel OIDC issuer in front of your agent and gives you a structured way to integrate hosted MCP servers without writing per-tool adapter code.
The full per-PR detail lives in CHANGELOG.md.
Highlights
- Six auth providers ship in-box: static_token, oidc, http_verifier, aws_sigv4, gcp_iap, azure_ad. Pluggable chain: first-match-wins, fail-closed on transient errors, structured auth_verify / auth_fail audit events with stable reason codes.
- AWS IAM authentication via STS pre-signed URLs, the same pattern aws-iam-authenticator uses for EKS. No aws-sdk-go-v2 dependency.
- Microsoft Entra ID (Azure AD) with tenant lock-in, optional Microsoft Graph group enrichment, and an explicit allowed_tenants allow-list for multi-tenant deployments.
- GCP Identity-Aware Proxy support for Forge instances behind a GCP HTTPS Load Balancer with IAP enabled.
- Microsoft Teams channel adapter built on Microsoft Graph polling, with inline device-code OAuth in the init TUI.
- MCP HTTP client (Phase 1): declare MCP servers in forge.yaml; discovered tools register as namespaced <server>__<tool> first-class tools with per-call audit, OAuth 2.1 PKCE, and per-server lifecycle.
- Three new embedded skills: linear, code-plan, and ticket-driven variants of code-agent and github.
- Web UI authentication wizard and matching non-interactive forge init flags for every new provider.
What's new
1. Pluggable authentication chain with cloud-native providers
Forge's authentication is now a chain of Provider implementations registered via a database/sql-style factory pattern. The chain is configured under a new auth: block in forge.yaml; existing --auth-url deployments continue to work via backward-compatible flag mapping.
Built-in providers:
| Provider | Token format | Use case |
|---|---|---|
| static_token | Authorization: Bearer <static-token> | Shared secret, dev/test |
| http_verifier | Bearer; verifier endpoint returns identity | Drop-in for the old --auth-url flow |
| oidc | Bearer JWT | Any OIDC issuer (Okta, Auth0, Cognito, Keycloak) |
| aws_sigv4 | Authorization: Bearer forge-aws-v1.<base64-of-presigned-sts-url> | AWS IAM callers (Lambda, EC2, EKS, IRSA) |
| gcp_iap | X-Goog-Iap-Jwt-Assertion: <jwt> | Forge behind GCP LB + IAP |
| azure_ad | Authorization: Bearer <aad-jwt> | Microsoft Entra ID (humans, CI, Azure workloads) |
Putting Forge behind any of the three major cloud identity providers previously required standing up a parallel OIDC issuer (Cognito for AWS, Workspace SAML, etc.). The Phase 2 providers remove that requirement — operators authenticate to Forge using the identity they already have in their cloud, and Forge never holds the IdP's secrets.
AWS IAM via STS pre-signed URLs
aws_sigv4 follows the aws-iam-authenticator pattern used by EKS. The caller's AWS SDK mints a pre-signed GetCallerIdentity URL, base64-encodes it, and sends it as a Bearer token. Forge validates the URL host (SSRF guard), GETs it, and stamps the canonical caller ARN onto the request.
Client-side, in any AWS-SDK-bearing language:
import base64
import boto3
import requests
from botocore.auth import SigV4QueryAuth
from botocore.awsrequest import AWSRequest

# Pre-sign an STS GetCallerIdentity URL with the caller's current credentials.
creds = boto3.Session().get_credentials().get_frozen_credentials()
req = AWSRequest(method="GET", url="https://sts.us-east-1.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15")
SigV4QueryAuth(creds, "sts", "us-east-1", expires=900).add_auth(req)

# Wrap the signed URL as an unpadded base64url Bearer token and send it to Forge.
token = "forge-aws-v1." + base64.urlsafe_b64encode(req.url.encode()).rstrip(b"=").decode()
requests.post(forge_url, headers={"Authorization": f"Bearer {token}"}, data=msg)
A reference client lives at scripts/forge-aws-sign.py.
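On the receiving side, the decode-then-validate step can be pictured with the sketch below. It is a hedged illustration, not Forge's Go parser: the function names are hypothetical, and the exact host rule (a single regional STS endpoint) and the Action check are assumptions about what an SSRF guard of this kind would enforce.

```python
import base64
from urllib.parse import parse_qs, urlsplit

TOKEN_PREFIX = "forge-aws-v1."


def decode_presigned_url(token):
    """Recover the pre-signed URL from a forge-aws-v1 bearer token."""
    if not token.startswith(TOKEN_PREFIX):
        raise ValueError("not a forge-aws-v1 token")
    payload = token[len(TOKEN_PREFIX):]
    # The client strips "=" padding, so restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return base64.urlsafe_b64decode(payload).decode("utf-8")


def validate_sts_url(url, region):
    """Reject anything that is not an HTTPS call to the expected STS host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("scheme must be https")
    if parts.username is not None or parts.password is not None:
        raise ValueError("userinfo is not allowed")
    if parts.hostname != f"sts.{region}.amazonaws.com":
        raise ValueError("unexpected host")
    if parse_qs(parts.query).get("Action") != ["GetCallerIdentity"]:
        raise ValueError("unexpected action")
    return url


def is_accepted(url, region):
    try:
        validate_sts_url(url, region)
    except ValueError:
        return False
    return True


good_url = "https://sts.us-east-1.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15"
token = TOKEN_PREFIX + base64.urlsafe_b64encode(good_url.encode()).rstrip(b"=").decode()
decoded = decode_presigned_url(token)
```

Only after a check of this kind passes would a server issue the outbound GET, so a token cannot steer the request to an arbitrary host.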
forge.yaml:
auth:
  providers:
    - type: aws_sigv4
      region: us-east-1
      allowed_principals:
        - "arn:aws:iam::ACCOUNT:role/forge-ci"
      # or, for a whole account:
      allowed_accounts:
        - "ACCOUNT"   # expands to user/* role/* assumed-role/*/* federated-user/*
Microsoft Entra ID (Azure AD)
azure_ad composes the OIDC provider for signature verification and layers AAD-specific concerns: tenant lock-in via the tid claim, optional Microsoft Graph group enrichment for users with many group memberships (AAD's "groups overage" case), and an allowed_tenants allow-list for multi-tenant deployments.
auth:
  providers:
    - type: azure_ad
      audience: "api://forge"
      tenant_id: "11111111-2222-3333-4444-555555555555"
      groups_mode: graph   # or "claim"
Multi-tenant requires explicit opt-in (allow_multi_tenant: true). When combined with an empty allowed_tenants, the configuration accepts tokens from any Entra tenant — startup logs a warning so the trade-off is loud.
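The tenant decision described above reduces to a small truth table. The sketch below is one plausible reading, with a hypothetical function name; in particular, whether the home tenant_id is implicitly accepted in multi-tenant mode is not stated in these notes and is not modeled here.

```python
def tenant_allowed(tid, tenant_id, allow_multi_tenant=False, allowed_tenants=()):
    """Decide whether a token's tid claim is acceptable (illustrative only)."""
    if not allow_multi_tenant:
        # Tenant lock-in: only the configured tenant may authenticate.
        return tid == tenant_id
    if not allowed_tenants:
        # Explicit opt-in with an empty allow-list accepts any Entra tenant;
        # the notes say startup logs a warning for this configuration.
        return True
    return tid in allowed_tenants


HOME = "11111111-2222-3333-4444-555555555555"
PARTNER = "99999999-8888-7777-6666-555555555555"

locked = tenant_allowed(PARTNER, HOME)
open_multi = tenant_allowed(PARTNER, HOME, allow_multi_tenant=True)
listed = tenant_allowed(PARTNER, HOME, allow_multi_tenant=True, allowed_tenants=[PARTNER])
unlisted = tenant_allowed("other-tenant", HOME, allow_multi_tenant=True, allowed_tenants=[PARTNER])
```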
GCP Identity-Aware Proxy
gcp_iap consumes the X-Goog-Iap-Jwt-Assertion header that GCP's IAP forwards to your backend. The IAP issuer and JWKS URL are hardcoded per Google's stable contract.
auth:
  providers:
    - type: gcp_iap
      audience: "/projects/PROJECT_NUMBER/global/backendServices/BACKEND_ID"
Security posture
- Algorithm whitelist before key lookup on every JWT path (RS/PS/ES variants only; HMAC and none rejected).
- JWKS cache with TTL refresh, exponential backoff, and stale-grace fallback.
- All outbound auth-provider HTTP clients pin CheckRedirect to refuse 3xx so the parser-side host gate is not silently undone by a redirect.
- aws_sigv4 parser rejects URLs with userinfo and enforces X-Amz-Date freshness plus X-Amz-Expires cap before any STS call.
- AAD SkipIssuerCheck field is unreachable from YAML (yaml:"-"); it is only settable from Go by azure_ad itself.
2. Microsoft Teams channel adapter
A new Teams channel adapter polls Microsoft Graph for messages in configured chats and dispatches them to your agent. Inline device-code OAuth in the forge init TUI auto-opens the browser; tokens persist via the existing encrypted credential store.
channels:
  - type: msteams
    chat_ids:
      - "19:abc123...@thread.v2"
CLI helper: forge channel msteams-login runs the device-code flow standalone. Recent chat history is prepended to each dispatched event so the agent has context for the conversation it's joining.
3. Model Context Protocol (MCP) HTTP client — Phase 1
Forge agents can now consume MCP servers as first-class tool sources. Configure servers under a new top-level mcp: block; discovered tools register as namespaced <server>__<tool> and flow through the existing LLM executor.
mcp:
  servers:
    - name: linear
      transport: http
      url: https://mcp.linear.app/sse
      auth:
        type: oauth
        client_id: ${LINEAR_OAUTH_CLIENT_ID}
        authorize_url: https://linear.app/oauth/authorize
        token_url: https://api.linear.app/oauth/token
        scopes: [read, write]
      tools:
        allow: [create_issue, list_issues, update_issue]
      required: false
    - name: internal-tools
      transport: http
      url: http://internal-mcp.default.svc.cluster.local:8080/mcp
      auth:
        type: bearer
        token_env: INTERNAL_MCP_TOKEN
      tools:
        deny: [delete_database, drop_table]   # default deny-all if both empty
      required: true
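The allow/deny filtering and the <server>__<tool> naming in the configuration above can be sketched in a few lines of Python. The function name is hypothetical, and the precedence shown (allow-list first, deny-list otherwise, nothing when both are empty) is an assumption drawn from the inline comment rather than from Forge's source.

```python
def register_tools(server_name, discovered, allow=(), deny=()):
    """Return namespaced tool names that survive the server's allow/deny filter."""
    allow, deny = set(allow), set(deny)
    if allow:
        selected = [t for t in discovered if t in allow and t not in deny]
    elif deny:
        selected = [t for t in discovered if t not in deny]
    else:
        selected = []  # both lists empty: deny all, per the YAML comment
    # Namespacing keeps tools from different servers from colliding.
    return [f"{server_name}__{tool}" for tool in selected]


linear_tools = register_tools(
    "linear",
    ["create_issue", "list_issues", "delete_issue"],
    allow=["create_issue", "list_issues", "update_issue"],
)
internal_tools = register_tools(
    "internal-tools",
    ["query", "drop_table"],
    deny=["delete_database", "drop_table"],
)
unfiltered = register_tools("other", ["anything"])
```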
New CLI subcommands:
forge mcp list # show every configured server, its state, tool count
forge mcp test <name> --call <tool> # connect, list tools, optionally invoke one
forge mcp login <name> # laptop-time OAuth 2.1 PKCE flow
forge mcp logout <name> # remove stored OAuth tokens
Per-server lifecycle: each MCP server runs its own state machine in its own goroutine — parallel startup, Required=true failure cancels the agent context (Kubernetes observes CrashLoopBackOff); Required=false failure leaves the rest of the agent intact with that server's tools absent.
Audit events (NDJSON to stderr, no args/result byte payload): mcp_server_started, mcp_server_failed, mcp_server_degraded, mcp_tool_call, mcp_tool_result, mcp_tool_conflict, mcp_token_refresh.
Phase 1 supports HTTP transport only. Stdio MCP servers are reached via the documented sidecar bridge pattern using projects like mcp-proxy. transport: stdio is rejected at forge validate time with "stdio is on the roadmap; Phase 1 supports HTTP transport only".
MCP protocol version pinned to 2025-06-18. The handshake hard-fails on mismatch; version negotiation is intentionally absent.
Migration: the previous mcp_call adapter tool is removed in this release. The new mcp: block is strictly more capable (typed schemas per tool, per-call audit, conflict detection). See docs/mcp/configuration.md for the migration recipe.
4. Embedded skills: code-plan, linear, ticket-driven code-agent and github
- **`...
v0.10.1
2aa422c Forge v0.10.1 — Windows Agent Start Fix, Documentation Link Cleanup
Release Date: May 19, 2026
Full Changelog: v0.10.0...v0.10.1
Forge v0.10.1 is a patch release containing one user-facing reliability fix on Windows and a comprehensive documentation link cleanup. Three pull requests, 24 files changed, no breaking changes — drop-in upgrade from v0.10.0.
Highlights
| # | Area | Change | Issue / PR |
|---|---|---|---|
| 1 | Windows | forge ui no longer falsely reports agent failed to start for healthy daemons. Cause: Signal(0) is unsupported on Windows; the cross-platform process-liveness check is now centralized. | #59 / #60 |
| 2 | Documentation | All 21 broken doc links in the main README.md are fixed. New CI workflow gates future README link breakage. | #58 / #61 |
| 3 | Documentation | All 34 broken in-doc cross-references across 14 doc files are fixed. The CI link check is extended to cover docs/**. | #62 / #63 |
What's Fixed
Windows: forge ui Recognizes Healthy Daemons (PR #60, closes #59)
Symptom: On Windows, starting an agent from forge ui produced a red banner reading agent failed to start: REST: http://localhost:<port>/tasks/send ... — but the backend agent was actually running successfully. The same flow worked correctly on macOS and Linux.
Cause: forge-ui/process.go's pidAlive function used os.Process.Signal(syscall.Signal(0)) — the standard Unix idiom for liveness checking. On Windows, Go's stdlib only translates os.Interrupt and os.Kill; every other signal (including Signal(0)) returns "operating system does not support signal". So pidAlive returned false for every PID on Windows regardless of whether the process was alive.
The waitForPort poll loop interpreted that as "child crashed", fast-failed, and surfaced whatever was already in serve.log as the supposed error — which by that point contained the daemon's normal startup banner.
A second occurrence of the same broken check in forge-ui/discovery.go was actively deleting .forge/serve.json for still-running daemons on every page-load — orphaning them from the dashboard. Both are fixed.
Fix: New shared helper forge-core/util/process/{process_unix.go, process_windows.go} with IsAlive(pid int) bool. Unix path uses FindProcess + Signal(0); Windows path uses OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION) + CloseHandle — the same pattern forge-cli/cmd/serve_windows.go already shipped for daemon-lifecycle checks. The duplicate implementations in forge-cli and forge-ui are deleted in favor of the single shared helper.
// One implementation, build-tag split, consumed by both forge-cli and forge-ui.
import "github.com/initializ/forge/forge-core/util/process"

if !process.IsAlive(pid) {
    // accurate on every supported platform
}
There is now exactly one implementation of "is this PID alive" in the repo. CI runs GOOS=windows go build to catch future build-tag drift.
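For readers who want to see the platform split in one place, here is a Python analogue of the check described above. It is only a sketch of the same idea (signal 0 on Unix, OpenProcess with PROCESS_QUERY_LIMITED_INFORMATION on Windows), not a port of the Go helper; the function name is hypothetical, and the Windows branch treats any OpenProcess failure as "not alive", which is a simplification.

```python
import os
import sys


def is_alive(pid):
    """Best-effort liveness check that works on Unix and Windows."""
    if pid <= 0:
        return False
    if sys.platform == "win32":
        import ctypes

        PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
        kernel32 = ctypes.windll.kernel32
        handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
        if not handle:
            return False  # simplification: access-denied is also reported as dead
        kernel32.CloseHandle(handle)
        return True
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


self_alive = is_alive(os.getpid())
```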
README Doc Links Fixed (PR #61, closes #58)
The top-level README.md referenced 25 relative .md files. 21 of them were 404s — the README linked to canonical flat names (docs/runtime.md, docs/security/secrets.md, ...) but the actual docs lived under subdirectories with different filenames (docs/core-concepts/runtime-engine.md, docs/security/secret-management.md, ...). Every newcomer clicking "Quick Start", "Installation", "Architecture", "Skills", "Runtime", "Memory", "Channels", or any of 12 other top-level doc links from the README landed on a GitHub 404.
After the fix:
before: BROKEN 21, OK 4
after: BROKEN 0, OK 25
A new .github/workflows/docs-links.yaml runs on every push/PR touching README.md, docs/**, or the workflow itself — fails the build if any relative .md link doesn't resolve. Prevents this from regressing.
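The kind of check such a workflow performs can be approximated with a short Python script. This is a sketch under stated assumptions, not the contents of docs-links.yaml: the regular expression, the function name, and the rule of skipping external URLs and same-file anchors are illustrative choices.

```python
import re
import tempfile
from pathlib import Path

# Matches the target of an inline markdown link: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_md_links(md_file):
    """Return relative .md link targets in md_file that do not resolve on disk."""
    md_file = Path(md_file)
    broken = []
    for target in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # external URL or same-file anchor
        path = target.split("#", 1)[0]  # keep the #anchor out of the path lookup
        if not path.endswith(".md"):
            continue
        if not (md_file.parent / path).exists():
            broken.append(target)
    return broken


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs" / "core-concepts").mkdir(parents=True)
    (root / "docs" / "core-concepts" / "runtime-engine.md").write_text("# Runtime\n", encoding="utf-8")
    readme = root / "README.md"
    readme.write_text(
        "[ok](docs/core-concepts/runtime-engine.md#loop) "
        "[bad](docs/runtime.md) "
        "[ext](https://example.com/x.md)\n",
        encoding="utf-8",
    )
    result = broken_md_links(readme)
```

A CI job built on this idea would exit non-zero whenever the returned list is non-empty.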
In-Doc Cross-References Fixed (PR #63, closes #62)
The same pattern that broke the README also broke 34 in-doc cross-references across 14 doc files. The heaviest hitter was docs/security/overview.md (10 broken links — the file pre-dated the rename of egress.md → egress-control.md, secrets.md → secret-management.md, signing.md → build-signing.md). Other files used paths like security/guardrails.md from within docs/core-concepts/ which resolved to docs/core-concepts/security/guardrails.md (does not exist) instead of docs/security/guardrails.md.
All 34 fixed via mechanical substitution with #anchor preservation. The CI link check is extended to walk docs/**/*.md in addition to README.md:
before: 34 broken in-doc links across 14 files
after: 0 broken; 37 files checked (1 README + 36 docs)
Upgrade Notes
Drop-in patch upgrade — no forge.yaml changes, no API changes, no behavior changes for healthy installs.
- Windows users on v0.10.0: this release fixes the "agent failed to start" red banner. Highly recommended.
- forge-core/util/process is a new package. If you import forge-core directly in your own Go projects, the new package is available but doesn't change any existing API.
Install
# Homebrew
brew upgrade initializ/tap/forge

# One-line install/upgrade
curl -sSL https://raw.githubusercontent.com/initializ/forge/main/install.sh | bash

# Docker
docker pull ghcr.io/initializ/forge:v0.10.1
docker pull ghcr.io/initializ/forge:latest
Reference
v0.10.0
16c2279 Forge v0.10.0 — Dynamic Secret Overlay, Configurable Security Scoring, K8s Channel Manifests, Slack Bot Admission
Release Date: May 18, 2026
Full Changelog: v0.9.2...v0.10.0
Forge v0.10.0 makes encrypted secrets, security policy, channel deployment, and inter-bot communication first-class operator concerns. Skill-declared env vars now flow from the encrypted store to the agent process without code changes; security audits can be downgraded via YAML policy with full auditability; Kubernetes manifests finally include channel env vars from the same source as docker-compose; and Slack agents can now accept @-mentions from operator-allowlisted bots while staying immune to self-loops. Two latent bugs in the secret provider chain are also fixed. 62 files changed, 2,718 insertions, 728 deletions across forge-cli, forge-core, forge-plugins, and forge-skills.
Highlights
| # | Area | Change | Issue / PR |
|---|---|---|---|
| 1 | Secrets | Skill-declared env vars in encrypted store reach the agent's OS env via dynamic provider.List() discovery, with no hardcoded key list | #48 / #51 |
| 2 | Skills audit | New forge skills audit --policy <path> overrides scoring and policy checks; downgrades carry (via policy) / (acknowledged by policy) annotations | #49 / #53 |
| 3 | K8s + channels | Channel env vars from <channel>-config.yaml now flow into k8s/secrets.yaml, k8s/deployment.yaml, and docker-compose.yaml from a single canonical source | #50 / #54 |
| 4 | Secrets | Stale global ~/.forge/secrets.enc with a different passphrase no longer poisons the chain; eager-load validation drops it with a clear warning | #52 / #57 |
| 5 | Slack | New allow_bot_ids setting admits @-mentions from operator-trusted bots; the agent's own bot_id is unconditionally dropped (self-loop guard) | #55 / #56 |
What's New
Dynamic Secret Overlay for Skill-Declared Env Vars (PR #51, closes #48)
Before: the runtime overlaid only a hardcoded set of LLM/channel keys (OPENAI_API_KEY, SLACK_BOT_TOKEN, etc.) from the encrypted secrets store. A skill declaring metadata.forge.requires.env.required: [ACME_API_TOKEN] would see an empty value at runtime, even after forge secret set ACME_API_TOKEN ....
Now: a new secretOverlayKeys(provider) helper unions the builtin set with every key the provider exposes via Provider.List(). Custom skill-declared keys flow through the encrypted store to the OS env automatically — no code change required when adding a new skill.
# skills/my-skill/SKILL.md
metadata:
  forge:
    requires:
      env:
        required: [ACME_API_TOKEN]

forge secret --local set ACME_API_TOKEN my-secret-value
forge run   # ACME_API_TOKEN is now in the agent's environment
Cross-category secret-reuse detection continues to operate on the builtin set only — custom keys have no defined purpose category and are not part of the reuse check.
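The union that secretOverlayKeys performs can be pictured with the Python sketch below. The builtin set here is a two-key illustrative subset, and DictProvider, secret_overlay_keys, and resolve_overlay are hypothetical stand-ins for the Go provider interface and helper.

```python
# Illustrative subset only; the real builtin list is longer.
BUILTIN_KEYS = {"OPENAI_API_KEY", "SLACK_BOT_TOKEN"}


class DictProvider:
    """Toy secrets provider exposing list() and get(), like the Go interface."""

    def __init__(self, secrets):
        self._secrets = dict(secrets)

    def list(self):
        return sorted(self._secrets)

    def get(self, key):
        return self._secrets[key]


def secret_overlay_keys(provider):
    """Union the builtin keys with every key the provider exposes."""
    return BUILTIN_KEYS | set(provider.list())


def resolve_overlay(provider):
    """Values to overlay: every overlay key that the provider actually holds."""
    held = set(provider.list())
    return {key: provider.get(key) for key in sorted(secret_overlay_keys(provider)) if key in held}


provider = DictProvider({"ACME_API_TOKEN": "my-secret-value"})
keys = secret_overlay_keys(provider)
overlay = resolve_overlay(provider)
```

Because the key set is discovered from the provider, adding a new skill with a new required variable needs no change to this logic.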
Policy-Configurable Security Scoring (PR #53, closes #49)
forge skills audit now accepts a --policy <path> flag that loads a YAML SecurityPolicy. The policy affects both policy-violation checks (existing) and scoring (new). Three new scoring overrides let operators down-weight builtin classifications:
# policy.yaml
script_policy: allow
max_risk_score: 90
trusted_domains: [internal.example.com]   # +2 instead of +10 (unknown)
acknowledged_bins: [python]               # +3 instead of +15 (builtin high-risk)
acknowledged_env: [DB_PASSWORD]           # +5 instead of +10 (builtin sensitive)
forge skills audit --dir skills --policy policy.yaml
Every override is auditable. Affected factors carry (via policy) or (acknowledged by policy) in their description, visible in both --format text and --format json:
egress +2 trusted domain (via policy): internal.example.com
binary +3 high-risk binary (acknowledged by policy): python
env +5 sensitive variable (acknowledged by policy): DB_PASSWORD
Scoring overrides can only down-weight builtin classifications — a binary not in the high-risk set stays at +3 even if listed in acknowledged_bins. This prevents accidental score inflation.
API change: AnalyzeSkillDescriptor and AnalyzeSkillEntry now take a SecurityPolicy parameter. Pass SecurityPolicy{} to reproduce historical default scoring exactly.
Channel Env Vars in Kubernetes Manifests (PR #54, closes #50)
Before this release, forge build generated k8s/deployment.yaml and k8s/secrets.yaml containing only skill-declared env vars. Channel env vars (SLACK_BOT_TOKEN, TELEGRAM_BOT_TOKEN, etc.) were silently dropped — even though forge package --with-channels injected them into docker-compose via a hardcoded switch statement. The two output paths disagreed from the same forge.yaml.
A new ChannelsStage (forge-cli/build/channels_stage.go) reads each <channel>-config.yaml and unions every _env-suffixed setting value into Spec.Requirements.EnvRequired. EnvVarsFromConfig in forge-cli/channels/env.go is the canonical helper; generateDockerCompose in cmd/package.go calls the same function, so both output paths produce a consistent env-var set.
# slack-config.yaml: the canonical source
adapter: slack
settings:
  app_token_env: SLACK_APP_TOKEN
  bot_token_env: SLACK_BOT_TOKEN
  custom_env: MY_PROJECT_SLACK_OVERRIDE   # ← appears in K8s + docker-compose
After forge build:
# .forge-output/k8s/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-agent-secrets
stringData:
  SLACK_APP_TOKEN: ""
  SLACK_BOT_TOKEN: ""
  MY_PROJECT_SLACK_OVERRIDE: ""
  TELEGRAM_BOT_TOKEN: ""
Adding a new channel adapter requires zero build-pipeline edits. Operators (and channel-adapter authors) just declare env vars via the _env suffix convention in their channel YAML.
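The _env suffix convention is simple enough to sketch directly. The Python below mirrors what a helper like EnvVarsFromConfig is described as doing; the function names and the choice to return a sorted list are assumptions of this illustration.

```python
def env_vars_from_config(settings):
    """Collect the values of every setting whose key ends in _env."""
    return sorted(
        {
            value
            for key, value in settings.items()
            if key.endswith("_env") and isinstance(value, str) and value
        }
    )


def union_env(channel_settings):
    """Union the env-var names declared by several channel configs."""
    names = set()
    for settings in channel_settings:
        names.update(env_vars_from_config(settings))
    return sorted(names)


slack_settings = {
    "app_token_env": "SLACK_APP_TOKEN",
    "bot_token_env": "SLACK_BOT_TOKEN",
    "custom_env": "MY_PROJECT_SLACK_OVERRIDE",
    "allow_bot_ids": "B0123ABC,B0456DEF",  # not an _env key, so it is ignored
}
telegram_settings = {"bot_token_env": "TELEGRAM_BOT_TOKEN"}

slack_env = env_vars_from_config(slack_settings)
all_env = union_env([slack_settings, telegram_settings])
```

Feeding the same function into every output path is what keeps the Kubernetes and docker-compose env-var sets from drifting apart.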
Behavior change worth calling out: SLACK_SIGNING_SECRET is no longer injected. The Slack adapter is Socket Mode and never reads it; the previous hardcoded map included it erroneously. Operators using a custom Slack adapter that does need it can add signing_secret_env: SLACK_SIGNING_SECRET to their slack-config.yaml.
Slack Bot @-Mention Admission with Self-Loop Guard (PR #56, closes #55)
The Slack adapter previously dropped every event whose bot_id was non-empty — silently ignoring @-mentions from scheduler bots, monitoring tools, workflow steps, and CI bots even when an operator explicitly wired them up.
A new optional allow_bot_ids setting admits specific bots:
# slack-config.yaml
adapter: slack
settings:
  app_token_env: SLACK_APP_TOKEN
  bot_token_env: SLACK_BOT_TOKEN
  allow_bot_ids: B0123ABC,B0456DEF   # comma-separated bot_ids
Three coordinated rules keep loops bounded:
| Rule | Behavior |
|---|---|
| Self-loop guard (always on) | The agent's own bot_id is dropped before the allowlist is consulted. No opt-out — even if its own ID is misconfigured into allow_bot_ids, the guard short-circuits. |
| Allowlist gate (opt-in) | Bots not in allow_bot_ids stay dropped. Empty/absent setting preserves today's behavior (no other bots admitted). |
| Mention requirement (unchanged) | Admitted bots still must include <@FORGE_AGENT> in the message text. Chatter without a tag is ignored. |
resolveBotID now captures both user_id and bot_id from Slack's auth.test response — user_id drives mention matching; bot_id powers the self-loop guard. Drop reasons are emitted to stderr with the bot_id and a pointer to slack-config.yaml allow_bot_ids so operators can self-diagnose.
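The ordering of the three rules matters, so a compact Python sketch may help. It models bot-originated events only; the function names, the event shape, and the drop-reason strings are hypothetical, and human messages are left to the adapter's existing rules.

```python
def parse_allow_bot_ids(raw):
    """Split the comma-separated allow_bot_ids setting into a set."""
    return {part.strip() for part in (raw or "").split(",") if part.strip()}


def admit_bot_event(event, self_bot_id, self_user_id, allow_bot_ids):
    """Apply the three rules in order; returns (admitted, reason)."""
    bot_id = event.get("bot_id", "")
    if not bot_id:
        return True, "not a bot event"  # human messages are not modeled here
    if bot_id == self_bot_id:
        # The self-loop guard runs before the allowlist is consulted.
        return False, "self-loop guard"
    if bot_id not in allow_bot_ids:
        return False, "bot not in allow_bot_ids"
    if f"<@{self_user_id}>" not in event.get("text", ""):
        return False, "no mention of the agent"
    return True, "admitted"


allow = parse_allow_bot_ids("B0123ABC, B0456DEF")
misconfigured = allow | {"BSELF"}  # the agent's own ID listed by mistake

self_event = admit_bot_event({"bot_id": "BSELF", "text": "<@UFORGE> hi"}, "BSELF", "UFORGE", misconfigured)
trusted = admit_bot_event({"bot_id": "B0123ABC", "text": "<@UFORGE> run report"}, "BSELF", "UFORGE", allow)
chatter = admit_bot_event({"bot_id": "B0123ABC", "text": "build finished"}, "BSELF", "UFORGE", allow)
stranger = admit_bot_event({"bot_id": "B9999ZZZ", "text": "<@UFORGE> hi"}, "BSELF", "UFORGE", allow)
```

Checking the agent's own bot_id first is what makes the guard hold even when the allowlist is misconfigured.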
Stale Global Secrets File No Longer Poisons the Chain (PR #57, closes #52)
A latent bug in OverlaySecretsToEnv and Runner.buildSecretProvider — exposed during the validation of #51 — caused a stale ~/.forge/secrets.enc from an unrelated project (encrypted with a different passphrase) to break ChainProvider.List() for everyone. Local file's keys would silently disappear from the agent's env.
A new viableEncryptedFileProviders helper eagerly validates each candidate file at chain-build time:
| Candidate state | Behavior |
|---|---|
| File missing (os.IsNotExist) | Silently skipped; common case, no log noise |
| Decryption succeeds | Admitted; cleartext is cached for subsequent reads (no extra Argon2id) |
| Decryption fails (wrong passphrase, corruption) | Dropped from the chain with a clear warning naming the file path |
forge: skipping secrets provider that failed to load
(label=global, path=/Users/.../.forge/secrets.enc,
error=decrypting secrets file: ... wrong passphrase?)
ChainProvider semantics stay strict — non-NotFound errors from Get / List still propagate. The fix is at the chain-construction site: we don't admit providers we already know will fail.
Documentation
Each PR shipped with synced docs. Notable additions:
- docs/security/secret-management.md: new Skill-Declared Secrets and Provider Chain Validation sections.
- docs/skills/skills-cli.md: new Security Audit section with the policy YAML schema and the (via policy) annotation contract.
Security Auditsection with the policy YAML schema and the(via policy)annotation contract. - **[docs/deployment/kubernetes.md](https://github.com/initializ/forge/blob/v0.10.0/do...
v0.9.2
147a072 Forge v0.9.2 — Skill Environment Injection, Docker Images, Agent Reliability Fixes
Release Date: April 17, 2026
Full Changelog: v0.9.1...v0.9.2
Highlights
Forge v0.9.2 delivers automatic environment variable and egress domain injection when saving skills, pre-built multi-architecture Docker images on GHCR, and critical agent reliability fixes for port allocation, stale daemon detection, and OAuth credential handling. This release modifies 25 files with over 1,200 lines of new code across all five modules.
What's New
Skill Environment & Egress Auto-Configuration (#46)
When saving a skill via the UI Skill Builder or forge skills add, Forge now automatically configures the agent's environment — no manual .env or forge.yaml editing required.
How it works:
- SKILL.md frontmatter declares metadata.forge.requires.env and metadata.forge.egress_domains
- On save, Forge parses requirements and:
  - Writes env vars to .env (deduplicated, skips existing keys and encrypted placeholders)
  - Merges egress domains into forge.yaml security.egress.allowed_domains (deduplicated, sorted, YAML formatting preserved)
  - Reports missing env vars back to the UI for user input
- Frontend displays what was configured and prompts for any remaining required variables
# CLI: forge skills add now also merges egress domains
forge skills add github
# ✓ Added egress domains: api.github.com, github.com
# ✓ Wrote GITHUB_TOKEN to .env
New shared utilities in forge-cli/cmd/skill_env.go:
- ParseSkillRequirements(): extracts env requirements and egress domains from SKILL.md
- MergeEgressDomains(): adds domains to forge.yaml allowlist with text-based YAML insertion
- AppendEnvVars(): appends key=value pairs to .env with deduplication
- CheckMissingEnv(): checks OS env + .env + secret placeholders for missing entries
SkillSaveFunc now returns *SkillSaveResult with structured response fields: path, egress_added, env_configured, env_missing.
Pre-Built Docker Images on GHCR (#46)
Forge now publishes multi-architecture Docker images (linux/amd64, linux/arm64) to GitHub Container Registry on every release.
# Pull the latest release
docker pull ghcr.io/initializ/forge:latest

# Pin to a specific version
docker pull ghcr.io/initializ/forge:v0.9.2

# Run with your agent directory mounted
docker run -v /path/to/agent:/home/forge/agent -w /home/forge/agent \
  -e OPENAI_API_KEY=sk-... \
  ghcr.io/initializ/forge:latest run --host 0.0.0.0
Image details:
- Tags: v0.9.2, v0.9, v0, latest (semver hierarchy)
- Base: alpine:3.22.4 with ca-certificates, git, tzdata
- Build: Multi-stage with golang:1.25-alpine, BuildKit cache mounts, CGO_ENABLED=0 static binary
- Security: Non-root forge user, minimal attack surface
- CI: QEMU for cross-platform emulation, GitHub Actions cache for layer reuse
LLM Agent Loop Robustness (#46)
Three fixes to the core agent loop that prevent subtle failures across providers:
Tool call execution fix: The loop now terminates based solely on len(ToolCalls) == 0, ignoring FinishReason. Some providers (notably certain OpenAI models) return FinishReason: "stop" even when tool calls are present — previously this caused the agent to stop mid-execution.
Session recovery deduplication: When a session is recovered from disk after a crash or timeout, the executor checks whether the conversation already ends with an identical user message and skips the duplicate. This prevents the same prompt from appearing twice in the context window on retry.
Q&A nudge suppression: When only explore-phase tools were invoked (e.g., web_search, file_read), the agent no longer sends a continuation nudge ("You stopped..."), correctly treating the response as a final answer to an informational query.
Bug Fixes
Port Collision on Agent Start (forge-ui)
Before: When the UI restarted, PortAllocator lost its in-memory state and always tried port 9100 first — colliding with agents already running on that port from a previous session.
After: Allocate() now does a net.Listen TCP probe to verify the port is actually free before assigning it. Occupied ports are automatically skipped.
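The probe-before-assign behaviour can be sketched in Python with a bind-and-listen check. This mirrors the idea of the net.Listen probe rather than the Go code itself; the function names, the loopback host, and the search limit are assumptions.

```python
import socket


def port_free(port, host="127.0.0.1"):
    """Try to bind and listen on the port; failure means something else owns it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
            probe.listen(1)
        except (OSError, OverflowError):
            return False
    return True


def allocate(start, limit=100, host="127.0.0.1"):
    """Return the first port at or after start that passes the probe."""
    for port in range(start, min(start + limit, 65536)):
        if port_free(port, host):
            return port
    raise RuntimeError("no free port in range")


# Occupy a port to stand in for an agent left over from a previous session.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen(1)
busy = holder.getsockname()[1]

chosen = allocate(busy)  # the occupied port is skipped
holder.close()
```

A probe like this narrows the window for collisions but cannot remove it entirely, since another process may take the port between the probe and the real bind.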
Stale Agent Status After UI Restart (forge-ui)
Before: detectExternalAgent() only checked if a port was listening via TCP probe. If serve.json was stale (agent crashed) but another process happened to be on the same port, the dead agent appeared as "running".
After: detectExternalAgent() now checks PID liveness via pidAlive() (signal 0) before the TCP probe. If the PID from serve.json is dead, the stale file is cleaned up and the agent is correctly reported as stopped.
OAuth Silent 401 in Skill Builder and Runner (forge-cli)
Before: When OpenAI OAuth credentials were missing or unloadable (no ~/.forge/credentials/openai.json), both the Skill Builder llmStreamFunc and the Runner createProviderClient silently fell through to create a regular OpenAI client with no API key — resulting in a cryptic 401 Unauthorized error.
After: When the API key is empty/__oauth__ and oauth.LoadCredentials() fails, a clear error is returned: "no OpenAI API key or OAuth credentials found; run 'forge init' with OAuth or set OPENAI_API_KEY".
Files Changed
| Area | Files | Description |
|---|---|---|
| Skill env utilities | forge-cli/cmd/skill_env.go | ParseSkillRequirements, MergeEgressDomains, AppendEnvVars, CheckMissingEnv |
| Skill env tests | forge-cli/cmd/skill_env_test.go | 12 unit tests for shared utilities |
| CLI skills add | forge-cli/cmd/skills.go | Egress domain merge in runSkillsAdd |
| UI save func | forge-cli/cmd/ui.go | Enhanced skill save with env/egress handling, OAuth error surfacing |
| Runner OAuth | forge-cli/runtime/runner.go | OAuth error surfacing in createProviderClient |
| Parser export | forge-skills/parser/parser.go | Export ExtractForgeReqs for reuse |
| LLM loop | forge-core/runtime/loop.go | Ignore FinishReason, session dedup, Q&A nudge suppression |
| Loop tests | forge-core/runtime/loop_test.go | Regression tests for stop+tool_calls |
| UI types | forge-ui/types.go | SkillSaveResult, SkillEnvEntry, updated SkillSaveFunc signature |
| UI handler | forge-ui/handlers_skill_builder.go | Pass env vars, return SkillSaveResult |
| Port allocator | forge-ui/process.go | net.Listen probe in Allocate(), portFree() helper |
| Agent discovery | forge-ui/discovery.go | PID liveness check, stale serve.json cleanup |
| Frontend | forge-ui/static/dist/app.js | Env var inputs, egress/env status display |
| Docker | Dockerfile, .dockerignore | Multi-stage alpine build |
| CI | .github/workflows/release.yaml | GHCR multi-arch build+push job |
| Docs | docs/{runtime,skills,dashboard,deployment,commands}.md | Synced with code changes |
25 files changed, 1,201 insertions, 79 deletions
Installation
    # Homebrew
    brew upgrade initializ/tap/forge
    # One-line install/upgrade
    curl -sSL https://raw.githubusercontent.com/initializ/forge/main/install.sh | bash
    # Docker
    docker pull ghcr.io/initializ/forge:v0.9.2
Built with Forge — turn SKILL.md into portable, secure, runnable AI agents.
v0.9.1
6a514b6 Forge v0.9.1 — Container Packaging, Local Binary Overrides, External Auth, and Guardrails Library
Release Date: April 13, 2026
Full Changelog: v0.9.0...v0.9.1
Highlights
Forge v0.9.1 delivers container packaging improvements, local binary injection, external authentication, inline KUBECONFIG support, and a full guardrails library integration. This release modifies 65 files with over 3,600 lines of new code, making container-based agent deployments significantly more flexible and production-ready.
What's New
Container Packaging: Local Binary Overrides (#44)
Inject host-compiled binaries directly into container images, bypassing remote resolution entirely. This enables air-gapped builds, custom binary versions, and rapid iteration without waiting for upstream releases.
    # Inject a locally-built forge binary into the container
    forge package --local-bin forge=/path/to/linux/forge
    # Combine with image optimization
    forge package --local-bin forge=./bin/forge --alpine --slim
- --local-bin name=/path/to/file flag on both forge build and forge package (repeatable)
- --slim flag to minimize image size by skipping heavy/optional binaries
- --alpine flag to prefer Alpine base image
- Configurable in forge.yaml via package.bin_overrides.<name>.local
- Smart Dockerfile generation with multi-stage bin resolution: local override → skill override → config override → image registry → apt/apk
- Automatic ca-certificates installation for TLS support
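The resolution order in the list above (local override, then skill override, config override, image registry, and finally apt/apk) amounts to a first-match lookup. The sketch below shows only that precedence idea; the `binSources` type and its field names are hypothetical and do not reflect Forge's internal data structures.

```go
package main

import "fmt"

// binSources lists the places one binary could come from.
// Field names are hypothetical; an empty string means "not configured".
type binSources struct {
	Local    string // host path from --local-bin or package.bin_overrides.<name>.local
	Skill    string // override declared by a skill
	Config   string // other forge.yaml override
	Registry string // image registry reference
	Package  string // apt/apk package name
}

// resolve returns the highest-priority configured source and its value.
func resolve(s binSources) (kind, value string, ok bool) {
	candidates := []struct{ kind, value string }{
		{"local", s.Local},
		{"skill", s.Skill},
		{"config", s.Config},
		{"registry", s.Registry},
		{"package", s.Package},
	}
	for _, c := range candidates {
		if c.value != "" {
			return c.kind, c.value, true
		}
	}
	return "", "", false
}

func main() {
	kind, value, _ := resolve(binSources{Local: "./bin/forge", Package: "forge"})
	fmt.Println(kind, value)
}
```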
External Authentication Provider (#44)
Delegate token validation to an external auth endpoint for enterprise deployments. The internal loopback token ensures channel adapters (Slack, Telegram) continue to work without needing a valid external token.
    # Via CLI flag
    forge run --auth-url https://auth.example.com/verify
    # Via environment variable (containers)
    docker run -e FORGE_AUTH_URL=https://auth.example.com/verify my-agent
- Two-layer auth middleware: internal token accepted first (channel loopback), then external provider
- FORGE_AUTH_URL and FORGE_AUTH_ORG_ID environment variables
- Org ID extracted from X-Org-ID, org-id, or org_id request headers
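One way to picture the two-layer check is a standard `net/http` middleware. Everything below is an assumption made for illustration: the `validator` callback stands in for the HTTP call to the external provider, the bearer-token header format is assumed, and the demo tokens and org ID are invented.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// validator abstracts the external provider call (hypothetical signature).
type validator func(token, orgID string) bool

// orgIDFrom checks the three header spellings named in the release notes.
func orgIDFrom(r *http.Request) string {
	for _, h := range []string{"X-Org-ID", "org-id", "org_id"} {
		if v := r.Header.Get(h); v != "" {
			return v
		}
	}
	return ""
}

// authMiddleware accepts the internal token first, then asks the external provider.
func authMiddleware(internalToken string, external validator, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		// Layer 1: loopback token used by channel adapters.
		if token != "" && subtle.ConstantTimeCompare([]byte(token), []byte(internalToken)) == 1 {
			next.ServeHTTP(w, r)
			return
		}
		// Layer 2: external provider.
		if external != nil && external(token, orgIDFrom(r)) {
			next.ServeHTTP(w, r)
			return
		}
		http.Error(w, "unauthorized", http.StatusUnauthorized)
	})
}

// newDemoHandler wires the middleware with invented demo credentials.
func newDemoHandler() http.Handler {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	external := func(token, orgID string) bool {
		return token == "ext-token" && orgID == "acme"
	}
	return authMiddleware("internal-secret", external, ok)
}

// statusFor sends one request through the handler and returns the status code.
func statusFor(h http.Handler, token, orgID string) int {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("X-Org-ID", orgID)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newDemoHandler()
	fmt.Println(statusFor(h, "internal-secret", ""))
	fmt.Println(statusFor(h, "wrong", "acme"))
}
```

Accepting the internal token before consulting the provider is what lets the Slack and Telegram adapters keep working when the external endpoint would reject them.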
KUBECONFIG Materialization (#44)
Pass kubeconfig content directly as an environment variable — the runtime automatically writes it to a file and updates KUBECONFIG to the file path. No volume mounts needed.
    docker run -e KUBECONFIG="$(cat ~/.kube/config)" my-agent
- Detects inline YAML via newlines, apiVersion: markers, and certificate-authority-data: presence
- kubectl and helm receive the materialized path via env passthrough
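The materialization step could be sketched as below. How the three signals are combined is not stated in the notes; treating any one of them as sufficient is an assumption, as are the function names, the file name, and the 0600 permission.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// looksInline guesses whether a KUBECONFIG value is YAML content rather than a path.
// Assumption: any single signal is enough to treat the value as inline content.
func looksInline(v string) bool {
	return strings.Contains(v, "\n") ||
		strings.Contains(v, "apiVersion:") ||
		strings.Contains(v, "certificate-authority-data:")
}

// materialize writes inline content to a private file in dir and returns its path.
// A value that already looks like a path is returned unchanged.
func materialize(v, dir string) (string, error) {
	if !looksInline(v) {
		return v, nil
	}
	path := filepath.Join(dir, "kubeconfig")
	if err := os.WriteFile(path, []byte(v), 0o600); err != nil {
		return "", err
	}
	return path, nil
}

func main() {
	v := os.Getenv("KUBECONFIG")
	if v == "" {
		fmt.Println("KUBECONFIG not set")
		return
	}
	dir, err := os.MkdirTemp("", "forge-kube-")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	path, err := materialize(v, dir)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Child processes such as kubectl and helm inherit the rewritten value.
	os.Setenv("KUBECONFIG", path)
	fmt.Println("KUBECONFIG =", path)
}
```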
Channel Adapter Fixes (#44)
- Entrypoint now includes --with flag: container entrypoint automatically becomes ["forge", "run", "--host", "0.0.0.0", "--with", "slack,telegram"] when channels are configured in forge.yaml
- Channel config files packaged: slack-config.yaml, telegram-config.yaml, etc. are automatically copied into the Docker build context
- Auth loopback: channel adapters authenticate to the A2A server using an internal token, bypassing external auth providers
Guardrails Library Integration (#45)
Replaced the hand-rolled GuardrailEngine (435 lines of hardcoded patterns) with the external github.com/initializ/guardrails library, supporting dual-mode operation.
- File-based mode (guardrails.json) for local development: zero external dependencies
- MongoDB-backed mode for platform deployments with centralized policy management and audit logging
- Inbound PII masking fixed: CheckInbound now correctly masks PII (e.g., SSNs) before they reach the LLM
- Session recovery crash fixed: orphaned tool calls (assistant tool_calls without matching tool results) are stripped on save and recovery, preventing API rejection errors
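The orphaned tool call cleanup in the last item can be illustrated with a small filter over a message history. The `Message` and `ToolCall` types are simplified stand-ins, and dropping an assistant message that is left with neither content nor tool calls is an assumption; the notes only say that orphaned calls are stripped.

```go
package main

import "fmt"

// ToolCall and Message are simplified stand-ins for a chat history record.
type ToolCall struct{ ID string }

type Message struct {
	Role       string // "user", "assistant", or "tool"
	Content    string
	ToolCalls  []ToolCall // set on assistant messages
	ToolCallID string     // set on tool result messages
}

// stripOrphans removes assistant tool calls that have no matching tool result.
func stripOrphans(msgs []Message) []Message {
	answered := make(map[string]bool)
	for _, m := range msgs {
		if m.Role == "tool" {
			answered[m.ToolCallID] = true
		}
	}
	out := make([]Message, 0, len(msgs))
	for _, m := range msgs {
		if m.Role == "assistant" && len(m.ToolCalls) > 0 {
			var kept []ToolCall
			for _, tc := range m.ToolCalls {
				if answered[tc.ID] {
					kept = append(kept, tc)
				}
			}
			if len(kept) == 0 && m.Content == "" {
				continue // nothing useful left in this message
			}
			m.ToolCalls = kept
		}
		out = append(out, m)
	}
	return out
}

func main() {
	history := []Message{
		{Role: "user", Content: "list pods"},
		{Role: "assistant", ToolCalls: []ToolCall{{ID: "call_a"}}},
		{Role: "tool", ToolCallID: "call_a", Content: "pod-1"},
		{Role: "assistant", ToolCalls: []ToolCall{{ID: "call_b"}}}, // crashed before a result
	}
	fmt.Println(len(stripOrphans(history)), "messages kept")
}
```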
Skills Build Stage Fix (#45)
- SkillsStage now always scans the skills/ subdirectory even without a root SKILL.md, restoring binary installation (e.g., kubectl) for subdirectory-only skill projects
Root SKILL.md Removed (#44)
- forge init no longer generates a root-level SKILL.md with placeholder content
- Skills live exclusively in skills/<skill-name>/SKILL.md subdirectories
- Removed skills.path from forge.yaml template
- Build pipeline already scans skills/ automatically
OpenAI API Fix (#44)
- Fixed "Invalid value for 'content': expected a string, got null" error by omitting content field for assistant messages with tool_calls per OpenAI spec
New Configuration
forge.yaml — Package Section
    package:
      alpine: false                      # Prefer Alpine base image
      slim: false                        # Minimize image size
      bin_overrides:
        forge:
          local: "/path/to/linux/forge"  # Host path to local binary
        jq:
          apt: "jq"                      # APT package name
        custom-tool:
          url: "https://example.com/tool.tar.gz"
          dest: "/usr/local/bin/custom-tool"
          chmod: "0755"
New Environment Variables
| Variable | Description |
|---|---|
| FORGE_AUTH_URL | External auth provider URL for token validation |
| FORGE_AUTH_ORG_ID | Organization ID sent to external auth provider |
| FORGE_GUARDRAILS_DB | MongoDB connection string for centralized guardrails |
New CLI Flags
| Command | Flag | Description |
|---|---|---|
| forge build | --local-bin | Local binary override as name=/path/to/file (repeatable) |
| forge build | --slim | Minimize image size |
| forge build | --alpine | Prefer Alpine base image |
| forge package | --local-bin | Local binary override (same as build) |
| forge package | --slim | Minimize image size |
| forge package | --alpine | Prefer Alpine base image |
| forge run | --auth-url | External auth provider URL |
| forge serve | --auth-url | External auth provider URL |
Breaking Changes
- Root SKILL.md no longer generated by forge init. Existing projects with a root SKILL.md are unaffected: the build pipeline still reads it if present, but new projects should use skills/ subdirectories exclusively.
- skills.path removed from forge.yaml template. Existing configs with skills.path still work at runtime.
Pull Requests
- #44 — feat: container packaging, local binary overrides, and external auth
- #45 — feat: integrate guardrails library with dual-mode support
Stats
- 65 files changed, 3,652 insertions, 1,031 deletions
- 5 new CLI flags across build, package, run, and serve
- 3 new environment variables for auth and guardrails
- 1 new forge.yaml section (package)
Installation:
    # Homebrew
    brew upgrade initializ/tap/forge
    # Binary (Linux/macOS)
    curl -fsSL https://github.com/initializ/forge/releases/download/v0.9.1/forge-$(uname -s)-$(uname -m).tar.gz | tar xz -C /usr/local/bin forge
Documentation: docs/
v0.9.0
b537a1a Forge v0.9.0 — Security Hardening, GitHub Skills, and Bug Fixes
Release Date: April 4, 2026
Full Changelog: v0.8.0...v0.9.0
Highlights
Forge v0.9.0 is a security-focused release that delivers two full phases of security hardening (17 fixes total), a new GitHub API skill, a critical secret decryption bug fix, and improvements across the CLI, TUI, and channel plugins. This release modifies 78 files with over 4,100 lines of new code and hardened tests.
What's New
Security: Phase 1 — Critical Fixes (C-1 through C-7)
- SSRF protection: new IP validator blocks requests to private/loopback/link-local ranges (#34)
- Safe dialer: all outbound HTTP connections routed through a secure dialer with DNS rebinding protection
- Redirect validation: HTTP redirects are checked against the egress allowlist before following
- bash_execute removed: eliminated the high-risk shell execution tool from the code-agent skill (#29)
- Egress enforcer hardened: stricter domain matching and proxy enforcement
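The IP validator and safe dialer from the list above can be approximated with the standard library. This is a sketch of the general technique, not Forge's implementation: the function names are hypothetical, and the set of blocked ranges is an assumption based on the "private/loopback/link-local" wording. The key idea is that `net.Dialer.Control` runs after DNS resolution, so the check applies to the address that is actually dialed, which is what defeats DNS rebinding.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"syscall"
	"time"
)

// blockedIP reports whether ip falls in a range an agent should not reach.
func blockedIP(ip net.IP) bool {
	return ip == nil || ip.IsLoopback() || ip.IsPrivate() ||
		ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() || ip.IsUnspecified()
}

// blockedAddr is a convenience wrapper for textual IP addresses.
func blockedAddr(s string) bool {
	return blockedIP(net.ParseIP(s))
}

// safeControl validates the resolved address just before the socket connects.
func safeControl(network, address string, _ syscall.RawConn) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	if blockedAddr(host) {
		return fmt.Errorf("egress to %s blocked", host)
	}
	return nil
}

// newSafeClient returns an HTTP client whose connections all pass safeControl.
func newSafeClient() *http.Client {
	dialer := &net.Dialer{Timeout: 10 * time.Second, Control: safeControl}
	return &http.Client{
		Transport: &http.Transport{DialContext: dialer.DialContext},
		Timeout:   30 * time.Second,
	}
}

func main() {
	_ = newSafeClient()
	for _, addr := range []string{"127.0.0.1", "10.0.0.5", "169.254.169.254", "8.8.8.8"} {
		fmt.Println(addr, "blocked:", blockedAddr(addr))
	}
}
```

Redirect validation against an egress allowlist would sit on top of this, for example through the client's `CheckRedirect` hook; it is left out here to keep the sketch short.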
Security: Phase 2 — High-Priority Fixes (H-1 through H-10)
- Scoped environment variables: KUBECONFIG, NO_PROXY, and GH_CONFIG_DIR are now injected only into their target binaries (kubectl, helm, gh), not the global environment (#39, #42)
- A2A server hardened: added input validation, rate limiting, and auth improvements to the Agent-to-Agent server
- Custom tool sandboxing: external tool execution now enforces stricter argument validation
- Channel plugin hardening: Slack and Telegram adapters received input sanitization and error-handling improvements
- Guardrails loader hardened: runtime guardrail loading now validates schema before application
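The scoped environment variable change in the first item can be pictured as building a per-binary environment instead of exporting everything globally. The mapping below is illustrative only: the notes name the variables and the target binaries but do not say which variable goes to which binary, so the `scopedVars` table and the `envFor` helper are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// scopedVar ties one environment variable to the binaries allowed to see it.
type scopedVar struct {
	Name    string
	Targets []string
}

// Illustrative mapping only; the exact variable-to-binary pairing is assumed.
var scopedVars = []scopedVar{
	{"KUBECONFIG", []string{"kubectl", "helm"}},
	{"NO_PROXY", []string{"kubectl", "helm"}},
	{"GH_CONFIG_DIR", []string{"gh"}},
}

// envFor returns base plus only the scoped variables that target the binary.
func envFor(binary string, base []string, values map[string]string) []string {
	env := append([]string{}, base...)
	for _, sv := range scopedVars {
		val, ok := values[sv.Name]
		if !ok {
			continue
		}
		for _, t := range sv.Targets {
			if t == binary {
				env = append(env, sv.Name+"="+val)
				break
			}
		}
	}
	return env
}

// hasVar reports whether env contains an entry for name.
func hasVar(env []string, name string) bool {
	for _, kv := range env {
		if strings.HasPrefix(kv, name+"=") {
			return true
		}
	}
	return false
}

func main() {
	values := map[string]string{"KUBECONFIG": "/tmp/kubeconfig", "GH_CONFIG_DIR": "/tmp/gh"}
	base := []string{"PATH=/usr/bin"}
	fmt.Println(envFor("kubectl", base, values))
	fmt.Println(envFor("jq", base, values))
}
```

The returned slice can be assigned to the `Env` field of an `exec.Cmd`, so each child process sees only what it needs.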
New Feature: GitHub API Skill
- Query GitHub users, pull requests, forks, and stargazers directly from within an agent (#38)
- Includes six new scripts: github-get-user, github-list-prs, github-list-forks, github-list-stargazers, github-pr-author-profiles, github-stargazer-profiles
- Per-tool PII exemptions: tools that need GitHub usernames can bypass PII redaction on a per-tool basis
Bug Fixes
- Secret decryption: fixed a bug where decryption failed even with the correct passphrase (#40, #41)
- Q&A nudge suppression: resolved unwanted nudge prompts during agent conversations
- UI agent start errors: fixed errors when starting agents from the skill builder UI
- Chat streaming: resolved streaming interruption issues in the TUI
- File attachment: cli_execute now correctly handles file attachment behavior
- Errcheck lint: fixed unchecked error returns in test files
Documentation
- Updated security docs covering egress enforcement, guardrails, and the new IP validator
- Synced architecture, channels, runtime, skills, and tools documentation with code changes (#43)
Breaking Changes
- bash_execute tool removed: agents using the bash_execute builtin tool must migrate to cli_execute or custom tool definitions. This tool was removed for security reasons.
Upgrade Guide
    # Update via Homebrew
    brew upgrade initializ/tap/forge
    # Or pull the latest binary
    curl -sSL https://raw.githubusercontent.com/initializ/forge/main/install.sh | bash
No configuration changes required. Existing agents and skills are fully compatible with v0.9.0.
Stats
| Metric | Value |
|---|---|
| Files changed | 78 |
| Insertions | +4,126 |
| Deletions | −632 |
| Net new lines | +3,494 |
| PRs merged | 6 |
| Contributors | 2 |
Pull Requests Included
- #43 — docs: sync documentation for Phase 2 fixes and UI improvements (@initializ-mk)
- #42 — security: Phase 2 high-priority fixes and UI improvements (@initializ-mk)
- #41 — [Bug]: Secret decryption fails with correct passphrase #40 (@pandey03muskan)
- #39 — security: Phase 2 high-priority fixes (H-1 through H-10) (@initializ-mk)
- #38 — feat: add GitHub API query tools and per-tool PII exemptions (@initializ-mk)
- #34 — security: Phase 1 critical fixes (C-1 through C-7) (@initializ-mk)
Forge is a secure, portable AI agent runtime. Build, run, and deploy AI agents from a single SKILL.md file.
Learn more at github.com/initializ/forge • Documentation