Releases: SponsioLabs/Sponsio
v0.2.0a3: redirect-to-safe fails closed (alpha)
Sponsio 0.2.0a3: redirect-to-safe fails closed
Released: 2026-06-08 · Status: alpha
pip install --pre sponsio==0.2.0a3
If you are on 0.2.0a2 and use redirect_to_safe with any adapter other than LangGraph, upgrade. A review of the v0.2.0a2 release surfaced a fail-open bug: the guard correctly rolled the unsafe call out of the trace, but six adapters then ran the original unsafe tool anyway. This release closes that hole.
The 0.2.0a3 release is one safety-relevant fix on top of the 0.2.0a2 "runtime-value comparisons" work, plus three smaller fixes from the same review pass. Nothing behavior-breaking; the change is "the unsafe call now actually does not execute" everywhere.
What's fixed
1. redirect_to_safe now fails closed everywhere
What happened. When a contract using redirect_to_safe(unsafe, safe) fired, the guard returned action="redirected" with blocked=False. LangGraph's adapter checked for .redirected first and substituted the safe tool transparently, as designed. But every OTHER adapter (CrewAI, OpenAI Agents SDK, Vercel AI, Claude Agent SDK, Google ADK, MCP, and the base custom-loop helper) only checked if check.blocked before executing the call — and blocked is False on a redirect. So:
- The guard rolled the unsafe event back from the trace (correct).
- The adapter, seeing blocked=False, fell through to run_tool(unsafe, args) and executed the original unsafe call anyway (wrong).
This is a fail-OPEN, the worst outcome for an enforcement layer. The contract fired; the safety control silently degraded to a no-op.
The fix. A new CheckResult.stop_original property folds blocked and redirected together. Every non-substituting adapter now gates execution on stop_original, so a redirect refuses the unsafe call:
    # Old (fail-OPEN on redirect)
    if check.blocked:
        return refusal_message(check)
    return run_tool(name, args)

    # New (fail-CLOSED on redirect)
    if check.stop_original:  # blocked OR redirected
        return refusal_message(check)
    return run_tool(name, args)
LangGraph is unchanged: it branches on .redirected first and performs the substitution, so it never reaches the stop_original gate. Existing tests against blocked / redirected / allowed still hold.
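A minimal sketch of how a derived property like stop_original could fold the two flags together, and how a non-substituting adapter would gate on it. The CheckResult dataclass, the guarded_call helper, and the refusal text are illustrative stand-ins written for this note, not Sponsio's actual classes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical stand-in for the guard's verdict object."""
    action: str              # "allowed", "blocked", or "redirected"
    blocked: bool = False
    redirected: bool = False

    @property
    def stop_original(self) -> bool:
        # True whenever the original call must not execute.
        return self.blocked or self.redirected


def guarded_call(check: CheckResult, run_tool: Callable[[], Any]) -> Any:
    """Gate for an adapter that cannot substitute a safe tool itself."""
    if check.stop_original:
        return f"refused: {check.action}"
    return run_tool()


calls = []
redirect = CheckResult(action="redirected", redirected=True)
result = guarded_call(redirect, lambda: calls.append("rm_rf"))
```

With the old `if check.blocked` gate the lambda would have run, because blocked stays False on a redirect. Gating on the combined property leaves calls empty.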
Adapters with stop_original gating: base.py (covers most), crewai.py, agents.py, claude_agent.py, google_adk.py, vercel_ai.py, mcp.py.
Tracked follow-up: the Cursor adapter takes a separate evaluate_event outcome path and is not wired through stop_original yet.
2. TS Eq matches Python value equality for composite types
Eq(ArgValue("tool", "field"), CtxValue("expected")) is a v0.2 surface (the Term abstraction made composite-value equality reachable). The TS evaluator used ===, which is reference equality for arrays and objects:
    // Python: True. TS: False (different refs).
    [1, 2] == [1, 2]
So a contract that the Python guard let through could fire on TS for the same trace. New valuesEqual does element- and key-wise deep comparison, restoring parity. Regression test at ts/packages/sdk/src/__tests__/parity.test.ts.
3. TS SDK no longer crashes Cloudflare Workers at import
The YAML loader called createRequire(import.meta.url) eagerly at module top level. On Cloudflare Workers import.meta.url is undefined, and createRequire(undefined) threw, taking the whole bundle down even when YAML was never loaded.
Now the require instance is built lazily on first YAML load, with a ?? "file:///sponsio-noop.js" fallback. Workers that never touch YAML never call createRequire. (The sponsio-demo repo had patched this via patch-package; with 0.2.0a3 you can drop the patch.)
4. Pytest setup errors cleared up
A pre-existing autouse fixture in tests/conftest.py (the rich-style cache reset) called isinstance(obj, Style) on every live object. Optional SDK lazy proxies raised from their __class__ getter (OpenAI's voice helpers try to pull sounddevice), erroring 1684 of 2312 test setups. The check now swallows introspection failures.
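The failure mode can be reproduced in a few lines. This is a sketch under assumptions: Style and LazyProxy are stand-ins for the rich class and the optional-SDK proxy, and safe_isinstance is a hypothetical helper name, not the one used in tests/conftest.py.

```python
class Style:
    """Stand-in for the rich Style class the fixture looks for."""


def _raise_import_error(self):
    raise ImportError("optional dependency is not installed")


class LazyProxy:
    """Hypothetical lazy proxy whose __class__ lookup triggers an import."""
    __class__ = property(_raise_import_error)


def safe_isinstance(obj, cls) -> bool:
    """isinstance() that treats introspection failures as 'not an instance'."""
    try:
        # When type(obj) does not match, isinstance falls back to
        # obj.__class__, which is where the proxy raises.
        return isinstance(obj, cls)
    except Exception:
        return False


live_objects = [Style(), LazyProxy(), 42]
styles = [o for o in live_objects if safe_isinstance(o, Style)]
```

Catching Exception rather than only ImportError is a deliberate choice in this sketch: a sweep over every live object cannot know what an arbitrary proxy will raise.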
Documentation repairs
- filter_tools documents its O(candidates × trace_length) re-grounding cost.
- workflow_step documents the end-of-trace weak-next vacuity caveat (matters only for batch verify / replay; live enforce mode self-corrects on the next event).
- Var.__eq__ documents that it builds AST nodes, not booleans.
- _warned_missing_vars and arg_value retention get explicit footgun notes.
- Several docstring first-lines repaired (artifacts of the earlier em-dash sweep that surfaced in help() and IDE hover popups).
Upgrading
pip install --pre sponsio==0.2.0a3
No CLI, config, or runtime API changes. CheckResult.stop_original is a new derived property; everything you wrote against blocked / redirected / allowed keeps working.
If you have a custom adapter that calls guard.guard_before(...) and gates on if check.blocked, switch to if check.stop_original to pick up the fail-closed behavior for redirects.
Compatibility
- No breaking API changes.
- TS users on Cloudflare Workers can remove any patch-package workaround for the createRequire crash.
- TS Eq semantics change from reference- to value-equality for arrays and objects. The Python-side semantics are unchanged; the TS side now matches Python. Contracts that relied on TS reference-equality (i.e. the bug) will see different verdicts.
Credits
Thanks to @donalddellapietra for the review pass that surfaced the fail-open bug, the TS Eq parity gap, and the Worker runtime crash. Full PR: #78.
What's next
- Wire stop_original through the Cursor adapter's evaluate_event path.
- TS NL parser port for workflow_step and the Term comparison forms (still factories-only on TS).
- TS DFA-compiled evaluator port.
If you are using 0.2.0a3 and hit something we did not predict, open an issue.
v0.2.0a2: runtime-value comparisons + benchmark libraries (alpha)
Sponsio 0.2.0a2: runtime-value comparisons + benchmark libraries
Released: 2026-06-07 · Status: alpha
pip install --pre sponsio==0.2.0a2
The 0.2.0a1 "softer landings" release made contracts more graceful when they fire. 0.2.0a2 makes them more expressive: contracts can now read runtime values out of tool arguments and context facts, compare them against each other, and prescribe the next action instead of only forbidding it.
It also ships the five hand-curated benchmark contract libraries that produce Sponsio's published RedCode-Exec, ODCV-Bench, τ2-bench, AgentDojo, and SWE-bench headline numbers, plus brings the TypeScript SDK to parity on the new deterministic core.
What's new
1. Term abstraction: compare runtime values
What it is. The arithmetic comparison family (Eq, Le, Lt, Ge, Gt) now accepts any Term, not just Var or Const. Four runtime-bound term subclasses ship with this release:
- ArgValue(tool, field): raw value of args[field] when the current event is a call to tool.
- CtxValue(key): raw value of an externally pushed context fact (guard.observe_context).
- ArgLength(tool, field): len(args[field]) shorthand.
- UnaryFn(fn, term): apply a Python callable to another term's value.

    from sponsio.formulas.formula import ArgValue, CtxValue, Eq, G, Implies, Atom

    # "If we issue a refund, the amount must equal what the supervisor approved."
    contract("refund matches approval").guarantees(
        G(Implies(
            Atom("called", "issue_refund"),
            Eq(ArgValue("issue_refund", "amount"), CtxValue("approved_amount")),
        ))
    )
Why it exists. Until 0.2.0a2 the only way to compare a runtime arg against an out-of-band fact was to push the comparison up into Python and use a custom strategy callback. The Term abstraction lets the comparison live inside the contract, so it shows up in sponsio validate, in audit logs, and in the DFA-compiled fast path.
Why it's good for users.
- Audit-friendly. The constraint is declarative, not buried in callback code. A security reviewer reads the contract and sees what's being compared.
- Cheap. Polymorphic dispatch is microseconds; no per-event Python callback overhead.
- Composable. UnaryFn(len, ArgValue(...)) and ArgLength(...) cover length caps; UnaryFn(str.lower, ...) covers case-insensitive matches; arbitrary callables cover the rest.
- Safe on missing data. Either operand resolving to None evaluates the comparison to false (the comparison cannot decide) rather than raising. Wrap fragile comparisons in Implies(scope_predicate, comparison) to suppress them where the relevant arg is not applicable.
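The resolve-then-compare behavior described above can be sketched with small stand-in classes. These reuse the release's class names for readability, but the resolve method, the eq helper, and the dict-shaped event are assumptions made for illustration. They are not the sponsio.formulas implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Event = dict  # assumed shape: {"tool": str, "args": dict}


@dataclass(frozen=True)
class ArgValue:
    tool: str
    field: str

    def resolve(self, event: Event, ctx: dict) -> Optional[Any]:
        # Bound only when the current event is a call to `tool`.
        if event.get("tool") != self.tool:
            return None
        return event.get("args", {}).get(self.field)


@dataclass(frozen=True)
class CtxValue:
    key: str

    def resolve(self, event: Event, ctx: dict) -> Optional[Any]:
        return ctx.get(self.key)


@dataclass(frozen=True)
class UnaryFn:
    fn: Callable
    term: Any

    def resolve(self, event: Event, ctx: dict) -> Optional[Any]:
        inner = self.term.resolve(event, ctx)
        return None if inner is None else self.fn(inner)


def eq(left, right, event: Event, ctx: dict) -> bool:
    """Equality that evaluates to False when either side is missing."""
    lhs, rhs = left.resolve(event, ctx), right.resolve(event, ctx)
    if lhs is None or rhs is None:
        return False
    return lhs == rhs


event = {"tool": "issue_refund", "args": {"amount": 120, "currency": "USD"}}
amount = ArgValue("issue_refund", "amount")
matches = eq(amount, CtxValue("approved_amount"), event, {"approved_amount": 120})
missing = eq(amount, CtxValue("approved_amount"), event, {})
lowered = eq(UnaryFn(str.lower, ArgValue("issue_refund", "currency")),
             CtxValue("currency"), event, {"currency": "usd"})
```

The missing case is why the scope guard matters: without Implies around it, a comparison with an absent approval evaluates to false on every event, including ones where no refund is being issued.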
2. workflow_step(trigger, next_action): prescriptive next-step
What it is. A new pattern that says "when trigger holds at the current event, the next event must satisfy next_action". Compiles to G(trigger -> X(next_action)).
    from sponsio.patterns import workflow_step
    from sponsio.formulas.formula import Atom

    contract("toggle roaming on disabled status").guarantees(
        workflow_step(
            Atom("ctx", "roaming_status", "disabled"),
            Atom("called", "toggle_roaming"),
        )
    )
Why it exists. Sponsio's existing patterns are all block-style: "you must not do X", "X requires Y first". workflow_step is the prescriptive counterpart: "you must do X next". Workflow-style policies ("if you observe X, the next step is Y") map directly onto the pattern without bending the contract into an awkward never-followed-by.
Why it's good for users.
- Both arguments are arbitrary atoms. called(...), ctx(k, v), arg_field_has(...) all work in either position, so the same factory covers tool ordering, ctx-driven remediation, and arg-conditional follow-ups.
- One-step bounded. Unlike the F-style always_followed_by, workflow_step decides after a single event. No liveness obligation hanging at session end.
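To make the one-step semantics concrete, here is a sketch of checking G(trigger -> X(next_action)) over a finite trace with weak-next behavior at the end. The function name, the predicate style, and the choice to carry context facts inside each event dict are assumptions for this example, not Sponsio's evaluator.

```python
from typing import Callable, Optional, Sequence

Event = dict
Predicate = Callable[[Event], bool]


def first_workflow_violation(
    trace: Sequence[Event], trigger: Predicate, next_action: Predicate
) -> Optional[int]:
    """Return the index of the first event that breaks the rule, else None.

    Weak-next: a trigger on the final event is vacuously satisfied,
    because no successor exists yet.
    """
    for i in range(len(trace) - 1):
        if trigger(trace[i]) and not next_action(trace[i + 1]):
            return i + 1
    return None


def roaming_disabled(event: Event) -> bool:
    return event.get("ctx", {}).get("roaming_status") == "disabled"


def toggles_roaming(event: Event) -> bool:
    return event.get("tool") == "toggle_roaming"


observed = {"tool": "get_status", "ctx": {"roaming_status": "disabled"}}
good = first_workflow_violation(
    [observed, {"tool": "toggle_roaming"}], roaming_disabled, toggles_roaming)
bad = first_workflow_violation(
    [observed, {"tool": "send_sms"}], roaming_disabled, toggles_roaming)
pending = first_workflow_violation([observed], roaming_disabled, toggles_roaming)
```

The pending case returns None even though the obligation has not been met yet. In a live session the very next event settles it, while a batch replay that ends on a trigger never sees a verdict for it.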
3. Five benchmark contract libraries
What they are. Hand-curated YAML libraries that reproduce Sponsio's published benchmark headline numbers:
| Library | Benchmark | Contracts |
|---|---|---|
| sponsio:benchmark/redcode_exec | RedCode-Exec dangerous-snippet detection | 26 |
| sponsio:benchmark/odcv_bench | ODCV-Bench KPI-pressure protection | 19 + per-scenario LLM-scan cache |
| sponsio:benchmark/tau2_bench | τ2-bench procedural-correctness | 120 materialised contracts |
| sponsio:benchmark/agentdojo | AgentDojo prompt-injection / lethal-trifecta defence | 31 |
| sponsio:benchmark/swebench | SWE-bench Verified procedural-correctness | ~20 per instance |
Load like a capability pack:
    agents:
      my_bot:
        include:
          - sponsio:benchmark/redcode_exec
          - sponsio:benchmark/odcv_bench
Why they exist. The numbers in the benchmark documents (95.6% on ODCV-Bench, 92% combined on RedCode, 0.746 AUC on τ2-bench) are reproducible only if the exact contracts are available. The libraries are the documentation-of-record for those results.
Why they're good for users.
- Reproducibility. The published numbers stop being "trust us" and become "run this script on this YAML".
- Forks-as-starting-points. Most rules tagged code-execution or code-quality generalise; a handful are calibrated to dataset-specific markers. The library is meant to be forked, edited, and pruned, not used verbatim in production.
- Cross-runtime. The YAML loads identically on the Python guard and on the TypeScript SDK. Both runtimes ship the same five files.
4. TypeScript SDK reaches parity on the deterministic core
The TS SDK (@sponsio/sdk) now mirrors:
- The Term abstraction and all four runtime-bound term classes (ArgValue, CtxValue, UnaryFn, ArgLength).
- The workflowStep(trigger, nextAction, desc?) pattern factory.
- The five benchmark contract YAML libraries under ts/packages/sdk/contracts/benchmark/.
- Grounding emits arg_value(tool, field) and ctx_value(key) on every event.
- The textual (formula, trace) -> verdict round-trip parser accepts the three new term tokens.
Verdicts agree on both runtimes for any contract built from primitives that exist in both. Same (formula, trace) pair always produces the same outcome.
Upgrading
pip install --pre sponsio==0.2.0a2
No breaking changes vs 0.2.0a1. Existing contracts continue to compile and behave identically. The new primitives are additive.
Compatibility
- Var and Const are now Term subclasses. The ArithExpr type is an alias for Term, so existing type hints keep working.
- Valuation (TS) is now Record<string, unknown>. If your TypeScript code stored boolean / number atoms with an explicit Record<string, boolean | number> typing, narrow at the call site or upcast as needed.
- No CLI or config schema changes. sponsio validate, sponsio onboard, sponsio.yaml all unchanged.
Known limitations
- TS's parseNl() does not yet recognise workflow_step or the Term comparison forms as natural-language strings. The factories ARE available for direct construction; only the NL parser is behind. See docs/reference/ts-sdk-parity.md.
- TypeScript SDK still does not ship a DFA-compiled evaluator (only the recursive one). Verdicts agree, but the DFA path is faster on long traces. This stays on the roadmap.
What's next
- TS NL parser port for workflow_step and the Term forms.
- TS DFA-compiled evaluator port.
- Continue closing the v0.2 strategy system gap on TS (RedirectToSafe dispatch in @sponsio/sdk/langchain, EscalateToHuman.notify callback hooks).
If you are using 0.2.0a2 and hit something we did not predict, open an issue.
v0.2.0a1: softer landings (alpha)
Sponsio 0.2.0a1: softer landings
Released: 2026-06-06 · Status: alpha
pip install --pre sponsio==0.2.0a1
Note on the version. The "softer landings" work was developed against 0.2.0a0; the alpha that actually shipped to PyPI is 0.2.0a1. The bump exists because the 0.2.0a0 upload to TestPyPI had relative image paths in its README that PyPI's renderer does not resolve, and PyPI does not allow re-uploading a version even after deletion. No runtime changes between 0.2.0a0 and 0.2.0a1.
Until 0.2, every Sponsio contract had effectively one failure mode: block the call and let the agent figure it out. That worked for the "AI tried to rm -rf /" demo, but in production it meant brittle agent loops bouncing off refusals every time the policy fired.
0.2 ships three softer landings that keep the agent making progress while still gating the unsafe behavior, plus a few smaller fixes that round out the failure-strategy surface.
What's new
1. tool_policy: default-deny tool access
What it is. A declarative YAML block (or inline kwarg) that says "the agent can only call tools in approved:. Anything else is denied."
    tool_policy:
      default: deny
      approved: [search, read_file, list_dir]

Why it exists. Without an allowlist, adding a new tool to your agent framework silently expands the agent's authority. With tool_policy, the policy is the single source of truth for what the agent can reach. Adding a tool to your codebase becomes a deliberate act of trust; you have to put its name in approved: to make it callable.
Why it's good for users.
- Audit-friendly. The allowlist is the artifact you show in a security review. One file, one list, one source of truth.
- Prompt-injection-resistant. Combined with enforcement: proactive (below), denied tools never reach the agent's prompt. An attacker who tricks the model into asking for shell_exec finds that shell_exec does not exist in the model's available tools.
- Backwards-compatible. Default is allow, so existing yaml files keep working byte-for-byte. Users opt in to deny.
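The allowlist decision itself is small. A sketch, assuming a hypothetical ToolPolicy dataclass that mirrors the two YAML keys shown above (the class and its is_callable method are invented for this example):

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToolPolicy:
    """Hypothetical in-memory mirror of the tool_policy block."""
    default: str = "allow"                       # documented default
    approved: Set[str] = field(default_factory=set)

    def is_callable(self, tool_name: str) -> bool:
        if tool_name in self.approved:
            return True
        # Anything not listed falls back to the default verdict.
        return self.default == "allow"


policy = ToolPolicy(default="deny", approved={"search", "read_file", "list_dir"})
bound_tools = ["search", "shell_exec", "read_file"]
visible = [t for t in bound_tools if policy.is_callable(t)]
```

Applying the same predicate to the bound toolset before the model sees it is what makes a denied tool such as shell_exec absent from the prompt rather than merely refused.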
2. enforcement: proactive + filter_tools: proactive tool filtering
What it is. Two paths to the same outcome: shrink the tool menu the agent sees down to the subset that is currently legal.
- enforcement: proactive (wrap-time). Set on tool_policy. The LangGraph, CrewAI, OpenAI Agents SDK, and Google ADK adapters strip denied tools from the bound toolset at wrap() time. The model literally never sees them.
- filter_tools(candidates) (per-turn). Pure-probe API on the guard. Returns the subset of tool names that will not be blocked given the live trace. Useful in custom loops where the application owns the LLM call site.
Why it exists. Reactive blocking (the agent tries, gets refused, tries again) wastes tokens and turns. For static rules (default-deny allowlist) the answer does not change between turns; for temporal rules (must_precede(A, B) only allows B after A) the answer changes per turn. Both should be reflected in what the agent sees, not what gets refused on the back end.
Why it's good for users.
- No wasted attempts. The model does not burn turns on tools it cannot actually call.
- Cleaner prompts. Fewer tools in the prompt means fewer distractors and a smaller token bill.
- Works with any framework that supports custom loops. filter_tools is the universal hook; the proactive wrap-time variant is the zero-configuration version for the four adapters above.
- Side-effect free. filter_tools is a pure probe: no log entry, no callback fanout, no perf sample contamination. Safe to call before every model turn.
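A per-turn probe for a temporal rule can be sketched as a pure function over the trace. This standalone filter_tools, its must_precede parameter, and the list-of-names trace are simplifications assumed for illustration. The real API is a method on the guard and takes only the candidates.

```python
from typing import Iterable, List, Sequence, Tuple


def filter_tools(
    candidates: Iterable[str],
    trace: Sequence[str],
    must_precede: Sequence[Tuple[str, str]],
) -> List[str]:
    """Return the candidates that would not be blocked given the trace so far.

    `must_precede` holds (first, then) pairs: `then` is legal only after
    `first` already appears in the trace. The trace is read, never mutated.
    """
    allowed = []
    for tool in candidates:
        prerequisites = [first for first, then in must_precede if then == tool]
        if all(p in trace for p in prerequisites):  # one trace scan per rule
            allowed.append(tool)
    return allowed


rules = [("check_policy", "issue_refund")]
trace: List[str] = []
before = filter_tools(["check_policy", "issue_refund"], trace, rules)
trace.append("check_policy")
after = filter_tools(["check_policy", "issue_refund"], trace, rules)
```

Each candidate rescans the trace, which is the O(candidates × trace_length) cost noted in the 0.2.0a3 documentation repairs.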
3. redirect_to_safe: substitute, do not block
What it is. A pattern + strategy combo that, on violation, substitutes the model's chosen tool with a pre-declared safe alternative.
    contract("trash instead of rm").guarantees(redirect_to_safe("rm_rf", "trash"))
The model calls rm_rf; Sponsio rolls that event back from the trace, the LangGraph adapter invokes trash with the same arguments, the trace records the substitute call. From the model's perspective, the call succeeded.
Why it exists. A hard block forces the agent to bail out of the current task. A redirect keeps it making progress on a safer path. Most "destructive vs recoverable" tool pairs (rm_rf vs trash, issue_refund vs log_refund_request, force_push vs open_pull_request) are good candidates for this.
Why it's good for users.
- Agent does not have to learn to recover from policy violations. The recovery is built into the policy.
- Audit trail reflects what actually executed. The trace records the safe substitute, not the attempted-and-blocked unsafe call. Counters (rate_limit(unsafe, N)) do not tick on the rollback.
- Composes with conditional contracts. assume(...).guarantees(redirect_to_safe(...)) makes the substitution conditional on a precondition (for example, redirect refunds over $10k while letting smaller ones through).
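The rollback-then-substitute sequence can be sketched as follows. This is an illustration under assumptions: call_with_redirect and its dict-based redirect table are hypothetical, the redirect fires unconditionally instead of on a contract violation, and the chained-redirect rejection is not modeled.

```python
from typing import Any, Callable, Dict, List

Tool = Callable[..., Any]


def call_with_redirect(
    name: str,
    args: Dict[str, Any],
    tools: Dict[str, Tool],
    redirects: Dict[str, str],
    trace: List[str],
) -> Any:
    """Run `name`, or its declared safe substitute, and record what ran."""
    trace.append(name)                 # tentative event for the attempted call
    safe = redirects.get(name)
    if safe is not None:
        trace.pop()                    # roll the unsafe event back out
        trace.append(safe)             # record the substitute instead
        name = safe
    return tools[name](**args)


deleted: List[str] = []
trashed: List[str] = []
trace: List[str] = []
tools = {
    "rm_rf": lambda path: deleted.append(path),
    "trash": lambda path: trashed.append(path),
}
call_with_redirect("rm_rf", {"path": "/tmp/build"}, tools, {"rm_rf": "trash"}, trace)
```

Because the unsafe event never survives in the trace, anything that counts rm_rf occurrences afterwards sees zero.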
4. EscalateToHuman(notify=[...]): notifier hooks
What it is. The escalate strategy now accepts a callable or list of callables (Slack webhook, email sender, oncall pager) that fire synchronously when the contract trips.
    EscalateToHuman(
        reason="refund > $10k requires CFO approval",
        notify=[slack_oncall, email_finance_lead],
    )
Why it exists. Until 0.2, EscalateToHuman differed from DetBlock only in the action literal and the agent-facing message. No actual side effect, no notification, no out-of-band reach to a human. 0.2 makes the notification real.
Why it's good for users.
- Isolated failures. A broken Slack webhook does not crash the agent loop and does not silence the remaining notifiers; the exception becomes a RuntimeWarning naming the offending callable.
- Composable with DetBlock for hard refuse + notify. If you want the call gated AND the page fired, pair DetBlock with monitor.register_callback. The case study at examples/integrations/python/v0_2_finance_escalate_vanilla.py shows the pattern.
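The isolation property can be sketched with the standard warnings module. The fire_notifiers helper and both notifier functions are hypothetical. Only the behavior (each notifier runs, failures surface as RuntimeWarning) comes from the description above.

```python
import warnings
from typing import Callable, Iterable, List


def fire_notifiers(notifiers: Iterable[Callable[[str], None]], reason: str) -> None:
    """Call every notifier in order; a failure in one never stops the rest."""
    for notifier in notifiers:
        try:
            notifier(reason)
        except Exception as exc:
            label = getattr(notifier, "__name__", repr(notifier))
            warnings.warn(f"notifier {label} failed: {exc!r}", RuntimeWarning)


delivered: List[str] = []


def broken_slack(reason: str) -> None:
    raise ConnectionError("webhook unreachable")


def email_finance_lead(reason: str) -> None:
    delivered.append(reason)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fire_notifiers([broken_slack, email_finance_lead], "refund requires approval")
```

The second notifier still receives the reason, and the recorded warning names broken_slack, so an operator can tell which hook needs fixing.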
Smaller fixes
- sponsio mode <observe|enforce> CLI is now parent-aware. Prefers updating runtime.mode (the only line the TS loader reads), falls back to defaults.mode, refuses to append a fresh enforce block when neither exists. CI scripts that relied on the old exit-1 behavior for malformed configs keep working.
- LangGraph adapter rejects chained redirects and self-redirects. A contract that says "redirect A to B" combined with another saying "redirect B to C" no longer silently executes B; both raise ToolCallBlocked with a clear chain-naming error.
- Pattern factories uniformly accept desc=. Including redirect_to_safe, which previously did not and silently broke LLM-extracted rules.
- TS SDK gets redirectToSafe (formula side; runtime strategy bundle is Python-only for now).
- Discovery replay_formula now passes content_atoms to grounding. Historical-trace replay against contracts referencing contains(pii) / arg_has(...) no longer silently returns false negatives.
- render/components.contracts_table wraps the name column in Text(name). Rich was eating bracketed contract descriptions (only [search, read_file] approved) as malformed markup.
Upgrading
This is an alpha, so pip install sponsio still pulls 0.1.1. To try 0.2.0a1:
pip install --pre sponsio==0.2.0a1
Run the verification script to confirm:
python scripts/verify_v0_2.py
15 checks across core runtime + four adapters. Adapters with the SDK not installed are skipped rather than failed.
Compatibility
- No breaking changes to the 0.1.x API. Every yaml file, every Sponsio(...) call, every contract factory call from 0.1.1 still works.
- tool_policy.default is allow by default. You opt into deny.
- enforcement is reactive by default. You opt into proactive.
- EscalateToHuman() with no notify= argument behaves exactly as in 0.1.x.
Real-LLM verification
The v0.2 surface was end-to-end verified against Gemini 2.5 Flash through a LangGraph react agent (not just scripted tool calls). See examples/integrations/python/v0_2_real_llm_refund_langgraph.py for the runnable script.
What the verification confirmed under a real model:
- enforcement: proactive strips the bound tool set in the prompt. The model saw 3 tools (check_policy, issue_refund, log_refund_request), not 4. delete_customer was completely absent. Prompt-injection attempts to call it have nothing to bind to.
- redirect_to_safe is transparent to the model. Gemini called issue_refund(customer_id="C-42", amount=5000), the LangGraph adapter substituted log_refund_request, the model read back the ticket-opened result, and adapted its final reply to "Your refund for $5,000 has been submitted and is currently under review". The model did not claim a successful refund. It described what actually ran (the substitute call), not the original unsafe call.
- Trace integrity. Only log_refund_request events recorded; zero issue_refund survived. Downstream counters and rate limits would see only the substitute call.
The script auto-loads .env from the repo root, so a GOOGLE_API_KEY=AIza... line is all you need:
GOOGLE_API_KEY=AIza... python examples/integrations/python/v0_2_real_llm_refund_langgraph.py
Cross-check with the verification harness for cross-integration sanity:
python scripts/verify_v0_2.py
15 checks across the core runtime and four adapters. Adapters with the SDK not installed skip rather than fail.
Known limitations
- redirect_to_safe runtime dispatch is implemented only in the LangGraph adapter. CrewAI / Agent...
v0.1.1 — fix missing pyyaml core dependency
Patch release: pyyaml is now a core dependency (closes #61). A base pip/pipx install shipped without it, crashing 'sponsio host install' with ModuleNotFoundError. Upgrade: pipx upgrade sponsio
v0.1.0 — first stable open-source release
Open-source launch build. Closes the missing-implementation gap in 0.1.0a3
(CLI imported sponsio.daemon / sponsio.plugin.append_ops but the wheel
shipped without them) and tunes the bundled capability rules.
Added
- sponsio.daemon: Unix-socket IPC server + client + handlers; powers the privileged-process side of sponsio plugin append so a system install can give kernel-level (separate-UID) self-modify protection.
- sponsio plugin append: structurally-additive merge from a staging YAML into a host bucket library; the only blessed write path through the self-modify pack.
Changed
- Capability/shell pack: drop session-wide rate_limit(exec, 50) and loop_detection(exec, 20). The 24-hour cross-session trace store turned these into rolling caps that false-positived heavy interactive work; the targeted arg_blacklist and confirm-gate rules already cover the real attacks.
- Capability/self-modify pack: extend protection to the upstream sponsio package (contract bundles + engine.py) so an editable / --user / venv install can't be used as an "edit the bundle to silence the rule" bypass. Maintainer workflow: override with customized: {match: {source: "library:tier1.self-modify"}, disabled: true}.
- Onboard wizard: drop redundant trailing "mode flip" hint (axis 3 already asks); language-aware bare-loop guard API hint (guardBefore / guardAfter for TS, guard_before / guard_after for Python).
Fixed
- sponsio --version was hardcoded to "0.2.0a0" in the Click version_option; now reads sponsio.__version__ so it tracks pyproject.toml automatically.
- 0.1.0a3 wheel was missing sponsio/daemon/ and sponsio/plugin/append_ops.py, causing sponsio plugin append and sponsio daemon ... to ImportError on a fresh pip install. 0.1.0 ships them.