Releases: frumu-ai/tandem

Tandem v0.5.13

02 Jun 12:53

See the assets below to download the installer for your platform.

v0.5.13 (2026-06-02)

Tandem 0.5.13 combines the Linear-backed Coder intake work with a focused
runtime security hardening pass. The release tightens local API exposure,
workspace mutation defaults, shell execution, tenant scoping, audit/event
visibility, secret storage, and browser/provider network guardrails.

Coder Linear Intake

  • The Coder control panel can now register ACA projects backed by Linear teams
    and projects, with optional launch-status, label, and search-query filters.
  • The Coder intake board now renders both GitHub Project items and Linear
    issues through one scheduler-aware issue board, including batch launch,
    active-run detection, and direct issue links.
  • Coder overview and intake refresh messaging now reflect the selected issue
    source, including Linear MCP connection state when a Linear-backed project is
    selected.

Automations V2 Reliability

  • Automations V2 completion is now gated by terminal checkpoint integrity and
    contract-aware deliverable assertions instead of only checking for empty
    pending-node queues.
  • Required file deliverables must exist, be substantive, and pass basic shape
    checks; missing or weak deliverables requeue the owning node while repair
    attempts remain.
  • Required email delivery and generic outbound connector actions now need
    successful receipt evidence before the run can complete. Model prose alone no
    longer satisfies governed side effects.
  • Workflow graph validation now rejects dependency cycles and keeps input_refs
    aligned with readiness dependencies, including through budget compaction and
    strict sequential plans.
  • Verification failures now retry through the repair path until attempt budget
    is exhausted, and verification failure detection is scoped to verification
    output rather than unrelated artifact prose.
  • Recoverable tool execution errors are surfaced to the model for adaptation,
    while cancellation, shutdown, runtime-not-ready, and write-required
    permission failures remain loud failures.
  • Timer-triggered automations now dedupe queued/running runs the same way watch
    triggers do, preventing slow scheduled workflows from accumulating backlogs.
  • Parked-state lifecycle handling is explicit: approval gates can be marked as
    visibly stale under a manual-only policy, guardrail-stopped runs can
    auto-resume after approved quota overrides, stale reaping honors active
    run-registry heartbeats, and node execution uses idle/no-progress timeouts
    with an absolute ceiling.
  • Warning outcomes are now consistent across runtime and learning surfaces:
    accepted_with_warnings remains passable only without unmet requirements,
    but it is not counted as a clean workflow-learning validation pass and does
    not generate positive learning evidence.
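
The dependency-cycle rejection mentioned in the list above can be illustrated with a standard depth-first search. This is a minimal sketch, not Tandem's implementation: the `depends_on` mapping shape and the `find_cycle` helper are hypothetical names chosen for the example.

```python
from typing import Dict, List, Optional


def find_cycle(depends_on: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of node ids, or None if acyclic.

    `depends_on` maps a node id to the node ids it waits for (hypothetical shape).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in depends_on}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in depends_on.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # `dep` is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(depends_on):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

A validator built this way could reject the graph whenever `find_cycle` returns a list, and include that list in the error so the planner knows which edge to drop.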

Runtime Security Hardening

  • Local engine HTTP API startup now refuses unauthenticated non-loopback binds,
    and token-clearing no longer reopens the API.
  • HTTP MCP registration rejects arbitrary stdio: transports.
  • File write, edit, and patch tools now ask by default instead of silently
    mutating the workspace.
  • Batch sub-calls pass through permission and sandbox evaluation so nested tool
    calls cannot skip approval gates.
  • Workspace and write-policy checks fail closed when no workspace root can be
    resolved.
  • Shell execution uses Linux bubblewrap confinement by default, requires
    workspace context, and requires an explicit unsafe opt-out for unsandboxed
    shell execution.
  • Automation auto-approval now treats empty allowlists as deny-all and refuses
    to auto-approve shell tools.
  • Local single-tenant mode ignores caller-supplied tenant headers; hosted and
    enterprise tenant context continues to require signed assertions.
  • Run event streams, audit streams, and project listing now enforce tenant
    ownership/visibility checks.
  • API tokens, vault keys, and TUI keystores are written with owner-only Unix
    permissions, and vault passphrases replace the previous 4-digit PIN model.
  • Browser navigation fails closed without an allowlist and blocks local/private
    targets; provider base URL validation rejects unsafe remote HTTP endpoints.
  • Provider credential debug output and bug-monitor log redaction now avoid
    leaking plaintext secrets.
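
As an illustration of the first item in the list above, a startup guard that refuses unauthenticated non-loopback binds might look like the following. This is a sketch under assumptions: `may_bind` and its parameters are hypothetical and do not describe the engine's actual startup code.

```python
import ipaddress
from typing import Optional


def may_bind(host: str, api_token: Optional[str]) -> bool:
    """Allow a bind only if it is loopback or protected by a non-empty token."""
    if api_token:
        return True  # authenticated binds may use any interface
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostnames we cannot classify are treated as non-loopback: fail closed.
        return False
```

Under this sketch, clearing the token on a `0.0.0.0` bind makes `may_bind` return `False`, which matches the intent that token-clearing should not reopen the API.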

Compatibility Notes

  • Linux hosts that need shell execution must have bubblewrap available, or
    explicitly set TANDEM_UNSAFE_UNSANDBOXED_SHELL=1 for trusted local-only
    development.
  • Local clients should rely on the generated/shared API token rather than
    clearing token auth during development.
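
Operators who want to check a Linux host before upgrading could run a small preflight along these lines. It is a sketch only: `shell_mode` is a hypothetical helper, the decision order is an assumption, and the only names taken from the notes above are bubblewrap (whose executable is `bwrap`) and `TANDEM_UNSAFE_UNSANDBOXED_SHELL`.

```python
import os
import shutil
from typing import Mapping, Optional


def shell_mode(env: Mapping[str, str], bwrap_path: Optional[str]) -> str:
    """Classify the host as 'sandboxed', 'unsandboxed', or 'refused'."""
    if bwrap_path:
        return "sandboxed"  # bubblewrap is installed, confinement is possible
    if env.get("TANDEM_UNSAFE_UNSANDBOXED_SHELL") == "1":
        return "unsandboxed"  # explicit opt-out for trusted local development
    return "refused"  # no sandbox and no opt-out


if __name__ == "__main__":
    # shutil.which returns None when `bwrap` is not on PATH.
    print(shell_mode(os.environ, shutil.which("bwrap")))
```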

Full Changelog: v0.5.12...v0.5.13

Tandem v0.5.12

27 May 16:30

See the assets below to download the installer for your platform.

v0.5.12 (2026-05-27)

Tandem 0.5.12 hardens the hosted control-panel sign-in path and starts the
hosted organization access model inside the runtime. Hosted panel sessions now
carry Tandem-signed org-unit membership, capabilities, and policy-version
context, and automation v2 resources can be private, group-shared, or org-wide.

Hosted Panel Access

  • Hosted panel session exchange and refresh payloads now preserve org units,
    effective capabilities, and policy version from Tandem-hosted identity.
  • /api/auth/me exposes the hosted access context so the control panel can
    filter navigation and actions without treating UI hiding as enforcement.
  • The control-panel proxy now uses capabilities for hosted reads, automation
    execution, automation writes, and sharing actions before forwarding requests
    to the engine.

Runtime Resource Sharing

  • Hosted automation v2 creation stamps owner/private access metadata from the
    verified Tandem assertion instead of trusting caller-provided user fields.
  • Automation v2 list/read/run routes enforce private, group, and org visibility
    in hosted mode. Private automations remain visible to their creator and
    hosted admins.
  • Owners/admins can update automation visibility through
    POST /automations/v2/{id}/share, setting private, group, or org-wide
    access.
  • Automation mutation and repair routes now require owner/admin access for
    hosted resources, while legacy/local unscoped operation remains compatible.
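
The private, group, and org visibility rules above can be summarized in a few lines. This is a hedged sketch rather than the runtime's code: the `Caller` and `Automation` types, their field names, and the visibility strings are assumptions, and the caller is assumed to have already been verified as a member of the same organization.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Caller:
    user_id: str
    is_admin: bool = False
    groups: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Automation:
    owner_id: str
    visibility: str = "private"  # assumed values: "private", "group", "org"
    shared_groups: FrozenSet[str] = frozenset()


def can_view(caller: Caller, automation: Automation) -> bool:
    """List/read/run check: owners and hosted admins always see the resource."""
    if caller.is_admin or caller.user_id == automation.owner_id:
        return True
    if automation.visibility == "org":
        return True
    if automation.visibility == "group":
        return bool(caller.groups & automation.shared_groups)
    return False  # private, or an unknown visibility value: fail closed


def can_mutate(caller: Caller, automation: Automation) -> bool:
    """Share, mutation, and repair routes require owner or admin access."""
    return caller.is_admin or caller.user_id == automation.owner_id
```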

Compatibility

  • Slack, Discord, and Telegram approval-gate integrations continue to call the
    runtime gate-decision path without requiring a browser hosted assertion.

Full Changelog: v0.5.11...v0.5.12

Tandem v0.5.11

25 May 20:31

See the assets below to download the installer for your platform.

v0.5.11 (2026-05-25)

Tandem 0.5.11 prepares the hosted enterprise engine distribution path. The
release now builds a Linux x64 tandem-engine binary with browser automation
and enterprise-full routes compiled in, publishes it as a separate enterprise
release asset, and adds a dedicated npm wrapper for hosted sidecar deployments.

Hosted Enterprise Engine

  • Added tandem-engine-enterprise-linux-x64.tar.gz to the Linux release asset
    set. The archive extracts to a normal tandem-engine binary so existing
    hosted sidecar scripts can keep the same command name.
  • Added the public @frumu/tandem-enterprise npm package for Linux x64 hosted
    deployments. Its installer downloads only the enterprise Linux asset and
    fails clearly on unsupported platforms.
  • Kept the standard @frumu/tandem package on the normal public engine asset,
    so desktop and local installs do not pick up enterprise-full dependencies by
    default.
  • Gated automatic enterprise npm publishing behind PUBLISH_NPM_ENTERPRISE=true
    so the 0.5.11 registry publish can ship the existing packages first, while
    the new npm package gets a deliberate first publish and configuration.

Full Changelog: v0.5.10...v0.5.11

Tandem v0.5.10

25 May 17:55

See the assets below to download the installer for your platform.

v0.5.10 (Unreleased)

Tandem 0.5.10 opens the enterprise connector source-binding workstream. The
focus is safe ingestion governance for hosted and enterprise deployments:
connector credentials as secret references, source bindings mapped to
ResourceRef and DataClass, quarantine/revoke/rotate workflows,
resource-scoped memory retrieval, and hosted authorization hardening around
who may bind company data into Tandem.

This unreleased line currently contains the contract foundation, enterprise
admin shell, storage-backed organization-unit registry, and storage-backed
source-binding registry. Manual memory imports can now optionally target an
enabled source binding so imported chunks carry resource and data-class
metadata. Google Drive now has a guarded admin-triggered import path; Notion,
GitHub, Slack, Gmail, live OAuth, background workers, and production connector
automation remain follow-up implementation phases.

Enterprise Connector Source Binding

  • Added transport-safe enterprise contract types for connector instances,
    connector lifecycle state, connector credential references, credential
    classes, source binding state, ingestion policy, source objects, ingestion
    jobs, ingestion quarantine, quarantine dispositions, and scoped memory chunk
    references.
  • Added generic organization-unit taxonomy contract types so admins can model
    company-specific domains such as HR, Doctors, Consultants, Claims Adjusters,
    Board Members, or Platform Oncall without hardcoded Tandem roles.
  • Organization-unit memberships can feed ScopedGrant through the new
    organization-unit membership grant source while preserving the existing
    principal/resource/data-class projection model.
  • Added enterprise admin endpoints for org-unit and source-binding management.
    Organization units now have tenant-scoped storage-backed create/list behavior;
    source bindings now have tenant-scoped storage-backed create/list/update
    behavior with ResourceRef tenant validation and admin-gated mutations.
  • Verified hosted context now preserves signed assertion roles so enterprise
    admin mutations can fail closed unless the Tandem-signed hosted context
    carries admin/owner/reconfigure authority.
  • Added a hidden-by-default Enterprise admin page in the control panel. It reads
    the storage-backed org-unit and source-binding endpoints, displays
    tenant/principal context, creates org units and source bindings, and can move
    source bindings between enabled, disabled, and quarantined states.
  • Connector credential references carry only SecretRef metadata and default
    to read-only credentials. They intentionally do not model raw credential
    values.
  • Source bindings map external source roots to Tandem ResourceRef and
    DataClass values, giving manual upload and future external connectors a
    common resource-scoped ingestion contract.
  • Manual memory imports accept an optional source_binding_id, fail closed if
    the binding is outside the tenant or disabled for indexing, and stamp chunks
    with source-binding, resource, data-class, and source-object metadata while
    preserving local/default import behavior when unset.
  • Source-bound vector memory now fails closed before ranking. Chunks stamped
    with enterprise source-binding metadata are hidden unless the caller supplies
    a strict tenant access projection with a matching Read grant for the bound
    ResourceRef and DataClass.
  • Governed/global memory search now applies the same source-binding guard.
    Records with enterprise source-binding metadata are filtered out unless the
    verified tenant assertion carries a strict projection with matching
    resource/data-class read authority.
  • Response-cache entries can now be partitioned by tenant and source binding,
    and invalidated for a specific binding. Source-binding admin changes emit an
    invalidation-required event so future cache consumers can purge stale answers
    after disable, quarantine, revoke, or permission changes.
  • Tool schemas now have additive enterprise security descriptors covering
    required permissions, resource kinds, data classes, admin surfaces, external
    side effects, credential access, and default visibility. Built-in tools emit
    descriptors from their metadata, and unannotated MCP/provider tools can be
    classified conservatively before future discovery masking and execution
    enforcement.
  • The embedded MCP catalog now carries security metadata for servers and
    cataloged tools. Catalog-provided overrides can mark sensitive tools as
    admin/credential/hidden, while unannotated tools receive conservative
    descriptors from catalog context and action classification.
  • Operators can provide JSON/YAML MCP tool-security overrides with
    TANDEM_MCP_TOOL_SECURITY_OVERRIDES_PATH, allowing enterprise deployments to
    tune server and per-tool security descriptors without rebuilding the embedded
    catalog.
  • mcp_list now applies discovery authorization when a signed strict tenant
    projection is present. Unauthorized MCP tools are removed from the discovery
    inventory before the model can see them; local/unscoped discovery remains
    unchanged.
  • Provider/model calls now apply the same strict projection to advertised tool
    schemas before invocation. Unauthorized admin, credential, execute, or
    resource-scoped tools are omitted from the provider-visible tool list while
    local/unscoped sessions remain compatible.
  • Source-bound manual uploads now create durable source-object lifecycle
    records keyed by tenant, source binding, and native object identity. Changed
    documents keep the same source object ID while their hashes update, and
    sync_deletes tombstones removed uploads so later admin workflows can
    reindex, delete, or re-scope by binding/resource.
  • Enterprise admins can now list source-object lifecycle records for a source
    binding, request reindex by purging stale chunks/import index rows, hard
    delete a source object and indexed content, or re-scope its resource/data
    class metadata. Each mutation reuses enterprise admin authorization and emits
    source-binding cache invalidation.
  • The hidden Enterprise admin page now exposes those source-object lifecycle
    rows for a selected source binding and provides reindex, delete, and re-scope
    controls from the hosted admin UI.
  • Hosted/enterprise manual memory imports now require source_binding_id so
    company data is scoped to a ResourceRef and DataClass before indexing.
    Local/default imports can still remain unbound for non-enterprise installs.
  • Local/default manual memory imports can now opt into the generated
    local_manual_upload binding. This gives local installs source-object
    lifecycle tracking under an internal document_collection scope without
    forcing legacy unbound imports to migrate immediately.
  • The enterprise connector trust-proof matrix now has explicit coverage for
    hosted non-admin connector creation denial, source-bound upload
    ResourceRef lifecycle stamping, and same native source IDs remaining
    tenant-scoped.
  • Source-bound memory retrieval now has explicit tenant-isolation proof: tenant
    A cannot retrieve tenant B chunks even when the binding ID, native object
    path, and search phrase overlap.
  • Source-object re-scope now has explicit purge coverage: old indexed chunks
    are deleted before lifecycle metadata moves to the new resource/data class,
    preventing stale prompt context from surviving permission changes.
  • Source-bound prompt-context assembly now has explicit regression coverage:
    source-bound current-session and history chunks are excluded from assembled
    prompt context unless a strict tenant projection grants read access to the
    bound ResourceRef and DataClass.
  • Governed memory list responses now apply the source-bound visibility guard
    before returning metadata, preventing list/citation browser surfaces from
    exposing source-object IDs, native object paths, or binding IDs without a
    strict read grant.
  • Coder governed-memory hit artifacts now fail closed for source-bound records,
    with coverage proving coder memory-hit responses do not expose source-object
    IDs, native object paths, or binding IDs without a strict read grant path.
  • Automation upstream evidence now filters source-bound internal identifiers
    from read paths, discovered paths, and citations before downstream workflow
    nodes can reuse them as citation evidence.
  • Strict session KB grounding now ignores source-bound internal identifiers
    when extracting source labels and document refs, preventing KB citation
    renderers from exposing source-object metadata.
  • Disabling or quarantining a source binding now purges indexed content for its
    lifecycle records and tombstones affected source objects, closing the
    binding-level stale prompt-context path after permission changes.
  • Prompt-context injection and coder duplicate-memory scans now skip
    source-bound governed records by default, closing remaining local/default
    memory caller gaps when no strict grant projection is available.
  • Managed hosted detection now reports hosted auth availability separately, so
    disconnected local test deployments can still use the engine-token sign-in
    path while connected hosted servers keep Tandem hosted login enforcement.
  • Connector instances now have storage-backed tenant-scoped admin endpoints for
    lifecycle management. Source-bound imports require the referenced connector
    to exist and be active, so paused, revoked, or quarantined connectors block
    ingestion before data reaches memory.
  • The hidden Enterprise admin page can now create connector lifecycle records,
    list tenant-scoped connector status, and pause, revoke, quarantine, or
    reactivate connectors from the control panel.
  • Enterprise organization-unit memberships now have tenant-scoped runtime
    storage, admin-gated create/list/update endpoints, and hidden admin UI
    controls f...

Tandem v0.5.9

20 May 23:16
Commit 1398745, signed with the committer's verified signature (TacShade, GPG key ID: 006AED97DDA2979A).

See the assets below to download the installer for your platform.

v0.5.9 (Unreleased)

Tandem 0.5.9 continues the hosted tenant-isolation work for Automation V2. The
focus is denial-driven hardening for background and applied automation paths:
scheduled runs, watch-triggered runs, stale recovery, imported/applied
definitions, Automation V2 event visibility, runtime route isolation, provider
and MCP credential boundaries, vector-backed memory partitioning, and the first
coder artifact tenant boundary. The current unreleased work also starts the
workspace access-control contract layer for Google Workspace-style company data
and resource grants.

Enterprise Workspace Access Control

  • Added public enterprise contract vocabulary for organization/workspace/
    department/project/resource hierarchies: ResourceKind,
    ResourcePathSegment, ResourceRef, and ResourceScope.
  • Added access-control vocabulary for View, Read, Edit, Execute,
    Delegate, and Admin, plus data classes such as executive, credential,
    source-code, customer-data, and financial-record scopes.
  • Added normalized principal references for humans, groups, departments, agent
    workers, automations, service accounts, external delegates, and support
    operators.
  • Added GrantSource and ScopedGrant so access can be attributed to direct
    assignment, group membership, department membership, inherited grants,
    explicit executive/global grants, delegated projections, or break-glass
    authority.
  • Added StrictTenantContext, DataBoundary, and AssertionMetadata as the
    additive strict context object for hosted/enterprise projections over tenant
    context, principals, authority chains, resource scopes, grants, data-class
    boundaries, and signed assertion metadata.
  • Added allow/deny grant effects, structured access decisions, and
    StrictTenantContext evaluation helpers so explicit denies win over
    inherited allows, projected resource scopes bound access, expired grants do
    not apply, and project grants can authorize path-scoped resources.
  • Extended Tandem context assertion claims with optional principal,
    resource-scope, scoped-grant, and data-boundary projection fields. Existing
    tenant-only v1 assertions remain valid and deserialize without strict
    projection data.
  • Added a typed enterprise signing-key purpose vocabulary for context
    assertions, approval receipts, delegation projections, A2A peer assertions,
    and break-glass/admin assertions.
  • Added hosted context assertion key metadata checks so keyring entries can
    bind a public key to the context_assertion purpose, org/deployment,
    allowed audiences, allowed resource-scope prefixes, activation windows, and
    active status while preserving legacy string and delimited key formats.
  • Re-exported the new contract vocabulary through tandem-types.
  • Added contract tests covering Finance department data access, Engineering
    repository path scopes, cross-functional group access, CEO org-wide executive
    grants, MCP tool resource targets, expiring delegated vendor-agent access,
    data-boundary denials, project-scoped agent projections, explicit deny
    precedence, expired grants, narrow delegation, scoped assertion projections,
    and legacy assertion compatibility.
  • Added the first hosted control-panel login exchange: managed hosted panels
    redirect users through https://tandem.ac, Tandem-web authorizes hosted org
    membership, the VM exchanges a one-time code with its host-agent token, and
    the browser receives only a panel session while the engine token remains a
    server-side root transport secret.
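
The evaluation rules listed above (explicit denies win over inherited allows, expired grants do not apply, and project grants can authorize path-scoped resources) can be sketched as follows. The `Grant` shape, the slash-separated resource paths, and the `decide` helper are hypothetical simplifications and are not the `ScopedGrant` or `StrictTenantContext` types themselves; projected resource scopes and data-class boundaries are left out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Grant:
    resource_prefix: str                # e.g. "org/acme/project/billing"
    permission: str                     # e.g. "Read"
    effect: str                         # "allow" or "deny"
    expires_at: Optional[float] = None  # Unix seconds; None means no expiry


def covers(prefix: str, resource: str) -> bool:
    """A grant on a path covers that path and everything nested beneath it."""
    return resource == prefix or resource.startswith(prefix.rstrip("/") + "/")


def decide(grants: Iterable[Grant], resource: str, permission: str, now: float) -> bool:
    """Return True only if an unexpired allow applies and no unexpired deny does."""
    allowed = False
    for grant in grants:
        if grant.permission != permission:
            continue
        if not covers(grant.resource_prefix, resource):
            continue
        if grant.expires_at is not None and grant.expires_at <= now:
            continue  # expired grants do not apply, whatever their effect
        if grant.effect == "deny":
            return False  # an explicit deny wins regardless of order
        if grant.effect == "allow":
            allowed = True
    return allowed  # no matching allow means no access
```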

Hosted Runtime Ingress

  • Hosted and enterprise runtime modes now require a configured deployment
    transport token before accepting requests.
  • Verified hosted context assertions must carry explicit deployment-scoped
    tenant context rather than local_implicit.
  • Context assertion verification now rejects authority chains whose initiating
    actor does not match the signed human actor.
  • Request principals derived from signed context now use the verified assertion
    issuer as their source, preserving the Tandem control-plane trust boundary.
  • Managed hosted control panels now forward Tandem-signed context assertions to
    the engine proxy and hide customer dashboard engine-token reveal for managed
    deployments.

Automation V2 Tenant Isolation

  • Workflow planner apply, mission builder apply, and channel automation draft
    confirm now stamp persisted Automation V2 definitions from the request
    TenantContext.
  • Automation V2 create/apply payloads cannot switch tenant context through
    embedded metadata.
  • Scheduled/background-created runs inherit the stored automation tenant.
  • Watch-condition runs now inherit the owning automation tenant instead of
    falling back to local_implicit.
  • Automation V2 context-run blackboard sync inherits the run tenant, so
    background-created context runs do not silently become local implicit.
  • Stale reaping and auto-resume regression coverage now proves explicit run
    tenant context survives recovery without an active HTTP request.
  • Scheduler-published Automation V2 run-created events now include top-level
    tenantContext, allowing hosted/global SSE filters to enforce tenant
    visibility.
  • Added finite-body Automation V2 SSE coverage proving a tenant stream receives
    its own event and does not receive another tenant's event.

Runtime Tenant Isolation

  • Session routes now enforce tenant ownership for list, get, delete, messages,
    prompting, attach, and workspace-override flows.
  • Global event streams filter emitted events by tenant context so hosted
    tenants do not receive unrelated runtime events.
  • Context-run internal routes are hardened by tenant for events/SSE, blackboard
    access, task claim/transition, checkpoints, replay, and ledger state.
  • Automation V2 run/gate routes reject cross-tenant list, mutation, start,
    inspect, approve, deny, and rework attempts.
  • Legacy workflow routes gained tenant checks so older governance-light paths
    cannot become a bypass around Automation V2 isolation.
  • Coder-created context runs now inherit the request tenant, and coder
    status/list/get/artifact reads are filtered through the linked context run
    tenant.
  • Coder control and artifact-writing routes now require the caller to match the
    owning context run tenant before approving, cancelling, executing, writing
    artifacts, or listing memory candidates. Added denial coverage proving tenant
    B cannot mutate or inspect tenant A's coder run through those routes.

Provider And MCP Secrets

  • Provider credential records are tenant-scoped for hosted/shared runtime mode.
  • Provider create/list/read/update/delete/refresh paths use the request tenant
    and fail closed across tenant boundaries.
  • Store-backed MCP secret references validate tenant scope before lookup.
  • MCP tool execution now receives effective request/session/run tenant context
    so tenant A cannot execute with tenant B's stored MCP secret.
  • Local single-tenant env/store secret behavior is preserved for local mode.

Memory Isolation

  • Governed memory search, list, read, promote, demote, update, and delete paths
    use tenant-aware DB methods.
  • memory_records dedupe and user-created indexes now include tenant scope.
  • Vector-backed session/project/global memory chunks now store tenant
    org/workspace/deployment scope.
  • sqlite-vec top-k memory search filters the chunk table by tenant before
    distance ranking, preventing another tenant's closer vectors from suppressing
    the current tenant's results.
  • Added denial tests for identical vector content, shared source hashes,
    cross-tenant vector search, cross-tenant vector deletes, tenant-scoped memory
    stats, project vector stats, manual clear, and old-session cleanup.
  • Memory manager context retrieval now has tenant-aware APIs that scope recent
    session chunks and vector search before prompt context is assembled.
  • Memory file import/index paths now carry tenant scope through import
    requests, index lookup/update/delete, stale file chunk replacement,
    sync-delete cleanup, project file-index stats, and project file-index clear.
  • Added denial tests proving same project/path imports, identical file chunks,
    index deletes, stats, and clears do not cross tenant boundaries.
  • Memory project/global config rows and old-session hygiene now use tenant
    scope, with tests proving same project ids cannot overwrite retention policy
    or prune another tenant's session memory.
  • Knowledge spaces now include tenant-scoped uniqueness, and knowledge item,
    coverage, promotion, manager, and Automation V2 preflight paths use
    tenant-aware lookups so curated knowledge cannot cross hosted tenant
    boundaries.
  • Existing local memory rows default to local/local during migration.
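
The reason for filtering by tenant before distance ranking, as described for the top-k search above, shows up in a toy example. This sketch uses plain Python lists and Euclidean distance rather than sqlite-vec, and the tuple layout and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Chunk = Tuple[str, str, Sequence[float]]  # (tenant, chunk_id, embedding)


def top_k_filter_first(
    chunks: List[Chunk], tenant: str, query: Sequence[float], k: int
) -> List[str]:
    """Restrict to the tenant's chunks, then rank: the tenant gets its k nearest."""
    owned = [c for c in chunks if c[0] == tenant]
    owned.sort(key=lambda c: math.dist(c[2], query))
    return [c[1] for c in owned[:k]]


def top_k_filter_last(
    chunks: List[Chunk], tenant: str, query: Sequence[float], k: int
) -> List[str]:
    """Rank everything, then filter: closer foreign vectors can crowd results out."""
    ranked = sorted(chunks, key=lambda c: math.dist(c[2], query))[:k]
    return [c[1] for c in ranked if c[0] == tenant]
```

With two tenant-B chunks sitting closer to the query than tenant A's only chunk and `k=2`, the filter-last variant returns nothing for tenant A, while the filter-first variant still returns A's chunk.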

Automation V2 MCP Diagnostics

  • Required MCP tool validation now reports the exact missing tool ids in
    missing_required_mcp_tools and in required_next_tool_actions, making
    repair prompts specific instead of saying only that required MCP calls were
    incomplete.
  • MCP connector results that return string errors such as MCP error -32602
    are now treated as failed tool results, so invalid connector arguments do not
    satisfy required-tool validation.
  • Structured JSON nodes that declare output_contract.schema now validate the
    final artifact against that schema, so raw MCP account/quota/search payloads
    cannot pass as completed handoff artifacts.
  • Automation V2 node preflight now derives concise MCP tool contracts from the
    offered tool schemas, including required arguments, minimal examples, and
    non-blocking schema warnings that are injected into prompts and diagnostics.
  • MCP contract examples now respect positive minLength constraints for
    required string fields, preventing invalid empty-string examples for tools
    such as Notion search.
  • Structured connector nodes now short-circuit acro...

Tandem v0.5.8

17 May 21:15
Commit 5fa062e, created on GitHub.com and signed with GitHub's verified signature (GPG key ID: B5690EEEBB952194).

See the assets below to download the installer for your platform.

v0.5.8 (2026-05-17)

Tandem 0.5.8 begins the enterprise auth, tenant context, and execution-time
verification implementation. This is the first runtime-facing slice: it adds
provider-agnostic tenant-context contract types and starts carrying tenant
context into tool policy evaluation, while preserving local and single-tenant
behavior by default.

Enterprise Tenant Context Foundation

  • Added explicit runtime auth-mode names for local single-tenant,
    hosted single-tenant, and enterprise-required operation.
  • Added canonical parsing and operator-friendly aliases for the runtime auth
    modes, with TANDEM_RUNTIME_AUTH_MODE resolving to local single-tenant by
    default.
  • Extended the enterprise contract with human actor metadata, deployment-aware
    tenant context, verified tenant-context assertion metadata, hosted tenant
    constructors, authenticated request principals, and request authority-chain
    helpers.
  • Added provider-agnostic tenant context assertion header and claims types for
    the future Tandem-signed JWS passed from tandem-web to runtime/ACA.
  • Re-exported the new contract types through tandem-types for runtime and
    server consumers.
  • Added runtime verification for compact Tandem context assertions signed with
    Ed25519, including public-key configuration, issuer/audience checks, expiry
    checks, and tamper rejection in hosted and enterprise auth modes.
  • Added kid-based context assertion keyring support through
    TANDEM_CONTEXT_ASSERTION_PUBLIC_KEYS / _FILE, with JSON object keyrings
    preferred and the existing single-key env vars preserved as fallback.
  • Added hosted control-plane signer prep in tandem-web: a provider-neutral
    context assertion signer shape, local Ed25519 test signer, and Google Cloud
    KMS Software Ed25519 adapter for future hosted assertions.
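
Canonical parsing with aliases and a local default, as described for TANDEM_RUNTIME_AUTH_MODE above, might look roughly like this. The canonical strings, the alias table, and the choice to raise on unknown values are all assumptions made for illustration; the release notes do not list the accepted spellings.

```python
from enum import Enum
from typing import Mapping


class AuthMode(Enum):
    # Hypothetical canonical spellings for the three modes named in the notes.
    LOCAL_SINGLE_TENANT = "local_single_tenant"
    HOSTED_SINGLE_TENANT = "hosted_single_tenant"
    ENTERPRISE_REQUIRED = "enterprise_required"


# Hypothetical operator-friendly aliases.
_ALIASES = {
    "local": AuthMode.LOCAL_SINGLE_TENANT,
    "hosted": AuthMode.HOSTED_SINGLE_TENANT,
    "enterprise": AuthMode.ENTERPRISE_REQUIRED,
}


def resolve_auth_mode(env: Mapping[str, str]) -> AuthMode:
    """Resolve the auth mode from the environment, defaulting to local."""
    raw = env.get("TANDEM_RUNTIME_AUTH_MODE", "").strip().lower().replace("-", "_")
    if not raw:
        return AuthMode.LOCAL_SINGLE_TENANT  # unset or empty resolves to local
    if raw in _ALIASES:
        return _ALIASES[raw]
    try:
        return AuthMode(raw)
    except ValueError:
        # Assumed design choice: reject typos instead of silently running local.
        raise ValueError(f"unknown TANDEM_RUNTIME_AUTH_MODE: {raw!r}") from None
```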

Runtime Policy Plumbing

  • ToolPolicyContext now carries the session tenant context into runtime tool
    policy hooks.
  • The engine loop loads the current session tenant before evaluating policy for
    a tool call.
  • Hosted and enterprise runtime auth modes now reject raw tenant/actor headers
    and fail closed unless a configured Tandem-signed context assertion verifies.
  • Verified hosted assertions are attached to request extensions alongside the
    derived tenant context and request principal, giving downstream runtime code a
    trustworthy identity object to consume in later sprints.
  • Fintech strict protected-tool policy now rejects execution when the session
    tenant context does not match the owning Automation V2 run tenant context.
  • In hosted and enterprise auth modes, fintech strict protected tools now fail
    closed if execution reaches the policy hook without a non-local tenant context
    and human actor.
  • Sessions now persist verified tenant assertion metadata and pass it into
    ToolPolicyContext, so strict protected-tool policy can reject expired signed
    tenant assertions at execution time instead of trusting only the original HTTP
    ingress decision.
  • Added regression coverage proving local/default session creation still works
    without hosted auth headers or signed context assertions.
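
As a rough sketch of the fail-closed rules above, the policy hook can be thought of as a pure function over the session context. The `PolicyContext` fields, mode labels, and reason strings are hypothetical names chosen for illustration; approval-receipt checks are out of scope here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyContext:
    auth_mode: str                         # "local", "hosted", or "enterprise" (illustrative labels)
    session_tenant: Optional[str]
    run_tenant: Optional[str]              # tenant of the owning Automation V2 run
    human_actor: Optional[str]
    assertion_expires_at: Optional[float]  # from the persisted assertion metadata


def evaluate_protected_tool(ctx, now):
    """Return (allowed, reason) for a strict protected tool call."""
    if ctx.session_tenant != ctx.run_tenant:
        return False, "tenant_mismatch"
    if ctx.auth_mode in ("hosted", "enterprise"):
        # Fail closed without a non-local tenant and a human actor.
        if ctx.session_tenant in (None, "local") or ctx.human_actor is None:
            return False, "missing_tenant_or_actor"
        # Re-check expiry at execution time, not only at HTTP ingress.
        if ctx.assertion_expires_at is None or ctx.assertion_expires_at <= now:
            return False, "assertion_expired"
    return True, "ok"
```

Local mode skips the assertion checks entirely, which matches the regression coverage for local/default sessions.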

Coder Reliability Upgrade

Tandem Coder now behaves more like a coding supervisor than a generic prompt
runner. Issue-fix work is scheduled as real implementation work, runs in a
managed worktree, requires evidence of code changes and validation, and hands
off through a PR instead of marking project work as done prematurely.

  • Issue-fix worker sessions now use a dedicated coding contract: inspect the
    repository, read scoped instructions, make a plan, patch files, run
    validation, repair failures where possible, and report concise evidence.
  • Strict tool/write enforcement is applied to issue-fix workers, including
    prewrite inspection requirements before mutation. Non-issue-fix worker types
    such as triage, review, and merge recommendation keep their existing
    non-writing execution mode.
  • Managed worktrees are preserved until handoff completes, allowing Tandem to
    collect and expose git diff, changed files, validation output, branch name,
    commit SHA, PR URL, and completion-gate evidence.
  • Completion is now gated: no patch blocks completion, failed validation blocks
    completion, failed push/PR handoff blocks completion, and successful PR
    creation moves the GitHub Project item to Review rather than fake Done.
  • Coder run records now include worker/session ids, worker run ids, managed
    worktree paths, branch/commit/PR metadata, changed files, validation status,
    handoff status, and completion-gate details.
  • Project policy now defaults to PR-required handoff, native Tandem delegation,
    max two parallel issue runs, and no manual out-of-order runs unless the
    project explicitly opts in.
  • GitHub Project intake payloads now include scheduler explanations: parent
    cards, phase, blockers, scheduler rank, runnable state, active run id, run
    state, and handoff URL.
  • Parent cards are treated as planning/grouping headers only. The scheduler
    launches child issues by the lowest open phase and dependency order.
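
The completion gate described above amounts to collecting blockers before allowing a status move. A minimal sketch, assuming hypothetical argument names and blocker labels:

```python
def completion_gate(changed_files, validation_passed, pr_url):
    """Return (target_status, blockers); target_status is None while blocked."""
    blockers = []
    if not changed_files:
        blockers.append("no_patch")
    if not validation_passed:
        blockers.append("validation_failed")
    if not pr_url:
        blockers.append("handoff_failed")
    # A successful handoff moves the item to Review, never straight to Done.
    target_status = None if blockers else "Review"
    return target_status, blockers
```

Returning the full blocker list, rather than stopping at the first failure, is one way to populate the completion-gate details stored on the run record.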

Coder Control Panel

  • The Coder intake view now renders as the primary board: TODO, In Progress,
    Blocked, Review, and Done columns with parent/phase grouping, next-runnable
    badges, disabled run buttons with reasons, and handoff links.
  • "Run scheduler next" launches only the lowest open phase's scheduler-approved
    child issues. "Run selected" stays disabled for out-of-order work unless the
    project policy enables manual override.
  • The board shows Tandem's spinner while GitHub Project sync is active instead
    of leaving the intake area visually blank.
  • Active run payloads now carry enough status for the control panel to show the
    current worker session, latest action/log context, changed files, validation
    output, branch, PR link, and failure reason using the existing coder routes
    and event streams.
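
One way to picture the scheduler rule (lowest open phase, dependency order, parent cards skipped, at most two parallel runs) is the sketch below. The issue dictionary shape and state names are assumptions for illustration, and "open" is taken to mean any child issue that is not done.

```python
def next_runnable(issues, max_parallel=2):
    """Pick child issue ids the scheduler may launch next.

    Each issue is a dict with: id, phase, state, and optionally
    depends_on (list of ids) and is_parent (bool).
    """
    done = {i["id"] for i in issues if i["state"] == "done"}
    # Parent cards are grouping headers only and are never launched.
    open_children = [
        i for i in issues if not i.get("is_parent") and i["state"] != "done"
    ]
    if not open_children:
        return []
    lowest_phase = min(i["phase"] for i in open_children)
    active = sum(1 for i in open_children if i["state"] == "in_progress")
    slots = max(0, max_parallel - active)
    ready = [
        i["id"]
        for i in open_children
        if i["phase"] == lowest_phase
        and i["state"] == "todo"
        and all(dep in done for dep in i.get("depends_on", []))
    ]
    return ready[:slots]
```

Anything outside the returned list would be the out-of-order work that stays disabled unless the project policy opts into manual override.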

Boundaries

  • This release does not add Zitadel integration yet.
  • tandem-agents still does not depend on Zitadel or raw IdP tokens.
  • tandem-web is the intended owner of Tandem-signed hosted context
    assertions; runtime and ACA should consume Tandem assertions/public keyrings,
    not raw Zitadel or Google identity tokens.
  • Hosted strict auth is not enabled by default.
  • Local, desktop, and single-tenant workflows continue to run without hosted
    auth, signed context assertions, or approval signing keys.
  • GitHub Project Done remains a post-review/merge state. Coder implementation
    handoff now stops at Review with a PR link and evidence.

Versioning

  • Rust crates, npm packages, Python client metadata, Tauri config, and lockfiles
    are bumped to 0.5.8.

Full Changelog: v0.5.7...v0.5.8


Tandem v0.5.7

17 May 08:41
@github-actions github-actions
f8f440c

See the assets below to download the installer for your platform.

v0.5.7 (2026-05-17)

Tandem 0.5.7 moves the project positioning and first domain-specific runtime hardening toward governed AI infrastructure for enterprise work. The release adds public enterprise runtime docs, a fintech strict-mode foundation for compliance and risk workflows, and the first runtime evidence paths needed to prove that Tandem can govern long-running AI work with scoped tools, citations, approvals, artifacts, audit events, and replayable run records. It also restructures the desktop Coder workspace so the live state of running work is visible at a glance.

Enterprise Runtime Infrastructure Positioning

  • The README now opens with Tandem as governed AI runtime infrastructure for long-running agentic work.
  • docs/AI_RUNTIME_INFRASTRUCTURE.md explains the runtime model: engine-owned state, canonical run journal, task graph, tool/MCP policy, approvals, validators, artifacts, receipts, replay, and enterprise sidecar boundaries.
  • docs/ENTERPRISE_READINESS.md separates what is available now from in-progress and planned enterprise capabilities.
  • docs/ENTERPRISE_PROOF_WALKTHROUGH.md gives platform engineers a repo-grounded path for verifying one governed run from intent through plan, scoped tools, approval, artifact validation, audit evidence, and replay/debug.

Fintech Strict Runtime Foundation

This release adds an internal fintech_strict profile marker for Automation V2 metadata. It is aimed at compliance and risk operations proof sprints, especially compliance/risk update briefs.

What ships now:

  • A protected fintech action classifier in tandem-core for account actions, customer communications, regulatory filings, system-of-record updates, credit decisions, money movement, and evidence publication.
  • Server tool-policy hook enforcement for fintech strict Automation V2 sessions.
  • Protected fintech tools and unknown external mutation tools are blocked with clear denial reasons until an approval path is used.
  • Denied protected fintech actions emit runtime events and protected audit records.
  • /audit/stream maps fintech protected-action denials and verified approvals into admin-readable audit rows.
  • Mission runtime projection ignores metadata.approval.skip_approval for fintech strict nodes, so UI/planner metadata cannot suppress injected approval gates for fintech strict work.
  • Protected fintech tool denials now fail closed with explicit call-site approval/policy verifier status in the denial reason and protected audit payload.
  • Automation gate decisions can now carry protected-action metadata, and fintech strict protected tools are allowed only when a matching approved receipt proves tenant, category, tool, action hash, and non-expired approval at execution time.
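
The receipt match in the last item can be sketched as a conjunction of checks at execution time. The receipt field names and the SHA-256-over-canonical-JSON action hash are assumptions made for this example; the release notes do not specify how the action hash is computed.

```python
import hashlib
import json


def action_hash(tool, args):
    """Hash a tool call deterministically (illustrative scheme, not Tandem's)."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def receipt_allows(receipt, tenant, category, tool, args, now):
    """True only when an approved, unexpired receipt matches this exact call."""
    return (
        receipt.get("decision") == "approved"
        and receipt.get("tenant") == tenant
        and receipt.get("category") == category
        and receipt.get("tool") == tool
        and receipt.get("action_hash") == action_hash(tool, args)
        and isinstance(receipt.get("expires_at"), (int, float))
        and receipt["expires_at"] > now
    )
```

Binding the approval to a hash of the arguments means an approval for one action cannot be replayed for a different one.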

Evidence, Citations, And Audit Packages

  • Tool effect ledger summaries now preserve safe source identifiers such as source_id, document_id, ticket_id, and record_id, while still avoiding raw query text.
  • Connector proof helpers only accept successful source retrieval calls as evidence; connector discovery/listing alone is not enough.
  • Existing context-run ledger summaries now include fintech_connector_proof derived from successful source retrieval tool records.
  • Compliance/risk brief validation checks required fields, citations, limitations, reviewer status, approval state, and audit IDs.
  • Explicitly marked fintech brief workflow nodes persist connector proof and validation results in artifact validation metadata, and reject citations that cannot be mapped to connector proof.
  • Workflow plans that explicitly ask for fintech compliance/risk brief artifacts now materialize with fintech_strict runtime metadata and artifact markers by default; generic finance workflows are left alone.
  • An internal audit package helper can assemble run, tenant, actor, tool calls, connector proof, artifacts, approvals, and policy decisions from Automation V2 run state.
  • The assembled fintech audit package can be persisted as a context-run artifact for compliance-review handoff.
  • eval_datasets/fintech_compliance_risk.yaml adds proof-sprint fixtures for unsupported claim rejection, connector selected-but-unused rejection, protected-action bypass attempts, cross-tenant source denial, and incomplete evidence surfaced as limitations.
  • Eval runner spec mapping now carries fintech runtime profile, tenant, and artifact-contract metadata into generated Automation V2 specs for stub/live evals.
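
The connector-proof rule above (successful source retrieval counts, discovery does not, and citations must map to proof) can be illustrated with a short sketch. The tool-record fields are hypothetical; `mcp_list` is used as the discovery example because these notes name it elsewhere.

```python
DISCOVERY_TOOLS = {"mcp_list"}  # discovery/listing never counts as evidence


def connector_proof(tool_records):
    """Collect source ids from successful source retrieval calls only."""
    proof = set()
    for record in tool_records:
        if record["tool"] in DISCOVERY_TOOLS or not record.get("ok"):
            continue
        source_id = record.get("source_id")
        if source_id:
            proof.add(source_id)
    return proof


def unmapped_citations(citations, proof):
    """Return citations that cannot be traced back to connector proof."""
    return [c for c in citations if c not in proof]
```

A brief whose `unmapped_citations` result is non-empty would be rejected under the validation rule described above.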

Coder Workspace UX

The desktop Coder panel was rebuilt so the state of running work is visible at a glance and the intake flow is no longer dominated by setup chrome. The previous panel showed task status as a tiny outlined pill in the corner of each card, hid awaiting-approval prompts inside a tab, duplicated the project picker between a stat box and a separate card, and kept the GitHub Project intake fully expanded even after a binding was saved. The redesigned layout makes "what's running, what needs me, what failed" the dominant signal and reduces the amount of scrolling needed before a coding swarm is launched or inspected.

  • Live status badges with animated indicators: New CoderRunStatusBadge renders run status as colored chips — Running (primary spinner + pulse dot), Queued (primary pulse), Needs approval (amber + pulse), Paused (amber), Failed (red), Cancelled (muted), Completed (emerald). Used on every run card in the list and at the top of the run detail card so the running/queued/awaiting state is the first thing the eye lands on. A run's status tone now also drives the color of the progress bar (amber when paused or awaiting, red on failure, emerald on completion, primary while running).

  • Always-visible runs summary strip: New CoderRunsSummary component at the top of the Runs view tallies Running / Needs approval / Paused / Failed / Completed across the workspace and shows a live "Updated Xs ago" indicator that ticks every 15 seconds (so the relative time stays fresh between sidecar event-driven refreshes). The summary surfaces totals even when individual runs scroll off-screen and emphasizes attention categories so they remain visible at a consistent spot.

  • Step progress on every card and the detail header: New CoderRunProgress component draws a thin progress bar plus completed / total (and blocked) counts derived from each run's checkpoint node IDs. The bar appears on each list card under the status banner and at the top of the detail card next to the status badge, so a run's actual position in its workflow is visible without expanding the Context tab.

  • Elevated awaiting-gate prompts: When a run is waiting on an operator decision, the detail card now shows the prompt title, instructions, and Approve & continue / Request rework buttons in an amber alert at the very top of the card — above the action toolbar — instead of in the Overview tab's "Gate State" panel. List cards for awaiting runs grow a matching amber "Waiting on you: ..." banner so the same signal is visible in the list without selecting the run.

  • Consolidated project context header: The Coder page header now embeds ProjectSwitcher directly and shows the detected git slug / current branch / default branch as a subtitle (with a short "Detecting git repo..." hint while resolution is in flight). The previous "Active Project" stat box, the standalone "Project Context" card, and the four-stat "User Repo Context" card are gone — the same information is in one place, taking ~1/4 the vertical space.

  • Tab pills with attention counts and smart default tab: The Create / Runs tabs are now accent-pill buttons. The Runs tab shows a badge with the count of active or failed runs and switches to an amber tone when any run needs approval (red when any failed) so the operator can spot work that needs them from the Create tab. On first load, the page auto-defaults to Runs when the workspace has any active runs and stays on Create otherwise, instead of always landing on Create.

  • Collapsing GitHub Project intake: GitHub Project binding and inbox UX moved into a dedicated CoderGithubProjectPanel component. When no project is bound, the connect form (Owner + Project Number + Connect) is the only thing in the card. Once bound, the configuration collapses to a single-line Connected · owner #N summary with Refresh and Change buttons, and status mapping (TODO / In Progress / In Review / Blocked / Done) plus saved/live schema fingerprints move behind an "Advanced" disclosure that is closed by default. Inbox items render in a tighter row layout with linked issue numbers and the primary action reads Pull into Coder.

  • Dev-noise sections removed: The "First Slice" and "Compatibility" stat boxes from the original Coder header card, the "Selected preset ... is UI scaffolding in this slice" copy under the Mission Builder, and the always-open DeveloperRunViewer ("Legacy Compatibility") at the bottom of the Runs view are all gone. The legacy inspector now lives behind a collapsed "Legacy coder inspector" disclosure so it is one click away when needed but no longer dominates the Runs view.

The Coder restructure is pure UI: no changes to the tandem-agents API surface, the Tauri command surface, the Automation V2 contract, the coder metadata schema, or the GitHub Project MCP tools. Saved coder templates, saved GitHub Project bindings, and the existing run detail tabs (Overview, Transcripts, Context, Artifacts, Memory) continue to work unchanged. Internally, new shared helpers (runStatusTone, runIsActive, runProgress, relativeTimeFromMs) in coderRunUtils.ts let the list, detail, summary, and progress components classify status through one code path.

Boundaries

  • No public HTTP API changes were added for fintech strict mode.
  • This is not a production-ready regulated fintech deployment claim.
  • fintech_strict is an int...

Tandem v0.5.6

15 May 23:09
@github-actions github-actions
c217ad3

See the assets below to download the installer for your platform.

v0.5.6 (2026-05-14)

AI Evaluation Framework

This release ships the complete AI Evaluation Framework (Phases 1-5): a production-ready system for structured testing, regression detection, and compliance documentation of AI quality. The framework enables automated quality gates that prevent AI performance regressions from reaching production, provide quantitative proof of AI safety practices for compliance audits (e.g., EU AI Act Article 50 transparency), and make it easy for teams to experiment with new AI features while maintaining a quality baseline.

What ships now:

  • Failure Mode Taxonomy (AIFailureMode enum with 30+ variants): Formalized categorization of AI failures across validation (artifact validation, contract violations, citation missing, web sources missing), provider (timeout, rate limit, model not found, stream failures), repair (budget exhausted, loop detection, unavailable), resource (token exhausted, cost exhausted, memory exhausted), authorization (failed, invalid API key, permission denied), and system-level (config error, dependency missing, feature disabled) domains. Each failure is classified into a severity category (Critical, High, Medium, Low) for incident response prioritization. Includes deterministic classifiers (classify_error_text(), categorize_failure(), should_retry()) for automatically tagging failures from real error text.

  • Evaluation Dataset Format (YAML/JSON): Structured EvalDataset schema with EvalTestCase definitions, AutomationSpecTest node specifications, and configurable EvalExpectedOutput validation criteria. Four example datasets ship in eval_datasets/: critical_path.yaml (happy-path scenarios for core features), provider_failures.yaml (resilience tests for timeouts and rate limits), repair_exhaustion.yaml (budget depletion edge cases), and citation_validation.yaml (web research validation). Dataset format is version-controlled and extensible for adding new test scenarios.

  • Eval Runner CLI (cargo run --bin eval-runner): Standalone command-line tool for bulk test execution with the following capabilities:

    • Metrics aggregation: Computes pass_rate, avg_repair_iterations, total_cost_usd, avg_cost_per_test, provider_failure_rate, and per-validator-class pass rates from test results
    • Simulation mode (--simulation): Deterministic test execution without making AI provider calls, making evals safe to run in CI and cost-free during development
    • Parallel workers (--num-workers): Configurable parallelism for faster test suite runs
    • Tag filtering (--filter-tag): Run only tests matching a specific tag (e.g., regression, happy_path)
    • CLI arguments: --dataset (required), --output, --provider, --model, --max-duration, --verbose
    • Output format: Human-readable summary on stdout + structured JSON to file for programmatic CI consumption
    • Exit codes: 0 (all tests passed), 1 (one or more test failures), 2 (dataset load error or invalid arguments)
  • Regression Detection System: Baseline comparison infrastructure that prevents AI quality degradation. EvalBaseline stores snapshots of evaluation metrics with git commit/branch metadata. The detect_regressions() function compares current run metrics against a saved baseline using configurable thresholds (default: 5 percentage point pass_rate drop, 20% cost increase, 30% repair iteration increase, 5 percentage point provider failure increase). Results are reported as RegressionStatus (Pass / Warning / Regression) with human-readable messages explaining the deltas.

  • Regression Detection CI Gate (.github/workflows/eval-regression-gate.yml): GitHub Actions workflow that:

    • Triggers on every PR against main/develop and on main branch pushes
    • Builds and runs eval-runner against critical_path.yaml dataset
    • Compares results to eval_baselines/main_branch.json baseline
    • Posts a summary comment on the PR with pass rate and test counts
    • Fails the check if any regression threshold is exceeded (protecting main from quality degradation)
    • Auto-updates the baseline on successful main branch push so future PR comparisons use the latest production metrics
    • Uploads eval results as CI artifact for debugging and audit trail
  • Developer Documentation (docs/dev/EVAL_FRAMEWORK.md, ~500 lines): Comprehensive guide including:

    • Quick start with CLI examples
    • Architecture overview and data flow diagrams
    • Step-by-step guide for adding new eval datasets
    • Threshold customization patterns
    • Failure mode taxonomy reference
    • Running tests locally and in CI
    • Troubleshooting common issues (model differences, rate limits, nondeterminism)
    • FAQ covering simulation mode, parallelization, validators, costs, baseline updates
  • User & Compliance Documentation (docs/user/AI_QUALITY_ASSURANCE.md, ~350 lines): Non-technical guide covering:

    • Core quality metrics explained for end users (pass rate, repair iterations, cost, provider reliability)
    • Test scenarios described (happy path, edge cases, regression tests)
    • Quality assurance process (automated regression detection, continuous monitoring, failure analysis)
    • EU AI Act Article 50 compliance: Explicit mapping of Tandem's testing practices to transparency requirements for natural persons interacting with the AI system
    • Trust model: What Tandem can and cannot guarantee, limitations of AI systems
    • Incident response: How issues are detected, investigated, and resolved
    • Support contacts and glossary
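
The default regression thresholds listed above can be applied roughly as in this sketch. It is not the shipped `detect_regressions()`: the function name, metric dictionary shape, and the choice to store `pass_rate` as a 0..1 fraction are assumptions, and the Warning tier is omitted because its trigger is not described here.

```python
DEFAULT_THRESHOLDS = {
    "pass_rate_drop_pp": 5.0,             # percentage points
    "cost_increase_pct": 20.0,            # relative percent
    "repair_increase_pct": 30.0,          # relative percent
    "provider_failure_increase_pp": 5.0,  # percentage points
}


def pct_increase(base, current):
    """Relative increase in percent; a rise from zero counts as infinite."""
    if base == 0:
        return 0.0 if current == 0 else float("inf")
    return (current - base) / base * 100.0


def detect_regressions_sketch(baseline, current, thresholds=DEFAULT_THRESHOLDS):
    """Return (status, messages) where status is "Pass" or "Regression"."""
    messages = []
    pass_drop = (baseline["pass_rate"] - current["pass_rate"]) * 100.0
    if pass_drop > thresholds["pass_rate_drop_pp"]:
        messages.append(f"pass_rate dropped {pass_drop:.1f} points")
    cost_up = pct_increase(baseline["total_cost_usd"], current["total_cost_usd"])
    if cost_up > thresholds["cost_increase_pct"]:
        messages.append(f"total_cost_usd rose {cost_up:.1f}%")
    repair_up = pct_increase(
        baseline["avg_repair_iterations"], current["avg_repair_iterations"]
    )
    if repair_up > thresholds["repair_increase_pct"]:
        messages.append(f"avg_repair_iterations rose {repair_up:.1f}%")
    fail_up = (
        current["provider_failure_rate"] - baseline["provider_failure_rate"]
    ) * 100.0
    if fail_up > thresholds["provider_failure_increase_pp"]:
        messages.append(f"provider_failure_rate rose {fail_up:.1f} points")
    return ("Regression" if messages else "Pass"), messages
```

Note the mix of units: pass rate and provider failure rate are compared in percentage points, while cost and repair iterations are compared as relative percentages.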

Quality Gate Features

The regression-detection workflow is designed to be low-friction and CI-friendly:

  • Default thresholds are conservative but tunable per team/metric
  • Simulation mode means evaluations complete in seconds with zero API cost
  • Pass/Warning/Regression statuses provide clear go/no-go signals for CI gates
  • JSON output integrates with custom CI/CD tooling for advanced use cases
  • PR comments make quality trends visible to the whole team
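
For teams wiring the JSON output into custom tooling, the aggregate metrics and exit codes listed for the eval runner can be reproduced along these lines. The per-test result fields (`passed`, `cost_usd`, `repair_iterations`, `provider_failure`) are hypothetical; only the metric names and exit-code meanings come from the notes above.

```python
def aggregate(results):
    """Compute summary metrics from a list of per-test result dicts."""
    total = len(results)
    if total == 0:
        return {
            "pass_rate": 0.0,
            "avg_repair_iterations": 0.0,
            "total_cost_usd": 0.0,
            "avg_cost_per_test": 0.0,
            "provider_failure_rate": 0.0,
        }
    passed = sum(1 for r in results if r["passed"])
    total_cost = sum(r["cost_usd"] for r in results)
    provider_failures = sum(1 for r in results if r.get("provider_failure"))
    return {
        "pass_rate": passed / total,
        "avg_repair_iterations": sum(r["repair_iterations"] for r in results) / total,
        "total_cost_usd": total_cost,
        "avg_cost_per_test": total_cost / total,
        "provider_failure_rate": provider_failures / total,
    }


def exit_code(results, load_error=False):
    """0 = all passed, 1 = at least one failure, 2 = dataset/argument error."""
    if load_error:
        return 2
    return 0 if all(r["passed"] for r in results) else 1
```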

EU AI Act Article 50 Compliance

This framework explicitly supports Article 50 transparency obligations for hosted Tandem services:

  • Documented AI system: The framework demonstrates how AI components are systematically tested
  • Quality assurance: Automated gates and metrics prove ongoing quality practices
  • Failure categorization: 30+ failure types enable post-mortem root-cause analysis
  • Performance tracking: Metrics track AI quality before, during, and after deployment
  • Regression prevention: Automated safeguards block degradations from reaching users
  • Audit trail: All test results are timestamped and logged for compliance audits

Users and auditors can request detailed evaluation metrics, regression reports, and failure analysis logs via support channels.


Security

  • Authorization fix for approval gates: Slack, Discord, and Telegram approval interactions now fail closed unless the acting user resolves through the configured channel allowlist before approval, rework, or cancel decisions are processed.

  • TOCTOU race condition fix in automation run cache: Automation run state reloads now detect concurrent in-memory updates before accepting disk-loaded state, preventing stale cache loads from overwriting gate decisions or duplicating execution.

  • Path traversal protection for automation identifiers: Automation definition and run-history paths now sanitize identifier-derived filenames and verify resolved paths stay inside their intended state roots.

  • Dedup TTL for webhook replay prevention: Discord and Slack interaction deduplication now uses a bounded retry window, reducing stale replay risk while preserving normal platform retry handling.

  • File permission validation on startup: Startup now warns when sensitive state files have overly broad Unix permissions so operators can tighten local storage access.

  • Discord modal identifier validation: Discord rework modal submissions now reject malformed or incomplete identifiers before any gate decision is dispatched.

  • Telegram dedup TTL implementation: Telegram approval callbacks now use the same retry-window deduplication model as Discord and Slack, reducing stale callback replay risk.

  • User ID extraction: reject instead of default: Channel approval handlers now reject malformed requests without a resolvable acting user instead of assigning a placeholder identity.

  • Reason field size validation: Discord rework feedback is now bounded server-side before being stored with gate decisions.

  • Error message information disclosure prevention: Public channel rejection messages now use generic denial text while retaining detailed audit logs for operators.

  • JWT structure and algorithm validation: Codex identity token parsing now validates token shape, header presence, allowed algorithm behavior, and signature encoding before processing claims.

  • JSON merge recursion depth limit: Provider configuration merging now enforces a maximum nesting depth to avoid stack exhaustion on deeply nested input.

  • CODEX_HOME path validation: Codex CLI home resolution now rejects unsafe or system-sensitive paths and falls back to the default home directory with a warning.

  • JWT token expiration validation: Codex identity resolution now rejects tokens without valid expiration claims and bounds-checks expiration timestamps before time arithmetic.

  • Approval card delivery fan-out: Slack, Discord, and Telegram channel adapters now support native interactive card sends. Approval requests can render as Block Kit messages, Discord embeds with components, or Telegram inline-keyboard messages instead of plain text fallbacks.

  • Slack approval card lifecycle updates: The server records delivered approval message handles in `...
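
The path-traversal protection for automation identifiers described in the Security list above follows a common two-step pattern: sanitize the identifier-derived filename, then verify that the resolved path stays inside the state root. A minimal sketch, with a hypothetical function name and sanitization rule:

```python
import re
from pathlib import Path


def safe_state_path(root, identifier, suffix=".json"):
    """Map an identifier to a file path that cannot escape `root`."""
    # Replace anything outside a conservative character set, then drop
    # leading dots so the name cannot be "." or "..".
    name = re.sub(r"[^A-Za-z0-9._-]", "_", identifier).lstrip(".")
    if not name:
        raise ValueError("empty identifier after sanitization")
    root_resolved = Path(root).resolve()
    candidate = (root_resolved / (name + suffix)).resolve()
    # Second line of defence: the resolved path must live under the root.
    if root_resolved not in candidate.parents:
        raise ValueError("resolved path escapes state root")
    return candidate
```

Keeping both steps matters: sanitization handles hostile identifiers, while the containment check also covers cases such as symlinks inside the state directory.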


Tandem v0.5.5

13 May 14:39
@github-actions github-actions
a6a6ad5

See the assets below to download the installer for your platform.

v0.5.5 (2026-05-13)

This release lays down the Execution Profiles foundation — a runtime governance toggle (Strict / Guided / YOLO) that will let users keep working while validators and contracts continue to harden, without abandoning Tandem's runtime ownership of state, receipts, replay, spend tracking, and approvals. The motivation is operational: full governance still has a high run-fail rate as bugs are ironed out, and a meaningful share of those failures are over-strict (false-positive validation, missing-but-non-essential sections, recoverable artifact issues) rather than real defects. Execution Profiles are the structured bridge that lets affected runs continue with the relaxation captured in receipts, so the data we collect can drive validator classes back to Strict-by-default once they mature.

The v0.5.5 cut is backend telemetry-only. Strict, Guided, and YOLO runs all produce identical run outcomes today; the only difference is in receipts. This is intentional. The status-downgrade behavior change (where Guided actually warns instead of blocking, and YOLO actually continues as experimental) is gated on the next slice, which can calibrate against the validator-class telemetry collected here. No existing automation changes behavior in this release.

What ships now:

  • Type foundation (automation_v2::execution_profile): ExecutionProfile enum (strict/guided/yolo), ValidatorClass taxonomy with is_relaxable_in(profile) and a conservative is_critical() allowlist for never-relaxable classes (auth, secret access, destructive-action approval, budget caps, kill switch, deterministic verifier failures). decide_profile_validation is the single chokepoint; augment_output_with_profile_relaxation is the executor-facing helper; classify_unmet_requirement maps existing validator strings to the taxonomy.
  • Run record and API: AutomationExecutionPolicy.profile is now optional and persisted. Every AutomationV2RunRecord carries typed effective_execution_profile and requested_execution_profile. POST /automations/v2/{id}/run_now accepts an optional execution_profile override (Strict, Guided, or YOLO) that applies for the single run only without mutating the saved automation. resolve_effective_execution_profile enforces a deterministic precedence: run override → workflow policy → Strict.
  • Lifecycle and event observability: record_automation_lifecycle_event_with_metadata automatically merges the run's effective_execution_profile into every AutomationLifecycleRecord so existing audit, replay, and Bug Monitor surfaces see the profile without per-call-site changes. The automation_v2.run.failed engine event now includes both effective_execution_profile and requested_execution_profile, so Bug Monitor and downstream observers can attribute failures to the active profile.
  • Executor chokepoint (telemetry-only): The executor invokes augment_output_with_profile_relaxation at the single run-acceptance moment. When every unmet_requirement on a node output is relaxable under the active profile, it writes relaxed_validator_classes (structured), effective_outcome, original_validator_outcome, execution_profile, and experimental: true (YOLO) into the artifact_validation block. Strict runs are unchanged. Critical classes (destructive-action approval, budget cap, etc.) always block; if any classification is unknown, the augmentation conservatively skips so behavior stays Strict-equivalent.
  • 24 unit tests covering serde round-trip, default-to-Strict, critical-class blocking, soft-class relaxation per profile, tenant-denylist enforcement, classifier mapping, augmentation purity, and lifecycle metadata merge semantics.
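
The precedence rule (run override → workflow policy → Strict) and the telemetry-only augmentation can be sketched as below. The validator class labels and the per-profile relaxable sets are invented for illustration; only the overall shape (critical classes always block, unknown classes skip augmentation, YOLO is marked experimental) follows the description above.

```python
from enum import Enum


class Profile(Enum):
    STRICT = "strict"
    GUIDED = "guided"
    YOLO = "yolo"


def resolve_effective_profile(run_override, workflow_policy):
    """Run override wins, then the saved workflow policy, then Strict."""
    return run_override or workflow_policy or Profile.STRICT


# Hypothetical class labels; the real taxonomy lives in ValidatorClass.
CRITICAL = {"auth", "secret_access", "destructive_action_approval", "budget_cap", "kill_switch"}
RELAXABLE = {
    Profile.STRICT: set(),
    Profile.GUIDED: {"missing_optional_section"},
    Profile.YOLO: {"missing_optional_section", "recoverable_artifact_issue"},
}


def relaxation_metadata(profile, unmet_classes):
    """Return telemetry to record, or None to stay Strict-equivalent."""
    if profile is Profile.STRICT or not unmet_classes:
        return None
    allowed = RELAXABLE[profile]
    # Any critical or unknown class means no augmentation at all.
    if any(c in CRITICAL or c not in allowed for c in unmet_classes):
        return None
    meta = {
        "execution_profile": profile.value,
        "relaxed_validator_classes": sorted(unmet_classes),
    }
    if profile is Profile.YOLO:
        meta["experimental"] = True
    return meta
```

Because this release is telemetry-only, the returned metadata would be recorded in receipts without changing the run outcome.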

What is intentionally deferred to follow-up slices and tracked in docs/internal/execution-profiles/KANBAN.md:

  • Phase 4b: status-downgrade behavior change so Guided actually warns and YOLO actually continues as experimental, gated on telemetry calibration.
  • Phase 5: wiring the existing effective_repair_budget multiplier (1.0 / 1.5 / 2.0 by profile) into the repair-decision call sites.
  • Phase 6: control-panel UI (profile selector, run pill, experimental badge).
  • Phase 7: Tauri desktop UI (matching control panel).
  • Experimental-input propagation rule for downstream nodes.
  • Tenant-level relaxation denylist and default-profile administration.

This patch keeps automation-owned runtime sessions out of the user Chat session list without hiding their audit trail from the rest of Tandem.

Sessions now carry explicit source metadata. New interactive sessions default to sourceKind: chat, Automation V2/Bug Monitor worker sessions are classified as automation_v2, and session listing supports filtering by source. The TypeScript client and wire model expose the same fields so control-panel views can ask for the session class they actually need.

The Chat sidebar and Dashboard recent-session list now request only source=chat, so Bug Monitor submissions such as Automation automation-v2-bug-monitor-triage-failure-draft-... / inspect_failure_report no longer appear as conversations. Legacy automation records with the existing title format are classified at the storage/wire boundary, preserving backward compatibility for already-written sessions.

The Tauri desktop Automation Calendar no longer crashes the app while loading. FullCalendar is now isolated into its own lazy bundle and imported only after the WebKit stylesheet host is ready, preventing the Cannot read properties of null (reading 'cssRules') startup failure seen when opening the calendar view.

Bug Monitor GitHub issue creation now uses a persisted pending idempotency claim before calling GitHub. Completion finalization, stale-provider recovery, deadline recovery, and status-sweep recovery can all wake up around the same draft, but only the first caller that claims the create-issue digest is allowed to create the GitHub issue. Concurrent callers now see publish_in_progress or reuse the posted record instead of producing duplicate issues with the same fingerprint and triage run.
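
The first-caller-wins claim can be illustrated with a small in-memory store. This sketch uses a lock and a dictionary where the real system persists the claim; the class and method names are hypothetical, and only the `publish_in_progress` outcome label is taken from the text above.

```python
import threading


class ClaimStore:
    """In-memory stand-in for a persisted idempotency claim table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claims = {}  # digest -> {"state": "pending" | "posted", "issue_url": str | None}

    def try_claim(self, digest):
        """Return ("claimed" | "publish_in_progress" | "reuse", issue_url)."""
        with self._lock:
            existing = self._claims.get(digest)
            if existing is None:
                # First caller records a pending claim before calling GitHub.
                self._claims[digest] = {"state": "pending", "issue_url": None}
                return "claimed", None
            if existing["state"] == "posted":
                return "reuse", existing["issue_url"]
            return "publish_in_progress", None

    def mark_posted(self, digest, issue_url):
        """Finalize the claim once the issue exists."""
        with self._lock:
            self._claims[digest] = {"state": "posted", "issue_url": issue_url}
```

Only the caller that receives `"claimed"` goes on to create the issue; every other recovery path either waits or reuses the posted record.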

Bug Monitor proposal quality gates also recognize the structured handoff shapes that triage nodes actually return, including wrapped objects such as { "bug_monitor_inspection": ... } and array responses containing the artifact followed by a compact status object. Placeholder task specs still fail the gate, but valid completed inspection, research, validation, and fix-proposal artifacts no longer get treated as missing and replaced with broad fallback evidence.

Bug Monitor triage status detection now treats nested status: blocked fields inside structured Bug Monitor handoffs as evidence/limitation data, not as the node's own runtime status. This prevents propose_fix_and_verification from recursively blocking the debugger when it has produced a useful partial fix proposal with acceptance criteria and bounded next steps.

Automation V2 long-running nodes now get to own their timeout path. The stale-run reaper honors the run-registry heartbeat that active node execution already emits every few seconds, so a first task with a 600-second budget is not globally paused as stale_no_provider_activity at the exact timeout boundary before the node can fail or repair normally.
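
In sketch form, the reaper change means a recent heartbeat overrides the provider-inactivity check. The parameter names and the 30-second heartbeat grace window are assumptions for illustration.

```python
def is_stale(now, last_provider_activity, last_heartbeat, stale_after_s, heartbeat_grace_s=30.0):
    """Decide whether the reaper should pause a run as stale."""
    # A fresh heartbeat means the node is still executing, so let it
    # reach its own timeout, failure, or repair path.
    if last_heartbeat is not None and now - last_heartbeat <= heartbeat_grace_s:
        return False
    return now - last_provider_activity >= stale_after_s
```

With a 600-second budget and a heartbeat three seconds old, `is_stale(600, 0, 597, 600)` returns `False`, so the node is left to handle its own timeout.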

Automation V2 research validation now preserves source URLs from successful websearch and webfetch tool results. If a generated JSON artifact is too sparse and omits raw links, the validator can still see the current web evidence that was actually gathered instead of blocking the node as citations_missing. The prompt and repair guidance also now explicitly tell research agents to include raw URLs in citations or web_sources_reviewed fields.
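
A rough sketch of the fallback described above: gather URLs from successful websearch and webfetch results and let the validator count them when the artifact itself lists none. The result-record shape (`tool`, `ok`, `output`) and both function names are assumptions made for illustration.

```python
import re

# Deliberately simple URL matcher; trailing punctuation is trimmed below.
URL_RE = re.compile(r"https?://[^\s\"'<>)\]]+")


def collect_source_urls(tool_results):
    """Return unique URLs seen in successful websearch/webfetch outputs."""
    urls = []
    for result in tool_results:
        if result.get("tool") not in ("websearch", "webfetch"):
            continue
        if not result.get("ok"):
            continue
        for url in URL_RE.findall(result.get("output", "")):
            url = url.rstrip(".,;")
            if url not in urls:
                urls.append(url)
    return urls


def has_citations(artifact, tool_results):
    """Accept citations from the artifact or from gathered web evidence."""
    cited = artifact.get("citations") or artifact.get("web_sources_reviewed")
    return bool(cited) or bool(collect_source_urls(tool_results))
```

Under this sketch a sparse artifact still passes when the run actually fetched web sources, while a run with no web evidence and no listed citations continues to fail.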

Connector-backed source research now has to use the selected connector, not merely discover it. A node that says to use Reddit MCP and resolves reddit-gmail can no longer complete after only mcp_list plus a JSON write; it must call a concrete source tool such as mcp.reddit_gmail.reddit_search_across_subreddits or mcp.reddit_gmail.reddit_retrieve_reddit_post, preserving real returned evidence or an actual connector/tool limitation.

The prompt and tool surface now reinforce that rule before validation has to catch it. Connector source prompts list concrete mcp.* tools and state that mcp_list, glob, grep, edit, and apply_patch are not source evidence, while non-code connector source nodes no longer offer edit/patch/bash tools that can distract agents from calling the connector.
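
The connector rule above amounts to checking the recorded tool calls for at least one concrete `mcp.<server>.*` call. A minimal sketch follows; the function name and the flat list of tool names are assumptions, while the tool names themselves are taken from the notes.

```python
# Tools the notes list as not counting as source evidence.
NOT_SOURCE_EVIDENCE = {"mcp_list", "glob", "grep", "edit", "apply_patch"}


def used_connector(tool_calls, connector_server):
    """True if any call targets a concrete tool on the selected connector."""
    prefix = f"mcp.{connector_server}."
    return any(
        name.startswith(prefix) and name not in NOT_SOURCE_EVIDENCE
        for name in tool_calls
    )
```

A run that recorded only `mcp_list` and a JSON write fails this check for `reddit_gmail`, while one that also called `mcp.reddit_gmail.reddit_search_across_subreddits` passes.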

Connector-backed delivery nodes now keep their destination MCP tools focused all the way through artifact creation. Notion save/report nodes with explicit mcp.notion.* tool allowlists no longer inherit generic workspace read/glob or mutation tools from upstream input refs, but they still retain the required write tool for the run artifact receipt. The engine loop also narrows prewrite MCP gating to the specific concrete connector tools that have not yet run, steering a Notion publisher from notion_fetch to notion_create_pages instead of letting it loop on already-completed discovery or local inspection.

Required-tool provider calls now fail closed inside Tandem instead of being rejected by the provider when routing filters remove every tool. Write-required connector nodes keep the artifact write tool even when their session allowlist is connector-only, and if a later filter still produces an empty tool set Tandem downgrades the provider request away from tool_choice: required rather than sending an invalid no-tools request.
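
The two safeguards above, sketched as a single request builder. The function name, the dict-shaped request, and the choice to omit `tool_choice` entirely when no tools remain are assumptions; the notes only say the request is downgraded away from `tool_choice: required`. The Notion tool name in the usage note combines the `mcp.notion.*` prefix with `notion_create_pages` for illustration.

```python
def build_provider_request(candidate_tools, allowlist, tool_choice, write_tool=None):
    """Filter tools by allowlist, keep the write tool, never send required with no tools."""
    tools = [t for t in candidate_tools if t in allowlist]
    # Keep the artifact write tool even under a connector-only allowlist.
    if write_tool and write_tool in candidate_tools and write_tool not in tools:
        tools.append(write_tool)
    request = {"tools": tools}
    # Downgrade: "required" with an empty tool set would be an invalid request.
    if tools or tool_choice != "required":
        request["tool_choice"] = tool_choice
    return request
```

For a Notion publisher with a connector-only allowlist, `build_provider_request(["write", "mcp.notion.notion_create_pages", "bash"], {"mcp.notion.notion_create_pages"}, "required", write_tool="write")` keeps both the connector tool and `write`, and leaves `tool_choice` as `required`.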

Transient provider stream decode failures are now treated as recoverable provider infrastructure failures. Stream errors such as error decoding response body, unexpected EOF, and incomplete streamed responses are retried inside the current provider iteration with partial stream...


Tandem v0.5.4

05 May 14:36
@github-actions github-actions
e199ef7

See the assets below to download the installer for your platform.

v0.5.4 (Released 2026-05-05)

This patch fixes automation schedule timezone handling, tightens the distinction between local source-code research and final research synthesis, and introduces marketplace-ready workflow pack import/export.

Automation cron schedules now preserve the selected local wall-clock time end to end. The server accepts the 5-field cron expressions emitted by the control panel, normalizes them for the Rust cron parser, and evaluates them in the saved IANA timezone when computing next_fire_at_ms. The control panel now carries that timezone through guided schedule summaries, creation review, workflow editing, calendar labels, and standup scheduling, with Europe/Budapest available in the common timezone picker. A regression test covers weekday 9:00 AM in Budapest resolving correctly through DST-aware UTC storage.
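
To illustrate the behavior, here is a small Python sketch (the server itself is Rust, so this is not its code). `normalize_cron` assumes the parser expects a leading seconds field, and `next_weekday_fire_ms` hard-codes a weekday schedule instead of parsing cron; both names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def normalize_cron(expr):
    """Prefix a seconds field onto a 5-field expression (assumed parser format)."""
    fields = expr.split()
    if len(fields) == 5:
        return " ".join(["0"] + fields)
    return expr


def next_weekday_fire_ms(after_utc, hour, minute, tz_name):
    """Next Mon-Fri occurrence of hour:minute local wall-clock time, as UTC epoch ms."""
    local = after_utc.astimezone(ZoneInfo(tz_name))
    candidate = local.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Adding whole days to an aware datetime keeps the wall-clock time fixed,
    # so the UTC offset is re-derived per date and DST shifts are absorbed.
    while candidate <= local or candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return int(candidate.timestamp() * 1000)
```

Evaluated for Europe/Budapest, a 9:00 AM weekday schedule lands at 08:00 UTC in January and 07:00 UTC in July, which is the wall-clock preservation the regression test is described as covering.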

Final report/brief nodes that synthesize already-collected Tandem MCP notes, Reddit MCP signals, web findings, and run artifacts no longer require fresh workspace read calls. The planner stops adding local_source_reads to new research_synthesis contracts, and the runtime validator waives stale local-read enforcement on existing saved synthesis nodes. Code-change, local-research, and Bug Monitor source-inspection nodes still retain their strict repo-read gates.

This prevents research-to-destination workflows from blocking with messages such as "research brief cited workspace sources without using read" when the workflow cites only MCP, web, or upstream artifact evidence and does not need repository source files.

Workflow packs are now the preferred portable format for created workflows. The Workflows page can upload a .zip pack, preview its manifest, cover image, workflow entries, capabilities, and validation results, then install it and open the resulting planner session. Raw JSON workflow bundle import remains available under Advanced for debugging and internal handoffs.

Planner sessions can also be exported as marketplace-ready workflow pack ZIPs containing tandempack.yaml, README.md, the embedded workflow plan bundle, and an optional PNG/JPEG/WebP cover image. New workflow-pack APIs and TypeScript client helpers support export, preview, and import, while imported sessions keep pack provenance (source_pack_id, version, and source bundle digest) for later inspection.
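
For orientation, a pack with the layout described above could be assembled and previewed along these lines. This is a sketch, not the Tandem export API: `build_pack`, `preview_pack`, and the bundle path `workflow/plan-bundle.json` are hypothetical, and the manifest contents are not specified in these notes.

```python
import io
import zipfile

# Entries the notes name explicitly as part of an exported pack.
REQUIRED_ENTRIES = ("tandempack.yaml", "README.md")


def build_pack(manifest_yaml, readme, bundle_json, cover=None):
    """Write the pack entries into an in-memory ZIP and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("tandempack.yaml", manifest_yaml)
        zf.writestr("README.md", readme)
        # Hypothetical location for the embedded workflow plan bundle.
        zf.writestr("workflow/plan-bundle.json", bundle_json)
        if cover is not None:
            name, data = cover  # e.g. ("cover.png", png_bytes)
            zf.writestr(name, data)
    return buf.getvalue()


def preview_pack(data):
    """List entries and report which required files are missing."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
    missing = [n for n in REQUIRED_ENTRIES if n not in names]
    return {"entries": names, "missing": missing, "valid": not missing}
```

A preview step like this lets an importer reject an archive that lacks `tandempack.yaml` before anything is installed.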

Exported workflow packs now include a hosted-safe download URL, and the Workflows page shows a browser Download ZIP action after export so operators can retrieve generated packs without access to the server filesystem path. Control-panel uploads also now prefer $TANDEM_HOME/data/channel_uploads and expand home-directory placeholders such as ~, $HOME, ${HOME}, and %HOME%, avoiding stray literal upload directories when hosted or Windows-style environment values are used on Linux/macOS.
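
The placeholder expansion can be sketched as a single anchored substitution. The function names are hypothetical, and the behavior when `TANDEM_HOME` is unset is left open because the notes do not describe the fallback.

```python
import os
import posixpath
import re

# Match a leading ~, $HOME, ${HOME}, or %HOME% followed by a separator or end.
_HOME_PREFIX = re.compile(r"^(~|\$HOME|\$\{HOME\}|%HOME%)(?=$|[/\\])")


def expand_home(path, home=None):
    """Replace a leading home-directory placeholder with the real home path."""
    home = home or os.path.expanduser("~")
    # A function replacement avoids backslash-escape handling in `home`.
    return _HOME_PREFIX.sub(lambda _match: home, path, count=1)


def upload_dir(env, home=None):
    """Preferred upload directory under TANDEM_HOME, or None if it is unset."""
    tandem_home = env.get("TANDEM_HOME")
    if not tandem_home:
        return None  # fallback location is not specified in the notes
    return posixpath.join(expand_home(tandem_home, home), "data", "channel_uploads")
```

With `home="/home/u"`, a configured value of `%HOME%/.tandem` resolves to `/home/u/.tandem/data/channel_uploads` instead of creating a literal `%HOME%` directory.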

Full Changelog: v0.4.44...v0.5.4
