Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat(media): pluggable storage backend + S3 implementation#1166

Draft
ilyaseverin wants to merge 3 commits into
nextlevelbuilder:main from
injectinglabs:feat/media-s3-backend
Draft

feat(media): pluggable storage backend + S3 implementation #1166
ilyaseverin wants to merge 3 commits into
nextlevelbuilder:main from
injectinglabs:feat/media-s3-backend

Conversation

@ilyaseverin

@ilyaseverin ilyaseverin commented May 20, 2026

Copy link
Copy Markdown

Summary

Adds a media.Backend interface and an S3Backend implementation so goclaw can run on ephemeral hosts without losing session media on restart. The existing filesystem behaviour is preserved exactly — GOCLAW_MEDIA_BACKEND defaults to fs and every current call site keeps working unchanged.

Three commits, intentionally separable for review:

  1. refactor(media): introduce Backend interface, keep Store as façade — splits internal/media/store.go into backend.go (interface), fs_backend.go (the historical implementation, unchanged), and a thin store.go façade. No behaviour change.
  2. feat(media): add S3Backend + env-driven factoryS3Backend persists objects under {prefix}/{sessionHash}/{id}.{ext} with a manifest sidecar at {prefix}/_manifests/{id} so LocalPath/Open resolve a bare ID in two GETs. Credentials come from the default SDK chain only (IAM role / env), no static-key field. LocalPath lazily caches into os.TempDir()/goclaw-media-cache with atomic rename. Delete paginates the session prefix and DeleteObjects in batches of 1000.
  3. feat(media): wire env-driven backend selection + FSBackend testscmd/gateway_managed.go now goes through media.LoadConfigFromEnv / NewStoreFromConfig. Adds round-trip / mime-ext / not-found tests for the FS backend.

Why

Today media.Store is concrete and FS-only. Deployments on Fargate / K8s without persistent volumes / EC2 ASGs that rotate instances drop their session media on every restart. The workaround — pointing baseDir at EFS/EBS — is heavier than handing the same problem to S3 with a 7-day lifecycle rule.

Secondary win: the concrete *media.Store type made it impossible to substitute a fake in tests; new code can now depend on media.Backend directly.

Env vars

Var Default Purpose
GOCLAW_MEDIA_BACKEND fs fs or s3
GOCLAW_MEDIA_BASEDIR {workspace}/.media FS only
GOCLAW_MEDIA_S3_BUCKET required when backend=s3
GOCLAW_MEDIA_S3_PREFIX empty key prefix inside the bucket
GOCLAW_MEDIA_S3_REGION us-east-1
GOCLAW_MEDIA_S3_ENDPOINT for MinIO / R2 / DO Spaces
GOCLAW_MEDIA_S3_CACHE_DIR $TMPDIR/goclaw-media-cache local read cache

IAM permissions needed for the bucket: s3:PutObject, s3:GetObject, s3:DeleteObject, s3:ListBucket, s3:DeleteObjects.

Test plan

  • FSBackend unit tests (round-trip, mime ext precedence, ErrNotFound).
  • S3Backend integration: not in the test suite — gated on GOCLAW_TEST_S3_BUCKET would be the natural follow-up; happy to add in this PR if you'd like.
  • Build + existing tests on CI.
  • Smoke on my fork's staging once merged downstream (injectinglabs/goclaw).

Out of scope

  • Workspace-as-S3 (covers user-generated files under workspace/generated/...). That requires a separate workspace abstraction and is orthogonal to session media.
  • Signed download URLs for generated documents — already covered upstream via POST /v1/files/sign.

Open questions

  1. Naming: Backend vs Storage vs Driver? I went with Backend to match internal/backup. Happy to rename.
  2. Should Save take an io.Reader instead of srcPath? Current callers all have a local file, so srcPath keeps the FS backend's rename-fast-path; reader-based API would force FS to write a tempfile.
  3. Add a gated s3_backend_test.go in this PR, or as a follow-up?

🤖 Generated with Claude Code

ilyaseverin and others added 3 commits May 20, 2026 20:49
Splits internal/media into:
- backend.go: Backend interface (Save / Open / LocalPath / Delete) and a
 shared ErrNotFound sentinel.
- fs_backend.go: FSBackend, the historical filesystem implementation
 that used to live in store.go. Same key shape, same rename-or-copy
 semantics; no behaviour change.
- store.go: Store is now a thin façade over a Backend. NewStore still
 returns an FS-backed store, so every existing caller — gateway wiring,
 agent loop, http handlers, tests — keeps working without changes.
 New code that wants context-aware access can grab Store.Backend()
 directly.
ExtFromMime stays on Store so both backends agree on the extension that
ends up in the object key.
This commit is intentionally behaviour-preserving. S3Backend, the config
factory, and the bypass-site fixes land in follow-up commits on this
branch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
S3Backend persists media objects under {prefix}/{sessionHash}/{id}.{ext}
plus a tiny manifest object at {prefix}/_manifests/{id} that maps the
bare media ID back to its full key. The manifest lets LocalPath / Open
resolve an ID in two GetObjects (manifest + payload) without scanning
the bucket, which would not scale.
Highlights:
- Credentials come from the default AWS SDK chain only (IAM role on
 EC2 / ECS / EKS, env vars in dev). No static-key field on Config to
 avoid copy-pasted secrets in env files.
- manager.NewUploader does multipart with 10MB parts / concurrency=3,
 same posture as internal/backup.
- LocalPath downloads into a process-local cache dir (default
 os.TempDir()/goclaw-media-cache), atomic-renames a sibling tempfile
 into place so a partial fetch never gets surfaced as a cache hit.
- Delete lists the session prefix, batch-deletes payloads in groups of
 1000, then drops the corresponding manifests (best-effort — a
 leftover manifest just wastes a few bytes).
- Endpoint override is plumbed through for MinIO / R2 / DO Spaces.
internal/media/config.go introduces Config + LoadConfigFromEnv +
NewBackend + NewStoreFromConfig so cmd/ wiring stays a one-liner.
Default backend is "fs", default baseDir is whatever the caller passes
in — existing deployments see zero behaviour change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
cmd/gateway_managed.go switches from media.NewStore(workspace+\"/.media\")
to media.NewStoreFromConfig(LoadConfigFromEnv(...)). FS remains the
default — operators with no GOCLAW_MEDIA_* env see byte-for-byte the
same behaviour they had before — but flipping a single env var now
moves session media to S3 without touching code.
internal/media/fs_backend_test.go covers the refactor:
- Save→LocalPath→Open→Delete round trip, including the source-file
 consumption contract.
- ExtFromMime taking precedence over the source's own extension.
- LocalPath returning ErrNotFound (errors.Is-matchable) for stale IDs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ilyaseverin added a commit to injectinglabs/goclaw that referenced this pull request May 20, 2026
nextlevelbuilder#1166 (#90)
* refactor(media): introduce Backend interface, keep Store as façade
Splits internal/media into:
- backend.go: Backend interface (Save / Open / LocalPath / Delete) and a
 shared ErrNotFound sentinel.
- fs_backend.go: FSBackend, the historical filesystem implementation
 that used to live in store.go. Same key shape, same rename-or-copy
 semantics; no behaviour change.
- store.go: Store is now a thin façade over a Backend. NewStore still
 returns an FS-backed store, so every existing caller — gateway wiring,
 agent loop, http handlers, tests — keeps working without changes.
 New code that wants context-aware access can grab Store.Backend()
 directly.
ExtFromMime stays on Store so both backends agree on the extension that
ends up in the object key.
This commit is intentionally behaviour-preserving. S3Backend, the config
factory, and the bypass-site fixes land in follow-up commits on this
branch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(media): add S3Backend + env-driven factory
S3Backend persists media objects under {prefix}/{sessionHash}/{id}.{ext}
plus a tiny manifest object at {prefix}/_manifests/{id} that maps the
bare media ID back to its full key. The manifest lets LocalPath / Open
resolve an ID in two GetObjects (manifest + payload) without scanning
the bucket, which would not scale.
Highlights:
- Credentials come from the default AWS SDK chain only (IAM role on
 EC2 / ECS / EKS, env vars in dev). No static-key field on Config to
 avoid copy-pasted secrets in env files.
- manager.NewUploader does multipart with 10MB parts / concurrency=3,
 same posture as internal/backup.
- LocalPath downloads into a process-local cache dir (default
 os.TempDir()/goclaw-media-cache), atomic-renames a sibling tempfile
 into place so a partial fetch never gets surfaced as a cache hit.
- Delete lists the session prefix, batch-deletes payloads in groups of
 1000, then drops the corresponding manifests (best-effort — a
 leftover manifest just wastes a few bytes).
- Endpoint override is plumbed through for MinIO / R2 / DO Spaces.
internal/media/config.go introduces Config + LoadConfigFromEnv +
NewBackend + NewStoreFromConfig so cmd/ wiring stays a one-liner.
Default backend is "fs", default baseDir is whatever the caller passes
in — existing deployments see zero behaviour change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(media): wire env-driven backend selection + FSBackend tests
cmd/gateway_managed.go switches from media.NewStore(workspace+\"/.media\")
to media.NewStoreFromConfig(LoadConfigFromEnv(...)). FS remains the
default — operators with no GOCLAW_MEDIA_* env see byte-for-byte the
same behaviour they had before — but flipping a single env var now
moves session media to S3 without touching code.
internal/media/fs_backend_test.go covers the refactor:
- Save→LocalPath→Open→Delete round trip, including the source-file
 consumption contract.
- ExtFromMime taking precedence over the source's own extension.
- LocalPath returning ErrNotFound (errors.Is-matchable) for stale IDs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

1 participant

AltStyle によって変換されたページ (->オリジナル) /