-
Notifications
You must be signed in to change notification settings - Fork 913
feat(media): pluggable storage backend + S3 implementation#1166
Draft
ilyaseverin wants to merge 3 commits into
Draft
feat(media): pluggable storage backend + S3 implementation #1166ilyaseverin wants to merge 3 commits into
ilyaseverin wants to merge 3 commits into
Conversation
Splits internal/media into: - backend.go: Backend interface (Save / Open / LocalPath / Delete) and a shared ErrNotFound sentinel. - fs_backend.go: FSBackend, the historical filesystem implementation that used to live in store.go. Same key shape, same rename-or-copy semantics; no behaviour change. - store.go: Store is now a thin façade over a Backend. NewStore still returns an FS-backed store, so every existing caller — gateway wiring, agent loop, http handlers, tests — keeps working without changes. New code that wants context-aware access can grab Store.Backend() directly. ExtFromMime stays on Store so both backends agree on the extension that ends up in the object key. This commit is intentionally behaviour-preserving. S3Backend, the config factory, and the bypass-site fixes land in follow-up commits on this branch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
S3Backend persists media objects under {prefix}/{sessionHash}/{id}.{ext}
plus a tiny manifest object at {prefix}/_manifests/{id} that maps the
bare media ID back to its full key. The manifest lets LocalPath / Open
resolve an ID in two GetObjects (manifest + payload) without scanning
the bucket, which would not scale.
Highlights:
- Credentials come from the default AWS SDK chain only (IAM role on
EC2 / ECS / EKS, env vars in dev). No static-key field on Config to
avoid copy-pasted secrets in env files.
- manager.NewUploader does multipart with 10MB parts / concurrency=3,
same posture as internal/backup.
- LocalPath downloads into a process-local cache dir (default
os.TempDir()/goclaw-media-cache), atomic-renames a sibling tempfile
into place so a partial fetch never gets surfaced as a cache hit.
- Delete lists the session prefix, batch-deletes payloads in groups of
1000, then drops the corresponding manifests (best-effort — a
leftover manifest just wastes a few bytes).
- Endpoint override is plumbed through for MinIO / R2 / DO Spaces.
internal/media/config.go introduces Config + LoadConfigFromEnv +
NewBackend + NewStoreFromConfig so cmd/ wiring stays a one-liner.
Default backend is "fs", default baseDir is whatever the caller passes
in — existing deployments see zero behaviour change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
cmd/gateway_managed.go switches from media.NewStore(workspace+\"/.media\") to media.NewStoreFromConfig(LoadConfigFromEnv(...)). FS remains the default — operators with no GOCLAW_MEDIA_* env see byte-for-byte the same behaviour they had before — but flipping a single env var now moves session media to S3 without touching code. internal/media/fs_backend_test.go covers the refactor: - Save→LocalPath→Open→Delete round trip, including the source-file consumption contract. - ExtFromMime taking precedence over the source's own extension. - LocalPath returning ErrNotFound (errors.Is-matchable) for stale IDs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ilyaseverin
added a commit
to injectinglabs/goclaw
that referenced
this pull request
May 20, 2026
nextlevelbuilder#1166 (#90) * refactor(media): introduce Backend interface, keep Store as façade Splits internal/media into: - backend.go: Backend interface (Save / Open / LocalPath / Delete) and a shared ErrNotFound sentinel. - fs_backend.go: FSBackend, the historical filesystem implementation that used to live in store.go. Same key shape, same rename-or-copy semantics; no behaviour change. - store.go: Store is now a thin façade over a Backend. NewStore still returns an FS-backed store, so every existing caller — gateway wiring, agent loop, http handlers, tests — keeps working without changes. New code that wants context-aware access can grab Store.Backend() directly. ExtFromMime stays on Store so both backends agree on the extension that ends up in the object key. This commit is intentionally behaviour-preserving. S3Backend, the config factory, and the bypass-site fixes land in follow-up commits on this branch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(media): add S3Backend + env-driven factory S3Backend persists media objects under {prefix}/{sessionHash}/{id}.{ext} plus a tiny manifest object at {prefix}/_manifests/{id} that maps the bare media ID back to its full key. The manifest lets LocalPath / Open resolve an ID in two GetObjects (manifest + payload) without scanning the bucket, which would not scale. Highlights: - Credentials come from the default AWS SDK chain only (IAM role on EC2 / ECS / EKS, env vars in dev). No static-key field on Config to avoid copy-pasted secrets in env files. - manager.NewUploader does multipart with 10MB parts / concurrency=3, same posture as internal/backup. - LocalPath downloads into a process-local cache dir (default os.TempDir()/goclaw-media-cache), atomic-renames a sibling tempfile into place so a partial fetch never gets surfaced as a cache hit. - Delete lists the session prefix, batch-deletes payloads in groups of 1000, then drops the corresponding manifests (best-effort — a leftover manifest just wastes a few bytes). - Endpoint override is plumbed through for MinIO / R2 / DO Spaces. internal/media/config.go introduces Config + LoadConfigFromEnv + NewBackend + NewStoreFromConfig so cmd/ wiring stays a one-liner. Default backend is "fs", default baseDir is whatever the caller passes in — existing deployments see zero behaviour change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(media): wire env-driven backend selection + FSBackend tests cmd/gateway_managed.go switches from media.NewStore(workspace+\"/.media\") to media.NewStoreFromConfig(LoadConfigFromEnv(...)). FS remains the default — operators with no GOCLAW_MEDIA_* env see byte-for-byte the same behaviour they had before — but flipping a single env var now moves session media to S3 without touching code. internal/media/fs_backend_test.go covers the refactor: - Save→LocalPath→Open→Delete round trip, including the source-file consumption contract. - ExtFromMime taking precedence over the source's own extension. - LocalPath returning ErrNotFound (errors.Is-matchable) for stale IDs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a
media.Backendinterface and anS3Backendimplementation so goclaw can run on ephemeral hosts without losing session media on restart. The existing filesystem behaviour is preserved exactly —GOCLAW_MEDIA_BACKENDdefaults tofsand every current call site keeps working unchanged.Three commits, intentionally separable for review:
internal/media/store.gointobackend.go(interface),fs_backend.go(the historical implementation, unchanged), and a thinstore.gofaçade. No behaviour change.S3Backendpersists objects under{prefix}/{sessionHash}/{id}.{ext}with a manifest sidecar at{prefix}/_manifests/{id}soLocalPath/Openresolve a bare ID in two GETs. Credentials come from the default SDK chain only (IAM role / env), no static-key field.LocalPathlazily caches intoos.TempDir()/goclaw-media-cachewith atomic rename.Deletepaginates the session prefix andDeleteObjectsin batches of 1000.cmd/gateway_managed.gonow goes throughmedia.LoadConfigFromEnv/NewStoreFromConfig. Adds round-trip / mime-ext / not-found tests for the FS backend.Why
Today
media.Storeis concrete and FS-only. Deployments on Fargate / K8s without persistent volumes / EC2 ASGs that rotate instances drop their session media on every restart. The workaround — pointingbaseDirat EFS/EBS — is heavier than handing the same problem to S3 with a 7-day lifecycle rule.Secondary win: the concrete
*media.Storetype made it impossible to substitute a fake in tests; new code can now depend onmedia.Backenddirectly.Env vars
GOCLAW_MEDIA_BACKENDfsfsors3GOCLAW_MEDIA_BASEDIR{workspace}/.mediaGOCLAW_MEDIA_S3_BUCKETGOCLAW_MEDIA_S3_PREFIXGOCLAW_MEDIA_S3_REGIONus-east-1GOCLAW_MEDIA_S3_ENDPOINTGOCLAW_MEDIA_S3_CACHE_DIR$TMPDIR/goclaw-media-cacheIAM permissions needed for the bucket:
s3:PutObject,s3:GetObject,s3:DeleteObject,s3:ListBucket,s3:DeleteObjects.Test plan
GOCLAW_TEST_S3_BUCKETwould be the natural follow-up; happy to add in this PR if you'd like.injectinglabs/goclaw).Out of scope
workspace/generated/...). That requires a separate workspace abstraction and is orthogonal to session media.POST /v1/files/sign.Open questions
BackendvsStoragevsDriver? I went withBackendto matchinternal/backup. Happy to rename.Savetake anio.Readerinstead ofsrcPath? Current callers all have a local file, sosrcPathkeeps the FS backend's rename-fast-path; reader-based API would force FS to write a tempfile.s3_backend_test.goin this PR, or as a follow-up?🤖 Generated with Claude Code