In practice, Cognito-issued tokens represent Cognito sessions. In this architecture they do not replace lifecycle management for long-lived third-party API credentials. So we changed direction:
- Use direct OAuth code exchange with the provider.
- Persist provider access/refresh tokens in encrypted parameter storage.
- Keep DynamoDB focused on non-sensitive token metadata and discovery records.
This decision reduced ambiguity in token ownership, made refresh behavior explicit, and aligned better with least-privilege backend API access patterns.
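To make the split concrete, here is a minimal sketch of how one callback could plan its writes so token values and metadata never share a store. The prefix `/example-app/oauth`, the `ProviderTokens` type, and the metadata field names are assumptions for illustration, not the production layout. The `Name`/`Value`/`Type` keys mirror the arguments SSM `PutParameter` accepts.

```python
from dataclasses import dataclass

# Hypothetical prefix; the real parameter hierarchy is not given in this post.
SECRET_PREFIX = "/example-app/oauth"


@dataclass(frozen=True)
class ProviderTokens:
    access_token: str
    refresh_token: str


def plan_writes(identity_segment: str, tokens: ProviderTokens, consented_at: str):
    """Return (secret_params, metadata_item) so the two stores never mix."""
    secret_params = [
        {
            "Name": f"{SECRET_PREFIX}/{identity_segment}/access_token",
            "Value": tokens.access_token,
            "Type": "SecureString",
        },
        {
            "Name": f"{SECRET_PREFIX}/{identity_segment}/refresh_token",
            "Value": tokens.refresh_token,
            "Type": "SecureString",
        },
    ]
    # The metadata item records that tokens exist, never their values.
    metadata_item = {
        "identity": identity_segment,
        "token_exists": True,
        "consented_at": consented_at,
    }
    return secret_params, metadata_item
```

Keeping the planning step pure makes it easy to assert in a unit test that no token value ever reaches the metadata item.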
Resource discovery pipeline
After the callback exchange, the auth Lambda performs discovery against two provider APIs and stores normalized records.
# pseudo-code
tokens = exchange_code_for_tokens(code, redirect_uri)
email = fetch_user_identity(tokens.access_token)
store_tokens(email, tokens.access_token, tokens.refresh_token)
resources_a = list_resource_type_a(tokens.access_token)
resources_b, skipped = list_resource_type_b(tokens.refresh_token)
upsert_resources_a(email, resources_a)
upsert_resources_b(email, resources_b, skipped)
Notice the mixed token strategy:
- API A accepts an access token directly.
- API B may be better served by sessions derived from the refresh token.
This small detail matters in real-world provider ecosystems.
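A refresh-token-derived session starts with the refresh grant from RFC 6749, section 6. The sketch below only builds the request and leaves the HTTP call to whatever client you use. It assumes the provider accepts client credentials in the form body; some providers expect HTTP Basic authentication instead, so treat the parameter placement as an assumption to verify.

```python
from urllib.parse import urlencode


def build_refresh_request(refresh_token: str, client_id: str, client_secret: str) -> dict:
    """Build headers and body for an RFC 6749 section 6 refresh request."""
    body = urlencode(
        {
            "grant_type": "refresh_token",
            "refresh_token": refresh_token,
            # Assumption: credentials go in the body rather than HTTP Basic.
            "client_id": client_id,
            "client_secret": client_secret,
        }
    )
    return {
        "headers": {"Content-Type": "application/x-www-form-urlencoded"},
        "body": body,
    }
```

The returned dict should never be logged as-is, since the body carries both the refresh token and the client secret.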
New problem: operational analytics lived in Postgres, not DynamoDB
V1 solved onboarding and discovery. Then a second problem appeared.
Downstream consumers (dashboards, joins, historical analysis, role-based reports) relied on relational querying in Postgres. But fresh data now landed in DynamoDB first.
We had to answer:
- How do we keep relational tables synced with minimal lag?
- How do we remain idempotent under retries and duplicate events?
- How do we avoid expensive full-table scans every minute?
Solution v2: event-driven sync with DynamoDB Streams
The best fit was an event-driven projection layer:
- DynamoDB table updates emit stream records.
- Stream processor Lambda transforms records.
- Lambda upserts rows into Postgres.
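The transform step mostly means unwrapping DynamoDB's typed attribute values. Stream records carry the new item under `dynamodb.NewImage`, with each value tagged by type (`S`, `N`, `BOOL`, and so on). Here is a minimal sketch that handles the common types; the function names are illustrative, and if boto3 is available its `TypeDeserializer` covers the full type set.

```python
from decimal import Decimal


def from_attr(value: dict):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)  # numbers arrive as strings in stream records
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_attr(v) for v in raw]
    if type_tag == "M":
        return {k: from_attr(v) for k, v in raw.items()}
    raise ValueError(f"unsupported attribute type: {type_tag}")


def new_image_to_row(record: dict) -> dict:
    """Flatten a stream record's NewImage into a column-name -> value dict."""
    image = record["dynamodb"]["NewImage"]
    return {name: from_attr(value) for name, value in image.items()}
```

Failing loudly on unknown types is deliberate in this sketch: a silent drop would show up later as drift between DynamoDB and Postgres.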
Updated architecture with streaming sync
[Figure: DynamoDB stream for RDS Postgres sync]
Why event-driven first, batch second
Primary path (event-driven):
- Low latency (seconds)
- No frequent scans
- Natural fit for change-data-capture style projection
Safety net (nightly reconciliation):
- Catches rare drift (missed events, temporary DB outage, mapping regressions)
- Supports audit checks and backfills
This is a practical engineering pattern: fast path + correctness path.
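The correctness path can be as simple as comparing keys and timestamps from both sides. This sketch assumes each side can be reduced to a `{key: updated_at}` map with timestamps in one sortable format; the function name and the three drift categories are illustrative rather than the production job.

```python
def find_drift(source: dict, projected: dict) -> dict:
    """Compare {key: updated_at} maps from the source table and Postgres."""
    # Present in DynamoDB but never projected (for example, a missed event).
    missing = sorted(k for k in source if k not in projected)
    # Projected, but older than the source (for example, a failed update).
    stale = sorted(k for k in source if k in projected and projected[k] < source[k])
    # Present in Postgres but gone from the source (for example, a missed REMOVE).
    orphaned = sorted(k for k in projected if k not in source)
    return {"missing": missing, "stale": stale, "orphaned": orphaned}
```

The counts in each bucket make natural drift metrics, and the keys themselves are the backfill work list.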
Stream processor design details
1. Idempotency via SQL upsert
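DynamoDB Streams writes each change to the stream exactly once, but Lambda's event source mapping delivers batches at least once: a failed or timed-out batch is retried, so the same record can be processed more than once. Upsert semantics make those retries safe.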
-- pseudo-SQL
INSERT INTO ext_resource_a (
    user_email,
    resource_id,
    resource_name,
    status,
    updated_at
)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (user_email, resource_id)
DO UPDATE SET
    resource_name = EXCLUDED.resource_name,
    status = EXCLUDED.status,
    updated_at = EXCLUDED.updated_at;
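You can check the idempotency claim locally without a Postgres instance. The sketch below uses Python's built-in `sqlite3` as a stand-in, because SQLite (3.24 or newer) accepts the same `ON CONFLICT ... DO UPDATE` clause. The `?` placeholders and the `TEXT` column types are SQLite-specific assumptions; the Postgres statement above stays the reference.

```python
import sqlite3

UPSERT = """
INSERT INTO ext_resource_a (user_email, resource_id, resource_name, status, updated_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (user_email, resource_id)
DO UPDATE SET
    resource_name = excluded.resource_name,
    status = excluded.status,
    updated_at = excluded.updated_at
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with the same conflict key as the projection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE ext_resource_a (
            user_email TEXT NOT NULL,
            resource_id TEXT NOT NULL,
            resource_name TEXT,
            status TEXT,
            updated_at TEXT,
            PRIMARY KEY (user_email, resource_id)
        )
        """
    )
    return conn


def apply_row(conn: sqlite3.Connection, row: tuple) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute(UPSERT, row)
```

Applying the same row twice leaves exactly one row behind, which is the property a retried stream batch relies on.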
2. Record-level routing
# pseudo-code
for rec in event.records:
    table = detect_source_table(rec)
    if rec.event_name in ["INSERT", "MODIFY"]:
        row = map_new_image_to_row(table, rec.new_image)
        upsert_postgres(table, row)
    elif rec.event_name == "REMOVE":
        soft_delete_or_mark_inactive(table, rec.keys)
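Against the real Lambda event shape, the routing loop looks slightly different: records live under `Records`, the event type is `eventName`, and the source table can be read from `eventSourceARN`. The sketch below returns a list of planned actions instead of touching a database, so it can be tested in isolation. The action names `upsert` and `soft_delete` are assumptions that mirror the pseudo-code above.

```python
def table_from_arn(event_source_arn: str) -> str:
    """Extract NAME from arn:aws:dynamodb:region:account:table/NAME/stream/LABEL."""
    arn_parts = event_source_arn.split(":", 5)
    if len(arn_parts) != 6:
        raise ValueError(f"not an ARN: {event_source_arn}")
    resource = arn_parts[5].split("/")
    if len(resource) < 2 or resource[0] != "table":
        raise ValueError(f"not a DynamoDB stream ARN: {event_source_arn}")
    return resource[1]


def plan_actions(event: dict) -> list:
    """Map each stream record to an (action, table, payload) tuple."""
    actions = []
    for rec in event.get("Records", []):
        table = table_from_arn(rec["eventSourceARN"])
        name = rec["eventName"]
        if name in ("INSERT", "MODIFY"):
            actions.append(("upsert", table, rec["dynamodb"]["NewImage"]))
        elif name == "REMOVE":
            # REMOVE records carry only the keys unless old images are enabled.
            actions.append(("soft_delete", table, rec["dynamodb"]["Keys"]))
    return actions
```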
3. Preserve source truth semantics
Not every delete should be a physical delete in Postgres. It is often better to:
- Keep the row
- Mark status = inactive
- Track synced_at and source_updated_at
This improves auditability and historical reporting.
4. Backpressure and failure handling
For production, configure:
- Batch size tuned for row payload
- Retries + DLQ (or failure destination)
- Per-table metrics for lag and failure counts
# pseudo-SAM fragment
EventSourceMapping:
  Type: DynamoDB
  Properties:
    StartingPosition: LATEST
    BatchSize: 100
    MaximumRetryAttempts: 3
    BisectBatchOnFunctionError: true
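One further option, not part of the fragment above, is partial batch responses: with `ReportBatchItemFailures` listed under `FunctionResponseTypes` on the mapping, the handler can tell Lambda which record failed instead of failing the whole batch. Here is a minimal sketch, assuming that setting is enabled; `process_record` is a hypothetical callable that performs the upsert.

```python
def handle_batch(event: dict, process_record) -> dict:
    """Process records in order and report the first failure, if any."""
    for rec in event.get("Records", []):
        try:
            process_record(rec)
        except Exception:
            # Stop at the first failure so later records are retried in order.
            sequence_number = rec["dynamodb"]["SequenceNumber"]
            return {"batchItemFailures": [{"itemIdentifier": sequence_number}]}
    # An empty list tells Lambda the whole batch succeeded.
    return {"batchItemFailures": []}
```

Records before the failed one are not retried, while the failed record and everything after it are. The upsert still needs to be idempotent, because at-least-once delivery remains in effect.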
Security-by-design decisions (and why)
CSRF-safe OAuth state
Single-use, TTL-bound nonce in DynamoDB reduced callback forgery risk.
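The two properties that matter are "single-use" and "expires". The sketch below shows both with an in-memory dict standing in for the DynamoDB table; the class name, the 600-second default, and the injectable clock are assumptions for illustration. In DynamoDB, the same one-time semantics could come from a delete that returns the old item. The expiry should still be checked explicitly, since TTL-based deletion is not instantaneous.

```python
import secrets
import time


class StateStore:
    """In-memory stand-in for the DynamoDB nonce table (illustration only)."""

    def __init__(self, ttl_seconds: int = 600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def issue(self) -> str:
        """Create an unguessable state value and remember when it expires."""
        state = secrets.token_urlsafe(32)
        self._items[state] = self._clock() + self._ttl
        return state

    def consume(self, state: str) -> bool:
        """Return True once per valid state; the entry is deleted on first use."""
        expires_at = self._items.pop(state, None)
        return expires_at is not None and self._clock() < expires_at
```

A callback that receives an unknown, reused, or expired `state` gets `False` and should stop before exchanging the code.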
Token isolation
Only the secret store contains token values. The metadata table stores "token exists" and consent timestamps.
Least privilege IAM
Each Lambda role should have only:
- read/write specific DynamoDB tables it uses
- limited SSM parameter path access
- CloudWatch log permissions
- network access only when needed (sync Lambda inside VPC for RDS)
Logging hygiene
Never log:
- authorization code
- access token
- refresh token
- raw provider error objects that may contain sensitive context
Log instead:
- operation outcome
- provider endpoint class
- masked subject identifiers
- correlation ID
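These rules are easier to enforce when they live in two small helpers instead of in reviewer memory. Here is a minimal sketch: `mask_subject` turns an email into a short keyed hash so log lines can still be correlated, and `safe_log_fields` drops the never-log keys. The key names, the `subj_` prefix, and the 12-character truncation are assumptions. The HMAC key would itself be a secret, which is why a keyed hash is used rather than a plain one that could be reversed by guessing addresses.

```python
import hashlib
import hmac

# Keys that must never reach a log line (mirrors the list above).
SENSITIVE_KEYS = {"code", "access_token", "refresh_token"}


def mask_subject(email: str, key: bytes) -> str:
    """Replace an email with a short, stable, keyed label."""
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"subj_{digest[:12]}"


def safe_log_fields(fields: dict) -> dict:
    """Drop sensitive keys before the dict reaches the logger."""
    return {k: v for k, v in fields.items() if k not in SENSITIVE_KEYS}
```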
Cost-aware architecture choices
The design intentionally kept fixed costs low:
- Lambda for bursty orchestration
- API Gateway for managed ingress
- DynamoDB on-demand for uncertain traffic
- SSM Parameter Store SecureString instead of a heavier secret system for this phase
- 30-day log retention to control CloudWatch growth
Rough POC economics can stay small (single-digit USD/month) when traffic is modest and retention is disciplined.
Sequence walkthrough (problem to resolution)
[Figure: Sequence walkthrough of the secure, event-driven OAuth onboarding serverless app on AWS]
Small implementation snippets you can adapt
Sanitize identity for secret path keys
# pseudo-code
import re

def to_secret_path_segment(identity: str) -> str:
    return re.sub(r"[^A-Za-z0-9._-]", "_", identity)
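One caveat with plain character replacement: distinct identities can collapse to the same segment. For example, `a@b.com` and `a_b.com` both become `a_b.com`. If that matters for your identities, one hedged variant appends a short hash of the original value. The function name and the 8-character suffix are assumptions, not part of the original design.

```python
import hashlib
import re


def to_unique_secret_path_segment(identity: str) -> str:
    """Sanitize an identity and append a short hash to avoid collisions."""
    readable = re.sub(r"[^A-Za-z0-9._-]", "_", identity)
    # The hash is taken over the original identity, before replacement.
    suffix = hashlib.sha256(identity.encode("utf-8")).hexdigest()[:8]
    return f"{readable}-{suffix}"
```

The readable prefix keeps parameter paths easy to scan, while the suffix keeps two different identities from sharing one path.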
Build the callback URI dynamically to avoid template coupling
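# pseudo-code
def callback_uri_from_event(event):
    # API Gateway proxy events are plain dicts, so use key access.
    request_context = event["requestContext"]
    domain = request_context["domainName"]
    stage = request_context["stage"]
    return f"https://{domain}/{stage}/auth/callback"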
Separate "active" and "skipped" discovered resources
# pseudo-code
active, skipped = discover_resources()
upsert_active(active)
upsert_skipped(skipped, reason_field="skip_reason")
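The split usually falls out of per-resource error handling during discovery. In this minimal sketch, `fetch_details` is a hypothetical callable that raises when the provider rejects a single resource, and the `skip_reason` value is the exception type rather than its message, in line with the logging rules above.

```python
def partition_resources(candidate_ids, fetch_details):
    """Return (active, skipped) without letting one failure abort discovery."""
    active, skipped = [], []
    for resource_id in candidate_ids:
        try:
            active.append(fetch_details(resource_id))
        except Exception as exc:
            # Record the error class only; raw provider errors may be sensitive.
            skipped.append(
                {"resource_id": resource_id, "skip_reason": type(exc).__name__}
            )
    return active, skipped
```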
Keep a reconciliation watermark
-- pseudo-SQL
CREATE TABLE sync_checkpoint (
    pipeline_name text primary key,
    last_reconciled_at timestamptz not null
);
Design trade-offs and what changed in architecture
What improved from v1 to v2
- Onboarding became self-service instead of support-driven.
- Token handling became boundary-safe and auditable.
- Metadata became immediately queryable in DynamoDB.
- Relational consumers received near real-time updates via streams.
- Operational resilience improved with nightly reconciliation.
New complexity introduced (and accepted)
- Stream processor deployment and monitoring.
- VPC networking for Lambda-to-RDS connectivity.
- Schema mapping ownership between NoSQL and SQL models.
These are acceptable because they buy reliability, lower manual effort, and a better consumer experience.
What this case study intentionally does not reveal
To protect privacy and commercial implementation details, this post excludes:
- Real account names, tenants, domains, and identifiers
- Production table names and environment values
- End-to-end source code and full function implementations
- Internal business workflows, SLAs, and organization-specific goals
That is not a weakness. It is a publishing discipline.
Final architecture summary
Final design in one sentence:
A serverless OAuth ingestion service writes secure secrets and normalized metadata, then projects metadata changes into Postgres through an idempotent event-driven stream processor, with scheduled reconciliation for correctness.
If you are designing a similar platform, the key patterns to remember are:
- Keep secrets and metadata in separate trust boundaries.
- Use event streams for freshness.
- Add periodic reconciliation for confidence.
- Design cost and security as first-class constraints, not post-launch patches.
Practical rollout checklist
- Ship OAuth nonce validation and one-time consumption first.
- Enforce token/metadata split before production traffic.
- Add provider discovery with partial-failure handling (active vs skipped).
- Enable streams on metadata tables.
- Implement the idempotent Postgres upsert projection.
- Add nightly reconciliation and drift metrics.
- Lock down IAM and log redaction rules.
- Track cost and lag dashboards from day one.
Closing note
Architecture maturity usually arrives in stages, not all at once. First, you remove manual pain. Then you harden security boundaries. Then you solve data movement with event-driven design. If you do those steps intentionally, you can stay both secure and cost-effective while your system grows.