Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[async 8/8] test: reproduce the Flask-era lazy-load Sentry issues with polymorphic data#482

Closed
somethingnew2-0 wants to merge 7 commits into
pcollins/async-3-fanout-errors from
pcollins/async-4-polymorphic-coverage
Closed

[async 8/8] test: reproduce the Flask-era lazy-load Sentry issues with polymorphic data #482
somethingnew2-0 wants to merge 7 commits into
pcollins/async-3-fanout-errors from
pcollins/async-4-polymorphic-coverage

Conversation

@somethingnew2-0

@somethingnew2-0 somethingnew2-0 commented Jun 12, 2026
edited
Loading

Copy link
Copy Markdown
Collaborator

Regression battery for the four Flask-era Sentry issues (ACCESS-FLASK-4N, ACCESS-FLASK-44, ACCESS-FLASK-4E) — the InvalidRequestError: ... lazy='raise_on_sql' production failures that motivated demoting five relationships to lazy="select", which #477 flips back to raise_on_sql.

Why they never reproduced locally: raise_on_sql raises identically under test, so the only production-only ingredient was data shape. The failing serializations required polymorphic subtypes in nested positions — a role group whose associated groups are app groups, tagged groups inside request refs, app groups inside membership rows — which the suite never built. The error chain in the reports maps exactly onto the schema nesting: RoleGroup.active_role_associated_group_member_mappings (4N) → each mapping's active_group (44) → that group's app when it's an AppGroup (4E), plus active_group_tags (4E) on every rich group ref.

This battery builds that shape once (tagged role group with member+owner associations to a tagged app group and a tagged plain group, via-role memberships, pending access/role requests targeting both group types) and walks every read surface: groups list + detail per type, roles, users, access-request detail with role-group and app-group targets, role requests, tags, apps, and /api/audit/users + /api/audit/groups both unfiltered and with the user_id/group_id filters the old model comments singled out. Assertions verify the nested payloads are populated, not just 200 — so both failure modes are covered: a missing eager load fails loudly instead of paging on-call, and a silently-empty collection fails instead of shipping wrong data.

Result: all 17 scenarios pass against the current loaders — the eager-load topology from the FastAPI migration plus #477's fixes closes every gap behind the four Sentry issues. Full suite: 536 passed.

PR stack

Merge bottom-up; each PR is based on the branch of the one before it (retarget to main as predecessors merge).

  1. [async 1/7] deps: prepare for async SQLAlchemy #475 — async dependency prep (SQLAlchemy[asyncio] 2.0.50, asyncpg, aiosqlite)
  2. [async 2/7] refactor: modernize the legacy Query API to 2.0-style statements #476 — legacy Query API -> 2.0-style statements (~350 sites)
  3. [async 3/7] refactor: eliminate lazy loading and commit-expiration from the ORM layer #477 — eliminate lazy loading and commit-expiration from the ORM layer
  4. [async 4/7] refactor: move operation DB resolution out of __init__ into _resolve() #478 — operation DB resolution out of __init__ into _resolve()
  5. [async 5/7] refactor: keep db.session access out of spawned notification tasks #479 — keep db.session access out of spawned notification tasks
  6. [async 6/7] Switch to async SQLAlchemy end-to-end #480 the async flip (production + tests + docs)
  7. [async 7/7] fix: surface failed Okta/notification fan-out tasks instead of dropping them #481 — surface failed Okta/notification fan-out tasks
  8. [async 8/8] test: reproduce the Flask-era lazy-load Sentry issues with polymorphic data #482 — regression battery for the Flask-era lazy-load Sentry issues ⬅️ this PR

🤖 Generated with Claude Code

somethingnew2-0 and others added 7 commits June 12, 2026 07:55
Every operation class now stores raw constructor arguments and resolves
them (queries, row locks, id lookups) in a _resolve() method invoked as
the first statement of execute() / execute_for_group() /
execute_for_role(). __init__ signatures and public attribute names are
unchanged, and every call site already chains .execute() immediately,
so behavior is identical - resolution just happens at execute time.
Async SQLAlchemy prep: __init__ cannot be async, so all session work
must live behind execute() before the flip makes it awaitable. For the
four operations whose execute() wraps asyncio.run(self._execute()),
_resolve() runs before asyncio.run on the calling thread.
Side benefit: the with_for_update row locks in the approve/reject
operations now begin at execute time rather than construction time,
which narrows the lock window to the actual mutation.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ModifyGroupUsers and ModifyRoleGroups previously created asyncio tasks
whose coroutines queried db.session (requester lookup +
get_all_possible_request_approvers) before invoking the notification
hook. Under async SQLAlchemy, ContextVar-scoped sessions are inherited
by child tasks, and concurrent operations on one AsyncSession raise
InvalidRequestError - so spawned tasks may only perform network I/O.
The approve helpers now just mutate and return the request; after the
final commit, the main coroutine resolves requester/approvers/group
sequentially and spawns hook-only coroutines.
Also fixes a latent bug in ModifyRoleGroups: its pending-request
queries never eager-loaded AccessRequest.requested_group /
RoleRequest.requester_role / RoleRequest.requested_group (all
raise_on_sql), so the notify task raised - and asyncio.wait silently
swallowed it - whenever an approved request had been created in a
different session than the one approving it. The notifications now
always have their relationships loaded. resolved_at (expired at flush
after the func.now() assignment) is explicitly refreshed so hooks see
a concrete timestamp.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The atomic flip (POST_MIGRATION_TODO items 2, 3, 11):
- extensions.py: async_sessionmaker + async_scoped_session keyed on the
 existing _session_scope ContextVar (item 3). The db.session global
 proxy now yields an AsyncSession; remove()/create_all()/drop_all()
 are coroutines. ContextVar scoping (not asyncio.current_task) is
 load-bearing: BaseHTTPMiddleware.call_next runs handlers in a child
 task, and context copies propagate where task identity would not.
- database.py: create_async_engine with a driver-rewrite map so
 existing deployment URLs (postgresql://, postgresql+pg8000://,
 sqlite://) keep booting on asyncpg/aiosqlite; CloudSQL goes through
 connector.connect_async + asyncpg IAM auth as the async_creator.
 get_db is an async generator; DbSession yields AsyncSession.
- okta_service.py (item 11): every method is natively async under its
 primary name; the asyncio.run-per-call sync wrappers and duplicated
 async_* variants are gone. One event loop end-to-end; fresh
 per-call SDK clients retained (CLI and tests run their own loops).
- operations/routers/auth/syncer/integrity/MCP: async def + awaits
 throughout; routers paginate via apaginate; the four fan-out
 operations keep their create_task Okta parallelism on the request
 loop. Spawned tasks perform network I/O only - never db.session
 (concurrent use of one AsyncSession is illegal).
- Sync plugin hooks (deferred to item 18) are bridged: session-bound
 app_group_lifecycle hooks run via AsyncSession.run_sync (greenlet
 bridge), notification/conditional-access hooks via
 anyio.to_thread.run_sync so plugin network I/O cannot block the
 event loop.
- manage.py: Click commands stay sync; _with_app_context drives the
 async body with one asyncio.run per invocation and disposes the
 engine it created. Plugin-contributed CLI commands that touch
 db.session synchronously will need the same treatment (docs follow).
- migrations/env.py: async-template (connection.run_sync) over the
 same engine builder, so CloudSQL migrations keep working; pg8000
 dropped from requirements.
Tests are converted in the next commit; mypy/ruff/import/alembic
gates are green at this point.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
POST_MIGRATION_TODO item 15. pytest-asyncio 1.4.0 in asyncio_mode=auto
with function-scoped event loops (config in tox.ini [pytest]); the
'coroutine was never awaited' warning is escalated to an error as a
missed-await tripwire.
- conftest: the db fixture builds an aiosqlite StaticPool in-memory
 engine (TEST_DATABASE_URI still supported, legacy driver names
 normalized to asyncpg); table create/drop via conn.run_sync; engine
 disposed per test so the aiosqlite worker thread closes. The client
 fixture is a DatetimeAwareAsyncClient over ASGITransport
 (raise_app_exceptions=False and follow_redirects=True replicate the
 old TestClient defaults). AsyncClient is required, not stylistic:
 requests must run on the test's own loop because the engine is bound
 to it.
- ~440 tests are async def with awaited client/session/operation
 calls; Query.count() sites go through tests/helpers.db_count.
 Okta mock targets renamed to the merged method names; patches on
 async methods auto-create AsyncMock so spy assertions are unchanged.
- test_okta_service's concurrency test now asserts per-call SDK client
 isolation across asyncio.gather on one loop (the per-thread-loop
 design it used to pin down no longer exists). test_mcp wraps request
 phases in the app's lifespan context so the MCP session manager task
 group is running. test_oidc's cookie/redirect clients carry over
 verbatim on httpx.
- Tests share one session with the app under test, so identity-map
 state expired by request rollbacks or fetch-synced bulk updates is
 reloaded explicitly (expire_all/refresh) at the staleness points -
 attribute access on an expired object is sync lazy IO, which
 AsyncSession correctly refuses.
- Two spy-count assertions in the group sync tests widened by the okta
 method-name merge (the spy now also sees ModifyGroupUsers' internal
 removals); system behavior unchanged.
Full suite: 515 passed - same count as the sync baseline. ruff and
mypy clean across 164 files.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- README + k8s example manifests document plain postgresql:// URIs
 (legacy +pg8000 names are normalized to asyncpg by the engine
 builder); README notes asyncpg's ssl= parameter handling and the
 statement_cache_size=0 requirement behind transaction-mode PgBouncer
- db-migrations workflow matrix uses the plain postgresql:// URI
- POST_MIGRATION_TODO: items 2, 3, 11, and 15 are complete and removed;
 new 11a (Okta SDK v3 upgrade - kept at 2.9.8 behind the OktaService
 facade for this migration) and 11b (shared pooled Okta client via
 lifespan); item 18 documents the sync-hook bridging now in place
 (run_sync for session-bound lifecycle hooks, anyio.to_thread for
 notification/conditional-access hooks) and the async rules for
 plugin-contributed CLI commands
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
...ng them
The four fan-out operations (ModifyGroupUsers, ModifyRoleGroups,
DeleteUser, DeleteGroup) drained their asyncio.create_task lists with
asyncio.wait, which does not propagate task exceptions - a failed Okta
call or notification hook vanished silently, so partial Okta failures
during membership changes were invisible.
They now drain through drain_fan_out_tasks (gather with
return_exceptions=True), which logs every failure at ERROR with the
operation context and traceback. Failures still do not fail the
request: the local DB state is committed before the tasks are awaited,
so raising would turn an Okta straggler into a misleading request
error; the syncer reconciles Okta drift on its next run.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
...c data
ACCESS-FLASK-4N (RoleGroup.active_role_associated_group_member_mappings),
ACCESS-FLASK-44 (RoleGroupMap.active_group), and ACCESS-FLASK-4E
(AppGroup.app, OktaGroup.active_group_tags) were production
InvalidRequestError lazy='raise_on_sql' failures that never reproduced
in tests - raise_on_sql raises identically under test, so the missing
ingredient was data shape: the failing serializations required
polymorphic subtypes in nested positions (role groups whose associated
groups are app groups, tagged groups inside request refs, app groups
inside membership rows) that the suite never built. The five
relationships involved were demoted to lazy="select" as a band-aid;
the async migration flips them back to raise_on_sql, so this battery
builds the production shape once and walks every read surface - groups
list/detail per type, roles, users with via-role app-group memberships,
access-request detail with role-group and app-group targets, role
requests, tags, apps, and the audit endpoints filtered and unfiltered.
Assertions check that the nested payloads are populated (mappings ->
active_group -> app, tags on every row type), so both failure modes are
covered: a missing eager load fails loudly instead of paging on-call,
and a silently-empty collection fails instead of shipping wrong data.
All 17 scenarios pass against the current loaders - the eager-load
topology closed every gap behind the four Sentry issues.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copy link
Copy Markdown
Collaborator Author

Folded into #477 per review feedback: the battery now lands as a sync test in the same PR that returns the five relationships to raise_on_sql (and the suite-conversion commit in #480 carries its async form), with the internal incident identifiers removed.

🤖 Posted by Claude Code

@somethingnew2-0 somethingnew2-0 deleted the pcollins/async-4-polymorphic-coverage branch June 12, 2026 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

1 participant

AltStyle によって変換されたページ (->オリジナル) /