0

Motivation:

I am building a webserver that is essentially a dashboard app. Users make requests and get analytics data and visualizations back. Multiple users can be in the same "workspace," where they use the same underlying datasource to power dashboards.

In this sense, N users would make requests that use the same instance of an underlying database component, and a different M users might use another instance of that same db component.

Typically, webservers serve individual requests from scratch, but in this case, it seems sensible to have a collection of initialized db instances per-workspace, such that a request from any user in the same workspace would always get that instance (creating db components from scratch for any given request seems heavyweight when I could manage a collection of already-active pools, one pool per workspace). My question is, what kinds of patterns exist for this in webserver backends?

In an attempt to answer my own question, what I've (potentially) landed on is:

# app.py (pseudocode)
app = FastAPI()
@app.post('/{workspace_id}/dasbhoard-plz')
def fetch_dashboard(...):
 # auth the user, etc. then,
 data = data_service.get_data(workspace_id, specific_data_request)
 return dashboardify(data)
# data_service.py
def get_data(workspace_id, specific_data_request):
 sql = turn_into_sql(specific_data_request)
 workspace_datasource = _get_or_init_workspace_datasource(workspace_id)
 data = workspace_datasource.query(sql)
 return data
_cached_workspace_datasources: dict[WorkspaceId, Datasource] = {}
def _get_or_init_workspace_datasource(workspace_id):
 with lock():
 if workspace_id not in _cached_workspace_datasources:
 ds = Datasource.from_workspace_id(workspace_id)
 ds.init()
 _cached_workspace_datasources[workspace_id] = ds
 return _cached_workspace_datasources[workspace_id]
class Datasource:
 def __init__(self, db_url, **engine_kwargs):
 self._engine = create_engine(db_url, **engine_kwargs)
 @classmethod
 def from_workspace_id(workspace_id):
 # get db_url via workspace_id
 cls(db_url)
 def query(self, sql):
 # ...

Essentially, I'm applying a lazy-load/GetOrAdd-style lookup to a wrapper around workspace-level database engines, because requests from multiple users (in the same workspace) shouldn't have to recreate such a thing. The lock is applicable in production setups where you might have multiple threads under the hood, and the workspace_id in dict is not jointly atomic with the dict assignment. I do think this all works, at the cost of process-level duplication of elements in _cached_workspace_datasources in the event prod config is using multiple procs and requests get routed (in)appropriately. However...

Again, my question is, is this kosher? Is there a standard way to do this kind of thing, where some (typically heavyweight) state can be shared across a relevant (subset) of requests?

asked Apr 29, 2024 at 22:39
2
  • Frame challenge: is the database connection really so heavyweight? Can it properly handle requests from multiple threads, also when a second request comes in while you are working on a database transaction of another user? Commented Apr 30, 2024 at 6:31
  • An excellent challenge. A connection itself is not horribly heavyweight, no; it would more precise for me to discuss workspace-level connection pools, which I would argue are heavyweight (and in this context I know to be thread-safe). This said, your point is one worth considering for the general case. If I missed your point entirely, i.e. why bother with maintaining all of these pools at all, well, it certainly seems like I'll end up reimplementing pooling in a roundabout way, but perhaps I'm mistaken. Commented Apr 30, 2024 at 14:22

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.