Are the reasons Microsoft gives for why DB contexts should not be long lived based on measurable effects (memory leaks, resource hogging, increased probability of data corruption, ...), or are they more subjective reasons (it's an anti-pattern, it's hard to manage multiple instances of the same record, it increases complexity in multi-threaded applications, ...)?
I ask because a lot of the work I've identified could be removed if I just had a persistent instance of my DB context keeping track of the objects. I want to know whether I shouldn't do this because the application will suffer, or whether I shouldn't do it because my inexperience in implementing it would cause the application to suffer.
For reference, I'm working on a simple desktop application where at most 0-10 users would be interacting with the DB at any one point in time. 90% of the program is loading a complex object from the DB (i.e. loading a site object which has multiple building objects, which have many floor objects, which have many office objects, ...), letting the user CRUD various parts of this object, and then pushing the update to the DB.
Having my site instance and all its children become detached as soon as it's done loading requires a lot more work when I later want to push an update.
A more focused question, if you think the above is too subjective: what are the measurable effects of having a single long-lived DB context instance in a single-threaded desktop application?
Reference: The DbContext lifetime
"The lifetime of a DbContext begins when the instance is created and ends when the instance is disposed. A DbContext instance is designed to be used for a single unit-of-work. This means that the lifetime of a DbContext instance is usually very short."
4 Answers
Yes, EF keeps track of the state of entities outside of the database, so if you try to do two things at the same time on the same context you will get real runtime errors and unexpected behaviour.
We see in your link:
When EF Core detects an attempt to use a DbContext instance concurrently, you'll see an InvalidOperationException with a message like this:
A second operation started on this context before a previous operation completed. This is usually caused by different threads using the same instance of DbContext, however instance members are not guaranteed to be thread safe.
When concurrent access goes undetected, it can result in undefined behavior, application crashes and data corruption.
Now... most of the time, people are talking about EF in the context of a web application, which by definition has multiple requests happening at the same time. If your app is a desktop application or a console app, maybe you only ever have one thing happening at a time and won't run into these issues. But it still wouldn't be a good idea, because you would be risking these bugs for no reason.
Even if your app is single threaded you can still run into problems. From the same link:
Asynchronous methods enable EF Core to initiate operations that access the database in a non-blocking way. But if a caller does not await the completion of one of these methods, and proceeds to perform other operations on the DbContext, the state of the DbContext can be, (and very likely will be) corrupted.
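To make the quoted warning concrete, here is a minimal sketch of the single-threaded pitfall; nothing in it uses a second thread. AppDbContext and Sites are placeholder names for whatever your context and DbSet look like.

    using Microsoft.EntityFrameworkCore;

    // Placeholder context: assumes an AppDbContext with a DbSet<Site> named Sites.
    using var context = new AppDbContext();

    // Fire-and-forget: the save is started but not awaited.
    var pendingSave = context.SaveChangesAsync();

    // A second operation starts on the same context while the save may still be
    // running. EF Core will usually throw InvalidOperationException ("A second
    // operation started on this context..."); if it goes undetected, the
    // context's internal state can be corrupted.
    var sites = await context.Sites.ToListAsync();

    await pendingSave; // awaiting afterwards doesn't undo the overlap above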
Thank you. To reiterate back to you for my own clarification: the main concern, and the reason it is not recommended, is that a programmer might not handle concurrent requests correctly, which can cause unintended behaviours. If a programmer were to handle concurrent requests correctly (i.e. awaits, locks, ...), then it should be fine to have a persistent DB context. – Mandelbrotter, Mar 26, 2024 at 21:02
No, it just can't do concurrent requests. It will throw errors if you try. – Ewan, Mar 26, 2024 at 21:10
I would suggest that you add a repository layer that can Upsert your root object. This should assume that everything in the object has changed. It's fiddly, but you can do it in EF Core without having to check for changes manually; write some tests to prove it works and then use it everywhere. – Ewan, Mar 26, 2024 at 21:37
stackoverflow.com/questions/42053186/… Make sure you use Include. – Ewan, Mar 26, 2024 at 21:40
It's a pattern you can google, but basically you just need a method Upsert(rootObject) { using var db = new DbContext(); ... do stuff to make it update the object and all its children }. – Ewan, Mar 26, 2024 at 21:42
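One possible shape of the repository Ewan describes is sketched below. It is only a sketch under assumptions: AppDbContext, Site, Buildings, Floors and Offices are hypothetical names matching the question's model, Update() attaches the whole detached graph (entities with a key value are marked Modified, entities without one are marked Added), and deleted children would still need separate handling, which is the fiddly part.

    using System.Linq;
    using Microsoft.EntityFrameworkCore;

    public class SiteRepository
    {
        // Load the whole graph in one query and hand it back detached.
        public Site LoadSite(int siteId)
        {
            using var db = new AppDbContext();
            return db.Sites
                .Include(s => s.Buildings)
                    .ThenInclude(b => b.Floors)
                        .ThenInclude(f => f.Offices)
                .AsNoTracking()
                .Single(s => s.Id == siteId);
        }

        // Assume everything may have changed and write the whole graph back
        // in one short-lived unit of work.
        public void Upsert(Site site)
        {
            using var db = new AppDbContext();
            db.Update(site);     // root + reachable children: Modified or Added
            db.SaveChanges();
        }
    }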
DbContext is designed to act as a view over a piece of the database: the piece you are currently interested in. There are several implementation limitations because of that; for example, concurrency is restricted, which isn't needed anyway, because concurrent "tasks" should each have their own instance of DbContext.
It is of course possible to redesign it so that it tracks the entire database and supports concurrency. But then you potentially have the entire database in memory, with the actual database as a fallback. That is, first of all, hard to do both correctly and efficiently (concurrently), and it is also too heavy on the app (especially in memory). And any benefit goes away as soon as you spawn a second application instance (against the same shared database).
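In practice, "their own instance of DbContext" just means each concurrent piece of work creates (and disposes) its own context, something like the sketch below; AppDbContext, Sites and Buildings are placeholder names.

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;

    // Each task constructs and disposes its own context, so the ban on
    // concurrent use of a single DbContext instance never comes into play.
    async Task<List<Site>> LoadSitesAsync()
    {
        using var db = new AppDbContext();
        return await db.Sites.ToListAsync();
    }

    async Task<List<Building>> LoadBuildingsAsync()
    {
        using var db = new AppDbContext();
        return await db.Buildings.ToListAsync();
    }

    // Safe: two queries run concurrently, each against its own context.
    await Task.WhenAll(LoadSitesAsync(), LoadBuildingsAsync());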
I'll give you a concrete bug that you'll experience with a long-lived context.
When EF has an entity in its change tracker which it fetched from the database, it will not notice if something changes that row in the database in the meantime. If at a later stage you provide it with an update, it will work out what it needs to update by comparing the new entity with the old one in the tracker, and it will completely ignore what the actual state of the database is.
Real life anecdote, kept short:
We had a bug where we would fetch the tasks that needed to be handled by an external service and, for new tasks, send an email to that external service ("hey, do this job for us"). It could take a week before the external party got back to us. In the meantime, that entry in our database would be updated several times for different reasons, including updates to its status.
When the message from the external party was received, the same long-lived Windows service that originally contacted the external party would process it and update the entity in our database. The problem is that the EF context was long-lived, so the change tracker still had that entity loaded from last week, with all its week-old values, including the status. We changed another field (i.e. a field made for the response from the external party), saved it... and the week-old status of a newly created ticket was also persisted to the database again, because EF thought that that was the current state anyway.
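Stripped of the domain details, the essence of that bug looks roughly like the sketch below. Ticket, AppDbContext and the property names are hypothetical, and this particular variant marks the whole entity as modified, which is one common way an update gets pushed through EF.

    using System.Linq;
    using Microsoft.EntityFrameworkCore;

    // One context kept alive for the whole lifetime of the service.
    using var longLived = new AppDbContext();

    // Week 1: the entity is loaded and sits in the change tracker from now on.
    var ticket = longLived.Tickets.Single(t => t.Id == 42);

    // During the week, other code updates the row directly, e.g.
    //   UPDATE Tickets SET Status = 'InProgress' WHERE Id = 42;
    // The long-lived context never re-reads the row, so its copy stays stale.

    // Week 2: the external party's response arrives and is saved via the
    // same context, with the whole entity marked as modified.
    ticket.ExternalResponse = "job done";
    longLived.Entry(ticket).State = EntityState.Modified; // every property counts as changed
    longLived.SaveChanges();

    // The UPDATE is generated from the week-old tracked values, so the status
    // written by the other code is silently reverted.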
The source of the issue is that what was once a web service (short-lived, per-request) was turned into a Windows service (long-lived, per-runtime), thus unnaturally extending the life of the EF context within.
This is just one example, but the important takeaway is that the EF developers designed the context with a short lifespan in mind. Keeping it alive longer than necessary is going to cause undefined behaviour and bugs that are sometimes incredibly difficult to troubleshoot.
It's not that hard to wrap your context in a factory and just avoid the issue altogether.
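For example, EF Core ships a context factory that is easy to wire up in a desktop app, so each unit of work gets a fresh, short-lived context. In the sketch below, AppDbContext (with the usual DbContextOptions constructor) and the connection string are placeholders.

    using Microsoft.EntityFrameworkCore;
    using Microsoft.Extensions.DependencyInjection;

    // Register the factory once at startup.
    var services = new ServiceCollection();
    services.AddDbContextFactory<AppDbContext>(options =>
        options.UseSqlServer("<connection string>"));
    using var provider = services.BuildServiceProvider();

    var factory = provider.GetRequiredService<IDbContextFactory<AppDbContext>>();

    // Each operation creates its own context and disposes it when done.
    using (var context = factory.CreateDbContext())
    {
        // load the site graph, apply the user's edits, SaveChanges(), ...
    }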
Isn't this problem supposed to be "solved" by row versioning/concurrency checks? (Though that just shifts the problem from data corruption to the mildly better inability to save changes.) Even if the context is short-lived, there's still the risk that something else modifies the data underneath you. – user1937198, Mar 28, 2024 at 11:07
@user1937198, what you're touching on is the concept of "transactions" (including the manual transaction control necessary when a client reads and holds data which will ultimately be written back) and "conflict resolution" (as a rule it's not acceptable to just discard data; instead, manual intervention is required where the conflicting states have to be integrated, decisions have to be made by staff about which to discard, or the conflict represents an error and the process is therefore re-engineered to avoid further occurrences). – Steve, Mar 28, 2024 at 12:55
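For reference, row versioning in EF Core amounts to a concurrency token plus explicit conflict handling; the entity and property names in this sketch are made up.

    using System.ComponentModel.DataAnnotations;
    using Microsoft.EntityFrameworkCore;

    public class Office
    {
        public int Id { get; set; }
        public string? Name { get; set; }

        // [Timestamp] makes RowVersion a concurrency token: every UPDATE is
        // issued as "... WHERE Id = @id AND RowVersion = @originalValue".
        [Timestamp]
        public byte[]? RowVersion { get; set; }
    }

    public static class SaveHelper
    {
        // If someone else changed the row after we read it, no row matches the
        // original RowVersion and EF throws instead of silently overwriting.
        public static bool TrySave(DbContext context)
        {
            try
            {
                context.SaveChanges();
                return true;
            }
            catch (DbUpdateConcurrencyException)
            {
                // Conflict: reload, merge, or ask the user -- the "conflict
                // resolution" step mentioned in the comment above.
                return false;
            }
        }
    }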
The infamous EF.
Is there actually any occasion on which your users would be working on the same "site" model concurrently? And is your application capable of properly handling that?
The reason I ask is that the complicated, multi-level concept of a "building site" you describe would probably involve enormous complexity in the application if the data describing it could be edited concurrently from multiple terminals. I mean, what happens when you're adding the curtains and another user wants to abolish the floor?
I find it hard to believe that such a complicated thing would be worked on concurrently by multiple users. Systems that handle architectural drawings, for example, will have the concept of a document "check-out", where a single user reserves the exclusive right to make changes, and then commits them once finished in a "check-in".
If there is no concurrency, you might not need a database at all, or at least could reduce the database to simply hosting a serialised form of the "site" model (which you simply save from memory en-bloc), as opposed to having everything split out into separate database tables and synchronised individually.
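If there really is no concurrent editing, the "serialised en-bloc" option is very little code. Below is a sketch using System.Text.Json and a made-up minimal model mirroring the question's site/building/floor/office shape; the file could just as well be a single text or blob column.

    using System.Collections.Generic;
    using System.IO;
    using System.Text.Json;

    // Save the whole in-memory graph as one document...
    var site = new Site(42, "Example Campus", new List<Building>
    {
        new("Building A", new List<Floor> { new(1, new List<Office> { new("Reception") }) })
    });

    string json = JsonSerializer.Serialize(site, new JsonSerializerOptions { WriteIndented = true });
    File.WriteAllText("site-42.json", json);

    // ...and load it back the same way.
    var reloaded = JsonSerializer.Deserialize<Site>(File.ReadAllText("site-42.json"))!;

    // Made-up minimal model used only for this sketch.
    public record Office(string Name);
    public record Floor(int Number, List<Office> Offices);
    public record Building(string Name, List<Floor> Floors);
    public record Site(int Id, string Name, List<Building> Buildings);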
The need for any kind of persistent DB connection is almost invariably a flaw in the design of the application, and an insufficient distinction in the design between server-side and client-side, leading to neglect of the fact that network connections can and do fail and servers do go offline.