So I've implemented the repository pattern in a lot of projects, but there seems to be some debate about what is right with this pattern. Previously, I've always added Update or Create methods inside of a (generic) repository to modify the entities. This would always save changes to the context.
Recently I read a blog post about the repository pattern saying that it should act like a collection. It should have Find(), GetList() and Add() methods, and varieties on those. But it shouldn't save the changes itself. And it should never have an Update() method, because changes to the entities in the collection are references. We should let a Unit of Work save the changes.
Now, I don't like the unit of work concept, because my caller structure usually looks something like this for an average API:
Controller -> Service -> Repository -> DbContext.
Where my service modifies entities (business logic) and my repository saves changes. The underlying database provider can, and might, change. The example below would be a very basic method inside of my service.
public async Task Update(UpdateBookModel model)
{
    var entity = await _repository.Find(model.Id);

    // very simple, just for the example
    entity.Name = model.Name;
    entity.ReleaseDate = model.ReleaseDate;

    await _repository.Update(entity);
}
I'm starting a new project where I want to use the same structure, but I want my repository to act like a collection, where the changes are saved somewhere else. This would normally happen in a UoW (according to the blog), but I want to take that away.
I'll have the exact same setup as above, but I'll remove the _repository.Update() call. I'm using DI, where the DbContext is added scoped. So I thought I might be able to add middleware which will save any changes that happened during my request. Note: I will throw exceptions when something is not as expected, so changes won't be saved when that happens. I have other middleware to catch those exceptions and give the appropriate response.
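A minimal sketch of what I have in mind, assuming ASP.NET Core convention-based middleware and a scoped context (the `AppDbContext` name is illustrative):

```csharp
using Microsoft.AspNetCore.Http;

// Hypothetical "save changes" middleware: runs the rest of the pipeline,
// then flushes the request-scoped DbContext. If an exception was thrown
// further down, we never reach SaveChangesAsync, so nothing is persisted.
public class SaveChangesMiddleware
{
    private readonly RequestDelegate _next;

    public SaveChangesMiddleware(RequestDelegate next) => _next = next;

    // Scoped services (like the DbContext) can be injected per-request
    // into InvokeAsync.
    public async Task InvokeAsync(HttpContext context, AppDbContext dbContext)
    {
        await _next(context);               // controller + service logic runs here
        await dbContext.SaveChangesAsync(); // persist whatever the request changed
    }
}

// Registration, e.g. in Program.cs:
// app.UseMiddleware<SaveChangesMiddleware>();
```

Note that by this point the response may already have started streaming, so a failure in SaveChangesAsync can be hard to report to the client.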
I like how it makes the code cleaner and takes away responsibility from the service.
So why would this, from an architectural standpoint, be a good or bad decision? Or what is something that I should change?
I did not study any architecture on this matter, so I don't know all the fancy terms. But I am interested in this part of development.
Edit 1
So how would this work?
Because our DbContext is scoped, it will be the same DbContext throughout the whole request. This means that it's the same for every repository and also the same in the middleware.
I would just modify / add / remove data like I normally would in my services. I would add middleware that acts after the request is finished. This will get the DbContext, which is DI-injected, and just call SaveChangesAsync(), which pushes the locally modified data to the database.
After thinking about it a bit more this would work for any API controller, because requests are short and data would be saved frequently.
But there is a problem when working with scopes that are long-lasting, like Blazor (server) components. We would not save data here until the components are disposed.
The services live in a shared project, which both an API and a Blazor project access. It's a bad idea in those instances, because I would need to build an exception for Blazor projects. But I do still find this idea intriguing and I think there is more to it.
3 Answers
It's a bad idea to use The Repository Pattern; it's overused.
(I need to find references for this; I have researched this heavily before, so the references are out there.)
It removes "utility". The Repository Pattern manifests as a specific API as you describe { Find, GetList, Add }. If you have an SQL database behind that API, you end up removing the expressiveness of SQL. SQL can JOIN and so much more.
It's meant to be used only for "external" sources, not databases. Does your system query a set of data via a Web Service API? The Repository Pattern is primarily intended for this situation. A standardised API wrapper is ideal, and you are not removing the SQL "utility" because the Web Service never offered SQL as an option in the first place.
Entity Framework already implements the Unit of Work pattern
If you're using EF, then you certainly should not use a repository pattern.
In .Net the Entity Framework purposefully implements the Unit of Work pattern. When you create a context, you should dispose that context soon after. Don't hold a global reference to a single context. Don't reuse a context after saving changes (it might cache invalidated data).
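For illustration, a short-lived context used as the unit of work might look like this (BookStoreContext and the model are placeholder names, not from the question):

```csharp
// The DbContext itself is the Unit of Work: it tracks every change made
// while it is alive, and SaveChangesAsync() commits them together.
public async Task RenameBook(int bookId, string newName)
{
    using (var db = new BookStoreContext()) // short-lived: disposed right after
    {
        var book = await db.Books.FindAsync(bookId);
        book.Name = newName;                // tracked modification
        await db.SaveChangesAsync();        // one atomic commit
    }                                       // no reuse after saving
}
```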
Improving your system gradually
Consider using Repository Services only for "commands". Bypass that layer for "querying". Then later, consider using EF directly for "commands", phasing out Repository Services piece by piece over time.
Other direct answers to your comments and questions
I like how it makes the code cleaner and takes away responsibility from the service.
This is a false positive. Services are supposed to have the responsibility. Code should not be broken into far-flung layers. This makes it much harder to read, while increasing the amount of boilerplate work. It probably feels satisfying to write extra code, but it's not valuable.
Because our DbContext is scoped, it will be the same DbContext throughout the whole request. This means that it's the same for every repository and also the same in the middleware.
That's the incorrect way to use DbContext. It is possible to share a context for a request, but that should be avoided. Make it easier to construct a valid context with a factory function like DbContext.Create(), and manage local-scope lifetime with using. Don't worry, an underlying connection pool helps to make this efficient.
I would add middleware that will act after the request is finished.
I generously assume that:
- there is an underlying driving requirement: that the Response is not delayed by DB (IO) operations.
- you have some sort of HTTP connection limit, and that closing the connection sooner is very important, and that the client doesn't need to know about completion or failure.
- you are measuring your performance and found this to be the biggest contributor to performance problems.
Therefore, you could start SaveChangesAsync to complete in the background with Task.Run. (You might run the whole function in a background thread, so you can respond sooner.)
And this will get the DbContext, which is DI injected, and just call SaveChangesAsync()
This can work.
However, DI of a Unit-of-Work DbContext can be problematic. Unfortunately, Microsoft seems to make that necessary because their internal authentication modules appear to need it. You can potentially use a shared DI instance while also using additional local-scoped instances, so that you properly adhere to best practices for DbContext usage.
Also, having any code run invisibly in the background is risky. What happens when "Saving" fails? Where is that exception caught? What if there are three parts and you need to save changes three times for a request?
It's tempting to create generalised shared code, and it can feel "better" but I have found that to be a misconception. Imperative code can be seen, read, understood, and debugged.
After thinking about it a bit more this would work for any API controller, because requests are short and data would be saved frequently.
It can, but I don't recommend it.
It's a bad idea in those instances, because I would need to build an exception for Blazor projects. But I do still find this idea intriguing and I think there is more to it.
This is a strong reason for you to avoid the Repository Pattern and DI of DbContexts.
You should get more excited about "visible code". Consider:
void TransferFunds(...)
{
    DemandAuthorisedUser(); // Throws an exception if not
    var par = DeserializeParameters(); // So much easier to debug than when the ASP.Net auto-deserialization fails
    ValidateParameters(par);

    using (var db = LocalDatabase.Create())
    {
        var financialTransfer = new FinancialTransfer();
        AssignFromParams(par, financialTransfer);
        db.FinancialTransfers.Add(financialTransfer);
        db.SaveChanges();

        //CreateFinancialTransferAudit(financialTransfer); // Best with RDBMS internal feature instead of middleware
        //db.SaveChanges();
    }
}
It's all there in one place, and super easy to verify in code-review, and to debug if something goes wrong.
- Would you be willing to expand on why it is that you think that EF and the repository pattern cannot be used together? – edalorzo, Oct 9, 2021 at 15:27
- @edalorzo They can be used together, but they shouldn't be. The repository pattern is redundant. The Unit of Work pattern is already in EF, and you already get a standardised DataSet access API from EF. Less is more. – Kind Contributor, Oct 10, 2021 at 2:45
- Thanks for your answer. Learned something. I don't fully agree about not using the repository pattern. I don't use it in a traditional sense, but I do use it to avoid repetitive code, and to make sure some more advanced queries don't get repeated, allowing a future developer to copy them wrong and introduce bugs. Also, it allows for more unit-testable code. I can't mock DbContext, but I can mock IEventRepository. I'd like to see some examples where you have used your described methods. – Roy Berris, Oct 10, 2021 at 17:47
- @RoyBerris I write database tests with SQL data. The tests get run via CI/CD on pull requests. "I'd like to see some example where you have used your described methods" - I have a few projects that should be made open source in the coming year. Also have a look at my latest article - medium.com/codex/… – Kind Contributor, Oct 10, 2021 at 23:08
- I agree that EF already is a unit of work and repository pattern and you don't need an additional one. However, I don't quite agree with "If you have an SQL database behind that API, you end up removing the expressiveness of SQL." Well, I agree that it is factually correct, but it's glossing over the fact that abstracting SQL is done not to hide its expressiveness, but rather to remove a pretty much hard-coded dependency on SQL across your codebase. A repository acts as an encapsulation of whatever db storage technology you use and might want to change at some point. – Flater, Oct 12, 2021 at 7:51
So I think, as long as your DbContext is scoped to the request, this will work.
However! If you think about it, it's the same as calling dbContext.SaveChanges() at the end of your service-layer method as required.
Doing this would make more sense, as you know when you need to save and when not, and your service layer works outside of the context of the web app, etc.
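In code, saving explicitly at the end of the service method would look something like this (mirroring the question's example; `_dbContext` is assumed to be the same scoped context the repository uses):

```csharp
public async Task Update(UpdateBookModel model)
{
    var entity = await _repository.Find(model.Id);

    entity.Name = model.Name;
    entity.ReleaseDate = model.ReleaseDate;

    // The service decides when to persist, instead of middleware
    // or the repository doing it implicitly.
    await _dbContext.SaveChangesAsync();
}
```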
The repository pattern is good. But a generic repository is bad. EF doesn't really implement either; it's a queryable interface.
The problem with the unit of work pattern and generic repositories (i.e. one repo per table) is that it assumes that all the repositories can handle a transaction which spans all their backing stores. Given you don't know the implementation of the repository when you call it, this obviously can't be true.
Rather than use the Unit of Work pattern, use the idea of Aggregate Roots instead. An Aggregate Root is an object which can be saved as a whole thing. E.g.
class CustomerAR
{
    Address[] Addresses;
}
vs
class Customer
{
    string[] AddressIds;
}
class Address { }
I can always save CustomerAR as an atomic transaction, as it contains all the addresses, and therefore a Customer repository presents no problem. If I have a Customer with AddressIds and a separate Address entity, then I need to have a transaction or Unit of Work to ensure consistency.
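As a sketch, a repository for the aggregate root can persist the whole graph in one call, so no separate Unit of Work is needed (the repository class and `AppDbContext` are illustrative names, not from the answer):

```csharp
// Hypothetical aggregate-root repository: Customer and its Addresses
// travel together as one object graph, so one save is atomic.
public class CustomerRepository
{
    private readonly AppDbContext _db;

    public CustomerRepository(AppDbContext db) => _db = db;

    public async Task Save(CustomerAR customer)
    {
        _db.Customers.Add(customer); // child Addresses are tracked with the root
        await _db.SaveChangesAsync(); // one atomic transaction
    }
}
```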
There really isn't much point in using repositories + EF. To do so, you would have to keep the DbContext internal to the repo and hide EF away. If you are doing that, just use SqlClient.
- Thanks for your answer. Usually I have a Common project with the business logic, which has an internal DbContext and uses repositories to abstract away a bit of EF. It's just friendlier for the developers to work with, and it removes a lot of repetitive code. When I don't use a repository and have a pretty advanced query, I don't want developers to copy the Where clause, but instead nest it in some (reusable) method inside of a repository. – Roy Berris, Oct 10, 2021 at 17:51
- I agree, I would just not use EF in that repo. – Ewan, Oct 11, 2021 at 9:51
Repositories as collections?
Recently I read a blog post about the repository pattern saying that it should act like a collection. It should have Find(), GetList() and Add() methods, and varieties on those. But it shouldn't save the changes itself. And it should never have an Update() method, because changes to the entities in the collection are references. We should let a Unit of Work save the changes.
This advice is all-round bad. Or, at the very least, it applies to a very specific and very niche subset of cases and should have been given a disclaimer to point this out. You didn't link the article in question, so I can't verify whether there is no disclaimer or whether you missed it.
The crux of the issue here is what the repository represents. If it represents a networked data storage, the advice is really, really bad.
If, however, the repository represents an in-memory collection of reference types, then the advice is reasonable, though I still wouldn't particularly state that such a repository must act like a collection.
Let's rewind back to the past. Repositories have a very neat "each type to its own storage" idea. You get people from the PersonRepository, you get orders from the OrderRepository, and so on. Organisationally speaking, and from a code perspective, this is a very clean way to organize your in-memory data sources.
But when we started using networked data sources, commonly database servers, we gained the ability to store more data than we can reasonably have in-memory during runtime. That's great, but some repositories banked on the idea of having in-memory access to every entity of given type (all people, all orders, ...).
Once you reach this point, it is no longer feasible to load the entire data set into your memory. It's a waste of both memory and network-bandwidth. It also completely cuts out the performance boosts database servers can offer (indexing, query caching, ...).
This is where the advice you've been given goes off the rails. It only makes sense for in-memory collections. Nowadays, we almost always store our data on a networked resource, and thus the advice is almost always wrong.
Unit of work
The example you use here is logic that only stores a Book. The important point I'm trying to get at here is that you're only impacting one repository. In such a case, a unit of work is not really very useful.
But I would still advocate for its usage in general.
Units of work start making a lot more sense when you want transactional safety, especially when dealing with multiple entity types at once. For example:
- You want to store multiple books. If any of the books cannot be stored (e.g. validation error, network error, ...), then none of the books should be stored.
- Arguably, this could still be solved without a unit of work, but it effectively requires you to implement extra logic in your repository which you otherwise would've implemented into your unit of work. If you have more than one repository which should provide transactional safety, it already starts making sense to use a unit of work.
- You want to store a new author and their books. If one of these entities (author or any of the books) cannot be stored, then you want none of them to be stored.
As I said, repositories have an "each type to its own storage" design principle. This works well for operations scoped to a single entity type, but it is incapable of dealing with any transaction that involves more than one entity type.
This is why a unit of work is implemented. It is an "overseer" of the repositories. It keeps them in sync, and it coordinates a transaction that involves several repositories.
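A sketch of what such an overseer might look like, assuming repositories that share one context and therefore one transaction (all class names here are illustrative):

```csharp
// Hypothetical unit of work: one commit spanning several repositories.
public class UnitOfWork : IDisposable
{
    private readonly AppDbContext _db;

    public AuthorRepository Authors { get; }
    public BookRepository Books { get; }

    public UnitOfWork(AppDbContext db)
    {
        _db = db;
        Authors = new AuthorRepository(db); // both repositories share the
        Books = new BookRepository(db);     // same context, hence one transaction
    }

    public Task Commit() => _db.SaveChangesAsync(); // all or nothing
    public void Dispose() => _db.Dispose();
}
```

Storing an author and their books through this object then succeeds or fails as a whole: both repositories stage changes, and a single Commit() persists them together.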
Some REST APIs always focus on a single entity type. In pure REST, your API effectively mirrors the CRUD functionality of your repository. In such cases, I can agree that a unit of work doesn't really add much value and can be skipped.
However, these kinds of APIs are not very common. Often, you end up resorting to some operations that require multiple entity types, and at this point, a unit of work is needed to coordinate between the repositories.
You can follow YAGNI here and avoid using units of work until the day comes where your codebase performs multi-entity-type operations; but this will require a refactoring of your code when it happens. Personally, I would opt for the unit of work by default here, simply because pure REST APIs that remain pure across their lifetime are exceedingly rare in my experience.
Long-lived scopes
But there is a problem when working with scopes that are long-lasting, like Blazor (server) components. We would not save data here until the components are disposed.
The services live in a shared project, which both an API and a Blazor project access. It's a bad idea in those instances, because I would need to build an exception for Blazor projects. But I do still find this idea intriguing and I think there is more to it.
I've struggled with this issue in the past, in a different context. Our business logic was written to be consumed by a web API, but was later also implemented in a long-running Windows service, breaking a lot of the repositories' functionality due to contexts living much longer than they ever should have. This led to all sorts of inexplicable behavior and took weeks to troubleshoot.
The solution here is to design your business logic so that it works with a factory.
When you inject a repository (or unit of work, same story) into your service, and your service lives a long life, then your repository (or unit of work) inherently lives for the same amount of time.
public class MyService
{
    private readonly UnitOfWork uow;

    public MyService(UnitOfWork uow)
    {
        this.uow = uow;
    }

    public void DoStuff()
    {
        // logic
    }
}
However, this changes when you inject a factory, not a direct instance.
public class MyService
{
    private readonly UnitOfWorkFactory uowFactory;

    public MyService(UnitOfWorkFactory uowFactory)
    {
        this.uowFactory = uowFactory;
    }

    public void DoStuff()
    {
        using (var uow = uowFactory.Create())
        {
            // logic
        }
    }
}

public class UnitOfWorkFactory
{
    public UnitOfWork Create() => new UnitOfWork();
}
No matter how long your MyService instance lives, every time MyService.DoStuff is called, you will be provided with a new, fresh unit of work.
In a web context, it's not harmful. Your factory will just fire once. It's not very useful, but it doesn't harm your workflow either.
In a long-lived context, you now avoid the issue with long-lived repositories (or units of work).
GetBooks(), but something use-case specific, like GetBooksAwaitingCatalogization(), possibly with parameters; it's not supposed to be just a renaming of the DbContext API). These then call the same underlying database provider. 2/2