What happens behind the scenes: NHibernate, Linq to SQL, Entity Framework scenario analysis
One of the things that I began doing since starting to work on multiple OR/M Profilers is to compare how all of them are handling a particular task. This is by no means a comparative analysis, but it is an interesting data point.
The scenario in question is loading a blog with all its posts and comments.
Let us start with NHibernate:
var blogs = s.CreateQuery( @"from Blog b left join fetch b.Posts p left join fetch p.Comments where b.Id = :id") .SetParameter("id", 1) .List<Blog>();
Will generate the following SQL
select blog0_.Id as Id7_0_, posts1_.Id as Id0_1_, comments2_.Id as Id2_2_, blog0_.Title as Title7_0_, blog0_.Subtitle as Subtitle7_0_, blog0_.AllowsComments as AllowsCo4_7_0_, blog0_.CreatedAt as CreatedAt7_0_, posts1_.Title as Title0_1_, posts1_.Text as Text0_1_, posts1_.PostedAt as PostedAt0_1_, posts1_.BlogId as BlogId0_1_, posts1_.UserId as UserId0_1_, posts1_.BlogId as BlogId0__, posts1_.Id as Id0__, comments2_.Name as Name2_2_, comments2_.Email as Email2_2_, comments2_.HomePage as HomePage2_2_, comments2_.Ip as Ip2_2_, comments2_.Text as Text2_2_, comments2_.PostId as PostId2_2_, comments2_.PostId as PostId1__, comments2_.Id as Id1__ from Blogs blog0_ left outer join Posts posts1_ on blog0_.Id = posts1_.BlogId left outer join Comments comments2_ on posts1_.Id = comments2_.PostId where blog0_.Id = 1 /* @p0 */
This result in a fairly simple query plan:
However, you should note that this also result in a Cartesian product, which may not be what you wanted.
Linq to SQL doesn’t really provide a good way to express what I wanted, but it does get the job done:
var dataLoadOptions = new DataLoadOptions(); dataLoadOptions.LoadWith<Blog>(x => x.Posts); dataLoadOptions.LoadWith<Post>(x => x.Comments); using (var db = new BlogModelDataContext(conStr) { LoadOptions = dataLoadOptions }) { db.Blogs.Where(x => x.Id == 1).ToList(); }
Interestingly enough, this does not generate a single query, but two queries:
-- statement #1 SELECT [t0].[Id], [t0].[Title], [t0].[Subtitle], [t0].[AllowsComments], [t0].[CreatedAt] FROM [dbo].[Blogs] AS [t0] WHERE [t0].[Id] = 1 /* @p0 */ -- statement #2 SELECT [t0].[Id], [t0].[Title], [t0].[Text], [t0].[PostedAt], [t0].[BlogId], [t0].[UserId], [t1].[Id] AS [Id2], [t1].[Name], [t1].[Email], [t1].[HomePage], [t1].[Ip], [t1].[Text] AS [Text2], [t1].[PostId], (SELECT COUNT(* ) FROM [dbo].[Comments] AS [t2] WHERE [t2].[PostId] = [t0].[Id]) AS [value] FROM [dbo].[Posts] AS [t0] LEFT OUTER JOIN [dbo].[Comments] AS [t1] ON [t1].[PostId] = [t0].[Id] WHERE [t0].[BlogId] = 1 /* @x1 */ ORDER BY [t0].[Id], [t1].[Id]
The interesting bit is that while there are two queries here, this method does not generate a Cartesian product, so I have to consider this a plus. What I would like to know is whatever this is intentionally so or just a result of the way Linq to SQL eager loading is structured.
The query plan for this is simple as well:
Finally, Entity Framework (both 3.5 and 4.0), using this code:
db.Blogs .Include("Posts") .Include("Posts.Comments") .Where(x => x.Id == 1) .ToList();
This code will generate:
SELECT [Project2].[Id] AS [Id], [Project2].[Title] AS [Title], [Project2].[Subtitle] AS [Subtitle], [Project2].[AllowsComments] AS [AllowsComments], [Project2].[CreatedAt] AS [CreatedAt], [Project2].[C1] AS [C1], [Project2].[C4] AS [C2], [Project2].[Id1] AS [Id1], [Project2].[Title1] AS [Title1], [Project2].[Text] AS [Text], [Project2].[PostedAt] AS [PostedAt], [Project2].[BlogId] AS [BlogId], [Project2].[UserId] AS [UserId], [Project2].[C3] AS [C3], [Project2].[C2] AS [C4], [Project2].[Id2] AS [Id2], [Project2].[Name] AS [Name], [Project2].[Email] AS [Email], [Project2].[HomePage] AS [HomePage], [Project2].[Ip] AS [Ip], [Project2].[Text1] AS [Text1], [Project2].[PostId] AS [PostId] FROM (SELECT [Extent1].[Id] AS [Id], [Extent1].[Title] AS [Title], [Extent1].[Subtitle] AS [Subtitle], [Extent1].[AllowsComments] AS [AllowsComments], [Extent1].[CreatedAt] AS [CreatedAt], 1 AS [C1], [Project1].[Id] AS [Id1], [Project1].[Title] AS [Title1], [Project1].[Text] AS [Text], [Project1].[PostedAt] AS [PostedAt], [Project1].[BlogId] AS [BlogId], [Project1].[UserId] AS [UserId], [Project1].[Id1] AS [Id2], [Project1].[Name] AS [Name], [Project1].[Email] AS [Email], [Project1].[HomePage] AS [HomePage], [Project1].[Ip] AS [Ip], [Project1].[Text1] AS [Text1], [Project1].[PostId] AS [PostId], CASE WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int) ELSE CASE WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END END AS [C2], CASE WHEN ([Project1].[C1] IS NULL) THEN CAST(NULL AS int) ELSE CASE WHEN ([Project1].[Id1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END END AS [C3], [Project1].[C1] AS [C4] FROM [dbo].[Blogs] AS [Extent1] LEFT OUTER JOIN (SELECT [Extent2].[Id] AS [Id], [Extent2].[Title] AS [Title], [Extent2].[Text] AS [Text], [Extent2].[PostedAt] AS [PostedAt], [Extent2].[BlogId] AS [BlogId], [Extent2].[UserId] AS [UserId], [Extent3].[Id] AS [Id1], [Extent3].[Name] AS [Name], [Extent3].[Email] AS [Email], [Extent3].[HomePage] AS [HomePage], [Extent3].[Ip] AS [Ip], [Extent3].[Text] AS [Text1], [Extent3].[PostId] AS [PostId], 1 AS [C1] FROM [dbo].[Posts] AS [Extent2] LEFT OUTER JOIN [dbo].[Comments] AS [Extent3] ON [Extent2].[Id] = [Extent3].[PostId]) AS [Project1] ON [Extent1].[Id] = [Project1].[BlogId] WHERE 1 = [Extent1].[Id]) AS [Project2] ORDER BY [Project2].[Id] ASC, [Project2].[C4] ASC, [Project2].[Id1] ASC, [Project2].[C3] ASC
The query plan for this seems overly complicated:
If you’ll look closely, you’ll see that it generate a join between Blogs, Posts and Comments, essentially creating a Cartesian product between all three.
I am not going to offer commentary on the results, but open a discussion on them.