In the home page of my multi-tenant web application, Linq-to-Entities (EF6) is used to get a paged list of Calendar Events. This data is JOINed with some other tables to produce the query below.
The tables all have composite primary keys that combine their TenantId with their own IDENTITY column. I put the IDENTITY column first in the clustered index because STATISTICS only looks at the first column in an index, and I understand that having the most-specific column first improves the performance of get-single-row queries. (If I'm wrong about this then please let me know!)
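For what it's worth, one way to check which column the histogram actually covers is DBCC SHOW_STATISTICS; the statistics name below is just a placeholder for whatever the primary key's statistics object is actually called:
-- 'PK_Events' is a placeholder name: substitute the actual statistics object
-- belonging to the clustered primary key on dbo.Events
DBCC SHOW_STATISTICS ('dbo.Events', 'PK_Events') WITH STAT_HEADER, HISTOGRAM;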
The schema is this:
CREATE TABLE dbo.Events (
    EventId     int NOT NULL IDENTITY(1,1),
    TenantId    int NOT NULL,
    CustomerId  int NULL,
    EventTypeId int NOT NULL,
    LocationId  int NOT NULL,
    -- Etc...
    PRIMARY KEY CLUSTERED ( EventId, TenantId ),
    CONSTRAINT FK_Events_Tenants    FOREIGN KEY ( TenantId )              REFERENCES dbo.Tenants    ( TenantId ),
    CONSTRAINT FK_Events_Customers  FOREIGN KEY ( CustomerId, TenantId )  REFERENCES dbo.Customers  ( CustomerId, TenantId ),
    CONSTRAINT FK_Events_EventTypes FOREIGN KEY ( EventTypeId, TenantId ) REFERENCES dbo.EventTypes ( EventTypeId, TenantId ),
    CONSTRAINT FK_Events_Locations  FOREIGN KEY ( LocationId, TenantId )  REFERENCES dbo.Locations  ( LocationId, TenantId )
);
The query that Linq generates is this (reformatted for readability, e.g. indentation, renaming Linq's Extent1 alias to shorter aliases based on the table names, removing redundant column aliases, and removing Linq's idiosyncratic Project1; these changes did not affect the execution plan):
SELECT
e.(etc),
c.(etc),
l.(etc),
t.(etc)
FROM
dbo.Events AS e
LEFT OUTER JOIN dbo.Customers AS c ON
c.TenantId = e.TenantId
AND
c.CustomerId = e.CustomerId
INNER JOIN dbo.Locations AS l ON
l.TenantId = e.TenantId
AND
l.LocationId = e.LocationId
INNER JOIN dbo.EventTypes AS t ON
t.TenantId = e.TenantId
AND
t.EventTypeId = e.EventTypeId
WHERE
e.TenantId = @tenantId
ORDER BY
ROW_NUMBER() OVER ( ORDER BY e.EventId DESC, e.TenantId DESC )
OFFSET
@offset ROWS FETCH NEXT @pageSize ROWS ONLY
- The @tenantId parameter is 123.
- The @offset parameter is 0.
- The @pageSize parameter is 50.
- I ran the query before and after a rebuild of all indexes, as well as after updating all STATISTICS with sp_updatestats @resample = 'resample' (see the sketch after this list). This had no effect on the execution plan.
- I also ran the queries with OPTION(RECOMPILE). This also had no effect on the execution plan.
- This query is being run on Azure SQL, by the way.
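For reference, this is roughly how those maintenance steps were run (a hand-written sketch; the table names match the schema above):
-- Rebuild the indexes on the tables involved in the query
ALTER INDEX ALL ON dbo.Events REBUILD;
ALTER INDEX ALL ON dbo.EventTypes REBUILD;
ALTER INDEX ALL ON dbo.Locations REBUILD;
ALTER INDEX ALL ON dbo.Customers REBUILD;

-- Refresh all statistics using the resample option
EXEC sp_updatestats @resample = 'resample';

-- To test recompilation, OPTION (RECOMPILE) was appended after the
-- OFFSET ... FETCH clause of the generated query.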
Recently (starting a few weeks ago) this query has been running very slowly, so I got the Execution Plan for it from SSMS:
- I have circled all execution-plan nodes that pass 41,109 rows around.
- 41,109 is the number of rows in my dbo.Events table that correspond to the TenantId I was looking at.
  - So every time I saw 41,109 in the plan I knew it was reading all dbo.Events rows, or index nodes, for that Tenant.
  - But the query has paging via OFFSET ... ROWS FETCH ... ROWS with an ORDER BY that uses the clustered index key columns, so it should not be reading more than 50 rows from the dbo.Events table!
- I've uploaded the Execution Plan to Brent Ozar's PasteThePlan website: https://www.brentozar.com/pastetheplan/?id=Hyc1bdkEO
I saw that the plan was reading from the dbo.EventTypes table first, and using the output of that read to filter rows from dbo.Events - but that's not my intention. My intention is for dbo.Events to be the "primary" table, and for it to get rows from the other tables (dbo.Customers, dbo.EventTypes, and dbo.Locations) based on the rows it read from dbo.Events.
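As a one-off experiment in SSMS (not something I can push through Linq), the intended join order can be checked by hand-running the query with a join-order hint appended; this is only a sketch with the column lists abbreviated:
SELECT
    e.EventId, e.TenantId   -- column lists abbreviated
FROM
    dbo.Events AS e
    LEFT OUTER JOIN dbo.Customers AS c ON
        c.TenantId = e.TenantId AND c.CustomerId = e.CustomerId
    INNER JOIN dbo.Locations AS l ON
        l.TenantId = e.TenantId AND l.LocationId = e.LocationId
    INNER JOIN dbo.EventTypes AS t ON
        t.TenantId = e.TenantId AND t.EventTypeId = e.EventTypeId
WHERE
    e.TenantId = @tenantId
ORDER BY
    ROW_NUMBER() OVER ( ORDER BY e.EventId DESC, e.TenantId DESC )
OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY
OPTION (FORCE ORDER);   -- preserves the written join order, keeping dbo.Events as the driving table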
When I removed the INNER JOIN dbo.EventTypes AS t ON... part of the query (and the corresponding columns) and re-ran it, the execution plan was now what I intended it to be and the query ran very fast (except without the dbo.EventTypes data that I still need...):
SELECT
e.(etc),
c.(etc),
l.(etc)
FROM
dbo.Events AS e
LEFT OUTER JOIN dbo.Customers AS c ON
c.TenantId = e.TenantId
AND
c.CustomerId = e.CustomerId
INNER JOIN dbo.Locations AS l ON
l.TenantId = e.TenantId
AND
l.LocationId = e.LocationId
WHERE
e.TenantId = @tenantId
ORDER BY
ROW_NUMBER() OVER ( ORDER BY e.EventId DESC, e.TenantId DESC )
OFFSET
@offset ROWS FETCH NEXT @pageSize ROWS ONLY
The execution plan now shows it reading exactly 50 rows from the dbo.Events table. (Curiously, it also shows it reading from the dbo.Locations table and indexes twice - I don't understand why.)
The Execution Plan window does not indicate that any indexes are missing for either query. When dbo.EventTypes is used in the query, the execution plan shows that the Index Seek node on dbo.Events is using TenantId as the only seek predicate and ignoring the EventId predicate that would be used because of the ORDER BY clause.
Unfortunately none of the online resources I've found that advise improving execution plans through judicious use of indexes helped me here because, as far as I know, the dbo.Events table has plenty of index coverage (there are more than 15 indexes on the table), and none of them talked about how indexes are used by ORDER BY, TOP, and OFFSET. Additionally, they all advised that incorrect estimated row counts (like how I see 2,616 estimated but 41,109 actual, or 77,192 estimated but 41,109 actual) would be rectified by updating STATISTICS, but I did update all statistics and that didn't help at all.
- Do you possibly have the plan XML for the "fast" plan (the one that only passes 50 rows around)? I wonder what makes SQL think it can safely select 50 Events rows of an estimated 3,213 in the first place. – Kuba Wyrostek, Oct 27, 2022 at 12:15
4 Answers
Multi-tenant systems are very prone to having problems with skewed statistics. Both the Events and Locations tables have over a million rows in them, and are using the automatic sampling rate chosen by the database engine. You can see that in the execution plan XML; here are a couple of examples:
<StatisticsInfo
Database="[D0849AEA-A31D-4639-9F71-2651302E4887]" Schema="[dbo]" Table="[Locations]"
Statistics="[nci_wi_Locations_C8E94572A7BDF437552DACBCD45FE7BF]"
ModificationCount="981" SamplingPercent="5.79953" LastUpdate="2021年03月14日T01:55:32.9" />
<StatisticsInfo
Database="[D0849AEA-A31D-4639-9F71-2651302E4887]" Schema="[dbo]" Table="[Events]"
Statistics="[nci_wi_Events_9AD45C8598F09E7FDFDFA2AD8792424C]"
ModificationCount="2113" SamplingPercent="4.56518" LastUpdate="2021年03月14日T01:55:42.45" />
Notice that these statistics are based on sampling only ~4-5% of the rows in the table.
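If you want to confirm the current sample rates directly rather than digging them out of the plan XML, a query along these lines against sys.dm_db_stats_properties should work (shown for dbo.Events; repeat for dbo.Locations):
-- Sampling rate, row counts, and modification counter for each statistics
-- object on the table
SELECT  s.name            AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.rows_sampled,
        100.0 * sp.rows_sampled / NULLIF(sp.rows, 0) AS sample_percent,
        sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE s.[object_id] = OBJECT_ID(N'dbo.Events');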
A sample rate that low is normally okay, but it could be impacting your situation significantly. I would update statistics for the indexes on both tables using the FULLSCAN option:
UPDATE STATISTICS dbo.Locations
WITH FULLSCAN;
GO
UPDATE STATISTICS dbo.Events
WITH FULLSCAN;
GO
Then try re-running the query. Hopefully the optimizer will then have a better chance of estimating the join cardinality between these two tables and choosing the more efficient plan.
If that works, you can back down the sample rate to a lower value that still produces the good plan, and potentially "pin it" in place with the PERSIST_SAMPLE_PERCENT option.
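For example (the 25 percent figure is only an assumption; experiment to find the lowest rate that still produces the good plan):
-- Assumed sample rate for illustration; PERSIST_SAMPLE_PERCENT keeps
-- automatic statistics updates using the same rate
UPDATE STATISTICS dbo.Events
WITH SAMPLE 25 PERCENT, PERSIST_SAMPLE_PERCENT = ON;

UPDATE STATISTICS dbo.Locations
WITH SAMPLE 25 PERCENT, PERSIST_SAMPLE_PERCENT = ON;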
As a potential workaround: since the query works well when the join to EventTypes isn't present, you could remove that from your LINQ query (don't .Include(...) that entity, and don't reference any of its properties in a Select(...) projection).
Then you could retrieve the 25 EventTypes rows needed separately, either via lazy loading (which would result in 25 additional queries, but they'd be point lookups) or by running a separate query to get them all by ID.
Getting them all by ID would be something like this:
// "events" here is the result of your first query
var eventTypeIds = events
.Select(e => e.EventType.Id)
.ToList();
var eventTypes = _context.EventTypes
.Where(et => eventTypeIds.Contains(et.Id))
.ToList();
- Updating statistics with FULLSCAN did not result in an optimal execution plan under all circumstances. My application code still loads EventTypes separately. – Dai, Oct 29, 2022 at 5:49
Try building it with the clustered index keys in the reverse order, so TenantId first. For the inner joins, add
l.TenantId = @tenantId
to the join criteria.
Change your ORDER BY from
ORDER BY
ROW_NUMBER() OVER ( ORDER BY e.EventId DESC, e.TenantId DESC )
to
ORDER BY e.EventId DESC, e.TenantId DESC
although with the reversed key order you would be better served with
ORDER BY e.TenantId DESC, e.EventId DESC
if that is still acceptable. If you still get a Sort, consider building the PK with DESC column order as well.
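Putting those suggestions together, the hand-written query would look roughly like this (a sketch only: it assumes the clustered key really has been rebuilt as TenantId, EventId, and the column lists are abbreviated):
SELECT
    e.EventId, e.TenantId   -- plus whatever other columns are needed
FROM
    dbo.Events AS e
    LEFT OUTER JOIN dbo.Customers AS c ON
        c.TenantId = e.TenantId AND c.CustomerId = e.CustomerId
    INNER JOIN dbo.Locations AS l ON
        l.TenantId = @tenantId
        AND l.TenantId = e.TenantId
        AND l.LocationId = e.LocationId
    INNER JOIN dbo.EventTypes AS t ON
        t.TenantId = @tenantId
        AND t.TenantId = e.TenantId
        AND t.EventTypeId = e.EventTypeId
WHERE
    e.TenantId = @tenantId
ORDER BY
    e.TenantId DESC, e.EventId DESC
OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY;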
- Unfortunately the query is generated by Entity Framework / Linq, which doesn't allow for the changes you're asking for - and redefining the Clustered Index for the dbo.Events table isn't feasible right now (and the fact that the query works fine provided I don't do an INNER JOIN dbo.EventTypes suggests it's unnecessary). – Dai, Mar 17, 2021 at 9:59
- Changing the ORDER BY to use the columns directly without the ROW_NUMBER() windowing function has no effect on the execution plan (with and without RECOMPILE). – Dai, Mar 17, 2021 at 9:59
- Can you use the Force Order hint? – Stephen Morris - Mo64, Mar 17, 2021 at 10:03
- No, unfortunately. With very limited exceptions I cannot make any changes to the SQL like that due to how Linq works. I can add a query-interceptor to make some changes, but that's a last-resort option. Using a force-order hint shouldn't be necessary though. – Dai, Mar 17, 2021 at 10:15
Unfortunately LINQ is notorious for generating sub-par (and even terrible) queries in regards to performance, even for some of the simplest code (such as a join between two entities, or a Contains search on a single entity, etc.). This is evident even in your query, by the fact it added a meaningless ROW_NUMBER() function call in your ORDER BY clause when it could just ORDER BY the fields it's calling out inside the ROW_NUMBER() function directly.
That being said, I don't necessarily know yet if LINQ is your actual issue here. If you take the first generated query (inclusive of your join to EventTypes) and run it directly in SSMS with hard-coded values instead of parameters, how is the performance? If it's good, then you could be experiencing a parameter sniffing issue with the parameterized query sent to the server from LINQ. This in turn could be causing a cardinality estimate issue, which may be what your first execution plan screenshot is showing, but it's hard to confirm without the actual execution plan in front of us. Could you please upload it to Paste The Plan? There's a lot of information in the actual execution plan that's not shown in a screenshot.
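Here's a sketch of that test: the same statement run once parameterized (roughly how EF submits it) and once with the literals hard-coded, so the two plans and durations can be compared. The SELECT list and joins are abbreviated; paste in your full generated query.
DECLARE @sql nvarchar(max) = N'
    SELECT e.EventId
    FROM dbo.Events AS e
    INNER JOIN dbo.EventTypes AS t
        ON t.TenantId = e.TenantId AND t.EventTypeId = e.EventTypeId
    WHERE e.TenantId = @tenantId
    ORDER BY ROW_NUMBER() OVER ( ORDER BY e.EventId DESC, e.TenantId DESC )
    OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY;';

-- Parameterized execution, similar to what EF sends:
EXEC sp_executesql @sql,
     N'@tenantId int, @offset int, @pageSize int',
     @tenantId = 123, @offset = 0, @pageSize = 50;

-- Then run the same statement with 123 / 0 / 50 hard-coded directly in SSMS
-- and compare the execution plans and durations.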
Also, as a side note, clauses like TOP and OFFSET generally don't reduce your dataset early on; rather, they're applied later on in the order of operations. This is because predicate filtering (JOIN, WHERE, and HAVING clauses) is typically used to filter the data down first, since predicates are specific to how to filter the data (as opposed to TOP and OFFSET, which are generic) and call out fields that are potentially indexed, so this influences the order of operations in a query. The SQL Engine is able to make use of indexes early on in the set of operations it performs (because indexes store the data in a logical B-Tree data structure, which is very efficient for searching); after that it can apply generic reduction to the dataset via TOP, OFFSET, etc. So it's generally most optimal to reduce your data via predicates, or at least ensure your predicates are optimally indexed, so the SQL Engine can use those indexes efficiently when it goes to pull the data.
- Regarding the use of ROW_NUMBER() - I did some research on that last week and found out that the Entity Framework dev-team is confident that using ROW_NUMBER() for stable sorts is reliable and performant, so they made it the default in an update to EF6 in 2019 (see github.com/dotnet/ef6/commit/…) and are even considering removing the UseRowNumberOrderingInOffsetQueries option entirely (so it will always use ROW_NUMBER()). It made no difference to my own query. – Dai, Mar 17, 2021 at 11:16
- @Dai I didn't say ROW_NUMBER() was your issue, just pointing out an example of how LINQ can make decisions that don't make sense. That's fine if they want to default to it, but there's literally no reason to do it vs using the fields inside of it. I bet unfortunately the EF Team doesn't have the experience to realize that the sort is exactly the same in both cases, and it goes to show they shouldn't code hard and fast rules. Please see the note in my answer about testing for parameter sniffing issues and uploading your execution plan so we can check for cardinality estimate issues. – J.D., Mar 17, 2021 at 11:23
- @Dai Note I read through the Git comments they made in your link, and it's exactly the case I thought. Essentially when you ORDER BY fields in SQL (in any context), if the combination of fields isn't unique then you end up with a semi-random ordering between executions of that query, which is nondeterministic. For example, if you have a table of Cars with the fields Manufacturer and Model (so the data in the table could be Nissan Altima and Nissan Maxima), if you ORDER BY just the Manufacturer field, there's an equal probability the results will order the... – J.D., Mar 17, 2021 at 11:32
- ...rows with the Nissan Maxima first then the Nissan Altima, or the Nissan Altima first then the Nissan Maxima; it's semi-random. The same is true even when (and especially when) using ROW_NUMBER() to generate unique IDs OVER the ordering of non-unique sets of fields, such as in my example with the Manufacturer field. In one execution the ROW_NUMBER() generated for the Nissan Maxima will be 1 and 2 will be generated for the Nissan Altima, and in another execution the numbers will be reversed and the Nissan Altima will be assigned 1 while the Nissan Maxima will be assigned... – J.D., Mar 17, 2021 at 11:35
- Interesting - I was aware of the unstable-sort comment, but I didn't really understand why it would be a problem. I suggest you file a bug report against them as the same "feature" is in EF Core too... – Dai, Mar 17, 2021 at 12:05
Make sure you have these indexes or equivalents:
CREATE INDEX IX1 ON dbo.EventTypes ( TenantId, EventTypeId );
CREATE INDEX IX2 ON dbo.Locations ( TenantId, LocationId );
You say you have clustered indexes, where the columns are the other way around. Yet from the execution plan it seems SQL Server favors indexes where TenantId is the first key column.
I would also try changing all the inner joins to left joins, at least as an experiment (sketched below). It doesn't seem that any of the tables except Events affect the number of rows returned.
From my perspective SQL Server is very confused here. The join between Events and Locations produces 41k rows where SQL Server expects 41. That's three orders of magnitude of difference, and this error propagates through the whole plan.
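As a sketch, that experiment just swaps the join types in the generated query (column lists abbreviated):
SELECT e.EventId, e.TenantId   -- plus the other columns
FROM dbo.Events AS e
LEFT OUTER JOIN dbo.Customers AS c
    ON c.TenantId = e.TenantId AND c.CustomerId = e.CustomerId
LEFT OUTER JOIN dbo.Locations AS l
    ON l.TenantId = e.TenantId AND l.LocationId = e.LocationId
LEFT OUTER JOIN dbo.EventTypes AS t
    ON t.TenantId = e.TenantId AND t.EventTypeId = e.EventTypeId
WHERE e.TenantId = @tenantId
ORDER BY ROW_NUMBER() OVER ( ORDER BY e.EventId DESC, e.TenantId DESC )
OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY;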