I'm having performance issues with this query. If I remove the status
column, it runs very fast; but adding the subquery in the column section delays the query way too much (1.02 min execution). How can I modify this query so it runs faster while still getting the desired data?
The reason I put that subquery there is because I needed the status for the latest activity; some activities have null
status so I have to ignore them.
Establishments: 6.5k rows - EstablishmentActivities: 70k rows - Status: 2 (Active, Inactive)
SELECT DISTINCT
est.id,
est.trackingNumber,
est.NAME AS 'establishment',
actTypes.NAME AS 'activity',
(
SELECT stat3.NAME
FROM SACPAN_EstablishmentActivities eact3
INNER JOIN SACPAN_ActivityTypes at3
ON eact3.activityType_FK = at3.code
INNER JOIN SACPAN_Status stat3
ON stat3.id = at3.status_FK
WHERE eact3.establishment_FK = est.id
AND eact3.rowCreatedDT = (
SELECT MAX(est4.rowCreatedDT)
FROM SACPAN_EstablishmentActivities est4
INNER JOIN SACPAN_ActivityTypes at4
ON est4.establishment_fk = est.id
AND est4.activityType_FK = at4.code
WHERE est4.establishment_fk = est.id
AND at4.status_FK IS NOT NULL
)
AND at3.status_FK IS NOT NULL
) AS 'status',
est.authorizationNumber,
reg.NAME AS 'region',
mun.NAME AS 'municipality',
ISNULL(usr.NAME, '') + ISNULL(+ ' ' + usr.lastName, '')
AS 'created',
ISNULL(usr2.NAME, '') + ISNULL(+ ' ' + usr2.lastName, '')
AS 'updated',
est.rowCreatedDT,
est.rowUpdatedDT,
CASE WHEN est.rowUpdatedDT >= est.rowCreatedDT
THEN est.rowUpdatedDT
ELSE est.rowCreatedDT
END AS 'LatestCreatedOrModified'
FROM SACPAN_Establishments est
INNER JOIN SACPAN_EstablishmentActivities eact
ON est.id = eact.establishment_FK
INNER JOIN SACPAN_ActivityTypes actTypes
ON eact.activityType_FK = actTypes.code
INNER JOIN SACPAN_Regions reg
ON est.region_FK = reg.code --
INNER JOIN SACPAN_Municipalities mun
ON est.municipality_FK = mun.code
INNER JOIN SACPAN_ContactEstablishments ce
ON ce.establishment_FK = est.id
INNER JOIN SACPAN_Contacts con
ON ce.contact_FK = con.id
--JOIN SACPAN_Status stat ON stat.id = actTypes.status_FK
INNER JOIN SACPAN_Users usr
ON usr.id = est.rowCreatedBy_FK
LEFT JOIN SACPAN_Users usr2
ON usr2.id = est.rowUpdatedBy_FK
WHERE (con.ssn = @ssn OR @ssn = '*')
AND eact.rowCreatedDT = (
SELECT MAX(eact2.rowCreatedDT)
FROM SACPAN_EstablishmentActivities eact2
WHERE eact2.establishment_FK = est.id
)
--AND est.id = 6266
ORDER BY 'LatestCreatedOrModified' DESC
I can't upload the execution plan because of rep points, but there's a 21% on one of the establishment activity tables.
-
\$\begingroup\$ Please provide DDL query for the tables and execution plan \$\endgroup\$almaz– almaz2013年03月04日 10:52:15 +00:00Commented Mar 4, 2013 at 10:52
2 Answers 2
Nitpicking
I find your table aliases to be difficult to follow. Take this with a grain of salt coming from an outsider to your organization. I find eact
, eact3
, eact4
, etc. to be a bit confusing, for example. Granted, if there is an internal, consistent standard, disregard this part.
SELECT DISTINCT
This is a pretty expensive operation, and usually better to use GROUP BY instead. That way you run your uniqueness check on the result set instead of the source data set (which could be a LOT larger) and only on the columns you think/know may have duplicate values. For example:
GROUP BY est.id, est.trackingNumber, actTypes.NAME
Create a stored statement
Given the look of this whole script, this doesn't smell like an ad-hoc query to me. Assuming that's accurate, you would benefit from creating a stored statement, in this case likely a stored procedure. This commented line at the bottom kind of gave me a hint:
-- AND est.id = 6266
For example:
CREATE PROCEDURE FooBar
@Establishment_ID INT
AS
BEGIN
/* your query here */
AND est.id = @Establishment_ID -- your commented line
ORDER BY 'LatestCreatedOrModified' DESC
END;
GO
Then, whenever needed:
EXEC FooBar 6266;
This way, the execution plan will be stored after it is called for the first time, speeding up (sometimes dramatically) execution after it is ran once.
Compliments
I have to note that your formatting, capitalization and aliasing are consistent. Kudos for that.
The elephant in the room
To me, the slowness of this query could be largely due to multiple levels of nested subqueries, each with multiple join
operations. It appears that you join the same tables over and over, and this could be mitigated to some extent by having your nested where
clauses in your outer query, whenever possible.
This:
FROM SACPAN_EstablishmentActivities eact3 --(1)
INNER JOIN SACPAN_ActivityTypes at3 -- (2)
ON eact3.activityType_FK = at3.code
INNER JOIN SACPAN_Status stat3 -- (3)
That:
FROM SACPAN_EstablishmentActivities est4 -- (1)
INNER JOIN SACPAN_ActivityTypes at4 -- (2)
And the other:
INNER JOIN SACPAN_EstablishmentActivities eact -- (1)
ON est.id = eact.establishment_FK
INNER JOIN SACPAN_ActivityTypes actTypes -- (2)
ON eact.activityType_FK = actTypes.code
INNER JOIN SACPAN_Regions reg
ON est.region_FK = reg.code --
INNER JOIN SACPAN_Municipalities mun
ON est.municipality_FK = mun.code
INNER JOIN SACPAN_ContactEstablishments ce
ON ce.establishment_FK = est.id
INNER JOIN SACPAN_Contacts con
ON ce.contact_FK = con.id
--JOIN SACPAN_Status stat -- (3)
Notice I've commented out where you are joining all the same tables. You could more than likely work most these conditions into your where
clause and avoid unneeded overhead. It's impossible for me to test it out, since I don't have access to your DB, but try to eliminate subqueries/CTEs as much as you can help it.
Your code is calling several tables multiple times when it doesn't seem like it needs to. Can you pull out your Activities related tables into common table expressions and just link to those in your main query. Rough example:
;with created as
(
select establishment_fk, max(rowCreatedDT) as maxCreated
FROM SACPAN_EstablishmentActivities
JOIN SACPAN_ActivityTypes at4
ON activityType_FK = at4.code
WHERE at4.status_FK IS NOT NULL
),
status as
(
SELECT eact3.establishment_fk, stat3.NAME as status, at3.NAME as activity
FROM SACPAN_EstablishmentActivities eact3
INNER JOIN SACPAN_ActivityTypes at3
ON eact3.activityType_FK = at3.code
INNER JOIN SACPAN_Status stat3
ON stat3.id = at3.status_FK
where exists
(select 1 from created
where created.establishment_fk = eact3.establishment_fk
and created.maxCreated = eact3.rowCreatedDT)
)
SELECT DISTINCT
est.id,
est.trackingNumber,
est.NAME AS 'establishment',
actTypes.NAME AS 'activity',
AS 'status',
est.authorizationNumber,
reg.NAME AS 'region',
mun.NAME AS 'municipality',
ISNULL(usr.NAME, '') + ISNULL(+ ' ' + usr.lastName, '')
AS 'created',
ISNULL(usr2.NAME, '') + ISNULL(+ ' ' + usr2.lastName, '')
AS 'updated',
est.rowCreatedDT,
est.rowUpdatedDT,
CASE WHEN est.rowUpdatedDT >= est.rowCreatedDT
THEN est.rowUpdatedDT
ELSE est.rowCreatedDT
END AS 'LatestCreatedOrModified'
FROM SACPAN_Establishments est
INNER JOIN status on status.establishment_fk = est.id
INNER JOIN SACPAN_Regions reg
ON est.region_FK = reg.code --
INNER JOIN SACPAN_Municipalities mun
ON est.municipality_FK = mun.code
INNER JOIN SACPAN_ContactEstablishments ce
ON ce.establishment_FK = est.id
INNER JOIN SACPAN_Contacts con
ON ce.contact_FK = con.id
INNER JOIN SACPAN_Users usr
ON usr.id = est.rowCreatedBy_FK
LEFT JOIN SACPAN_Users usr2
ON usr2.id = est.rowUpdatedBy_FK
WHERE (con.ssn = @ssn OR @ssn = '*')
That may be a little rough, but the idea here is to pull out what your Max Created date is just once and then just get your status as one set. I'm fairly certain that a large part of your slowness on the status subquery is how it's having to run; if you can get it into one set that you can then connect to the main query, will hopefully help.
-
1\$\begingroup\$ I would like to say, while CTEs make the code much easier to read, they generally don't improve performance, because the CTE has to be ran every time it's called, just like a subquery. But I do like how your query looks a lot better. \$\endgroup\$Phrancis– Phrancis2014年08月13日 03:24:05 +00:00Commented Aug 13, 2014 at 3:24