This query takes 6 seconds to complete. How can I optimize it? The table has 166803 records in total.
SELECT ltrim(rtrim(CAST(cageID as nvarchar(max))))+ltrim(rtrim(CAST(trayNo as nvarchar(max)))) as _unique,*
from lf_transit_cage
where ltrim(rtrim(CAST(cageID as nvarchar(max))))+ltrim(rtrim(CAST(trayNo as nvarchar(max)))) in
(
SELECT dt._unique FROM
(
SELECT ltrim(rtrim(CAST(cageID as nvarchar(max))))+ltrim(rtrim(CAST(trayNo as nvarchar(max)))) as _unique
from lf_transit_cage
) as dt
group by dt._unique
HAVING COUNT(dt._unique)>1
)
order by cageID,trayNo
1 Answer
As mentioned in the comments, there are benefits to casting/storing that unique key in the table during the ETL process, especially if it's going to be used in other places than just this query.
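One way to store the key, as a rough sketch: a persisted computed column plus an index, so the key is calculated when a row is written rather than on every read. The column name cage_tray_key, the index name, and the nvarchar(50) lengths are assumptions for illustration only. The real column types are not shown in the question, and nvarchar(max) cannot be used as an index key column, so a bounded length is needed.

```sql
-- Hypothetical names and lengths: adjust nvarchar(50) to the real column widths.
-- Assumes cageID and trayNo are string or integer columns, so the CAST is
-- deterministic (a requirement for PERSISTED and for indexing the column).
ALTER TABLE lf_transit_cage
    ADD cage_tray_key AS (
        LTRIM(RTRIM(CAST(cageID AS nvarchar(50)))) +
        LTRIM(RTRIM(CAST(trayNo AS nvarchar(50))))
    ) PERSISTED;
GO  -- batch separator for SSMS/sqlcmd, so the new column exists before it is indexed

-- Index the stored key so duplicate checks can group on it directly.
CREATE INDEX IX_lf_transit_cage_cage_tray_key
    ON lf_transit_cage (cage_tray_key);
```

With something like this in place, the duplicate check could group on cage_tray_key instead of rebuilding the expression in every query.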
Most likely, the performance hit comes from using IN (which can result in a row-by-row lookup) and from de-duplicating on the cast key. You could get a performance gain from JOINing to the subquery instead of using IN. You could also use ROW_NUMBER, which, in my experience, typically performs better than GROUP BY with a HAVING clause.
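As a minimal sketch of the JOIN variant, the original query could be restated as below. The CTE name cte_keyed and the alias dup are made up for this example. Whether it actually beats IN depends on the plan the optimizer picks, so it is worth comparing execution plans on the real data.

```sql
-- Build the concatenated key once, then join back to the keys that repeat.
;WITH cte_keyed AS (
    SELECT
        LTRIM(RTRIM(CAST(cageID AS nvarchar(max)))) +
        LTRIM(RTRIM(CAST(trayNo AS nvarchar(max)))) AS _unique,
        *
    FROM lf_transit_cage
)
SELECT
    k.*
FROM cte_keyed AS k
INNER JOIN (
    -- Keys that occur more than once
    SELECT _unique
    FROM cte_keyed
    GROUP BY _unique
    HAVING COUNT(*) > 1
) AS dup
    ON dup._unique = k._unique
ORDER BY
    k.cageID,
    k.trayNo;
```

Because dup holds each repeated key exactly once, the join returns every duplicate row once, the same rows the IN version returns.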
Here's my example using ROW_NUMBER and CTEs for easier reading:
--Calculate Unique NVARCHAR key
;WITH cte_lf_transit_cage AS (
SELECT
ltrim(rtrim(CAST(cageID as nvarchar(max))))+ltrim(rtrim(CAST(trayNo as nvarchar(max)))) as _unique,
*
FROM
lf_transit_cage
)
--Get the Row Count
, cte_rowcount AS (
SELECT
_unique,
ROW_NUMBER() OVER (PARTITION BY _unique ORDER BY cageID, trayNo) AS rowcnt
FROM
cte_lf_transit_cage
)
--Grab all instances of duplicate rows
SELECT
ltc.*
FROM
cte_lf_transit_cage ltc
WHERE
EXISTS
(SELECT 1 FROM cte_rowcount rc WHERE rc._unique = ltc._unique AND rc.rowcnt > 1 )
ORDER BY
ltc.cageID,
ltc.trayNo
Also, it was mentioned in the comments that you may not need to generate the _unique key at all, depending on how the data is stored. You might compare the results of the following query to confirm:
--Get the Row Count
;WITH cte_rowcount AS (
SELECT
cageID,
trayNo,
ROW_NUMBER() OVER (PARTITION BY cageID, trayNo ORDER BY trayNo) AS rowcnt
FROM
lf_transit_cage
)
--Grab all instances of duplicate rows
SELECT
ltrim(rtrim(CAST(ltc.cageID as nvarchar(max))))+ltrim(rtrim(CAST(ltc.trayNo as nvarchar(max)))) as _unique,
ltc.*
FROM
lf_transit_cage ltc
WHERE
EXISTS
(SELECT * FROM cte_rowcount rc WHERE rc.cageID = ltc.cageID AND rc.trayNo = ltc.trayNo AND rc.rowcnt > 1 )
ORDER BY
ltc.cageID,
ltc.trayNo
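A possible variation on the query above, offered as a sketch rather than a tested replacement: a windowed COUNT(*) tags every row with the size of its (cageID, trayNo) group, so the duplicates can be filtered without the EXISTS lookup. The names cte_counted and dup_count are made up for this example, and the same caveat applies about the raw columns matching the trimmed key.

```sql
-- Tag each row with how many rows share its cageID/trayNo pair.
;WITH cte_counted AS (
    SELECT
        *,
        COUNT(*) OVER (PARTITION BY cageID, trayNo) AS dup_count
    FROM lf_transit_cage
)
-- Keep every row that belongs to a group of two or more.
-- Note: the result carries dup_count as an extra column, and PARTITION BY
-- groups NULLs together, unlike the equality comparison in the EXISTS version.
SELECT
    LTRIM(RTRIM(CAST(cageID AS nvarchar(max)))) +
    LTRIM(RTRIM(CAST(trayNo AS nvarchar(max)))) AS _unique,
    *
FROM cte_counted
WHERE dup_count > 1
ORDER BY
    cageID,
    trayNo;
```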
codes only work if I put cageId and trayNo in the last GROUP BY. – Pop, Feb 15, 2016 at 5:56
result are different from my codes result too. – Pop, Feb 15, 2016 at 6:25
Yeah, Grouping by CageId & TrayNo would result in duplicates, so I updated to just order by that unique key. Does that match your results now? – vanlee1987, Feb 15, 2016 at 16:27
My result show all the duplicate values but your result show only one of each duplicate values even with GROUP BY _unique, cageId, trayNo. – Pop, Feb 16, 2016 at 1:11
Ah, right you are, based on your example, we need to return all results. Update using EXISTS, let me know if that helps. – vanlee1987, Feb 16, 2016 at 8:11
cageID, trayNo
in a very misguided way?