In my scenario i need to optimize a stored procedure, that i used to import data into DB from an exchange DB. I am relly puzzled because, the solution with INNER JOIN is lower than the LEFT OUTER JOIN solution, basically seems that the way i check the "relation" exist cause a huge slowdown.
To be more clear, i have a Final_DB with few tabels, one for the artciles (tbl_ana_Articles) one for articles attributes aka characteristiscs (tbl_ana_Characteristics), and few other tables. In another Exchange_DB i get relation between articles and attribute/characteristics (it's used to update periodically relations in the Final_DB).
The table with relations, provided by exchange DB, need to be unpivoted first, before it will be usefull (it's not a cross join table when i get it, it become a cross join after the unpivote).
So basically i write the query in that way :
WITH ExchangeArticleCode_CharacteristicCode AS (
SELECT [ACODAR], SUBSTRING([TNCAR],5,2) AS [TNCAR] , [TVALOR]
FROM [EXCHANGE_DB].[dbo].[ANART00F]
UNPIVOT
(
[TVALOR]
FOR [TNCAR] in ([ACAR01], ACAR02, ACAR03, ACAR04 , ACAR05 , ACAR06 , ACAR07 , ACAR08 , ACAR09 , ACAR10 , ACAR11 , ACAR12 , ACAR13 , ACAR14 , ACAR15 , ACAR16 , ACAR17 , ACAR18 , ACAR19 , ACAR20)
) AS [UNPIVOT]
WHERE [TVALOR] IS NOT NULL AND [TVALOR] != ''
)
SELECT Characteristic.[ID] AS [ID_CHARACTERISTIC]
,Article.[ID] AS [ID_ARTICLE]
,Characteristic.[ID_FILTER] AS [ID_FILTER]
FROM ExchangeArticleCode_CharacteristicCode
LEFT OUTER JOIN [dbo].[tbl_ana_Articles] AS Article
ON (ExchangeArticleCode_CharacteristicCode.ACODAR collate Latin1_General_CI_AS) = Article.CODE
LEFT OUTER JOIN [dbo].[tbl_ana_Characteristics] AS Characteristic
ON (ExchangeArticleCode_CharacteristicCode.TNCAR collate Latin1_General_CI_AS) + '_' + (ExchangeArticleCode_CharacteristicCode.TVALOR collate Latin1_General_CI_AS) = Characteristic.ID_ERP
WHERE Characteristic.[IS_ACTIVE] = 1
This solution is surprising fast, but have issues, sometimes there is junk into exchange DB, so the left join fail to match the codes and in the result table i get some NULL. If i try to prevent the NULL, replacing LEFT OUTER JOIN with INNER JOIN or adding a check (IS NOT NULL) into the where condition, the query become really slow and heavy to execute. It's not clear to me why and how to avoid that.
Here the execution plan of the fast query : https://www.brentozar.com/pastetheplan/?id=ryMBHCryp
Here the execution plan of the slow query : https://www.brentozar.com/pastetheplan/?id=SywALRS16
To be fair, it seems to me that the slow query have an expensive nested loop, but why the INNER JOIN is transated in such nested loop?
2 Answers 2
issues
There are a few issues with this code that could be easily solved with a temp table. In particular, constructing multi-column and value join keys at runtime will often lead to strange plan choices and poor cardinality estimates.
I also made a slight tweak to your where
clause to look for rows that have at least a single character in them. Checking for both not null and non-empty strings isn't necessary, since nulls can't be matched to values.
You may also find a clustered index on the temp table useful, either on ACODAR, TNCAR_TVALOR
or the reverse, TNCAR_TVALOR, ACODAR
.
WITH
ExchangeArticleCode_CharacteristicCode AS
(
SELECT
ACODAR =
ACODAR COLLATE Latin1_General_CI_AS,
TNCAR =
SUBSTRING(TNCAR, 5, 2),
TVALOR,
TNCAR_TVALOR =
SUBSTRING(TNCAR, 5, 2) +
'_' +
TVALOR COLLATE Latin1_General_CI_AS
FROM EXCHANGE_DB.dbo.ANART00F
UNPIVOT
(
TVALOR
FOR TNCAR IN
(
ACAR01,
ACAR02,
ACAR03,
ACAR04,
ACAR05,
ACAR06,
ACAR07,
ACAR08,
ACAR09,
ACAR10,
ACAR11,
ACAR12,
ACAR13,
ACAR14,
ACAR15,
ACAR16,
ACAR17,
ACAR18,
ACAR19,
ACAR20
)
) AS [UNPIVOT]
WHERE TVALOR LIKE '_%'
)
SELECT
*
INTO #ExchangeArticleCode_CharacteristicCode
FROM ExchangeArticleCode_CharacteristicCode;
SELECT
ID_CHARACTERISTIC =
Characteristic.ID,
ID_ARTICLE =
Article.ID,
ID_FILTER =
Characteristic.ID_FILTER
FROM #ExchangeArticleCode_CharacteristicCode AS ExchangeArticleCode_CharacteristicCode
LEFT OUTER JOIN dbo.tbl_ana_Articles AS Article
ON ExchangeArticleCode_CharacteristicCode.ACODAR = Article.CODE
LEFT OUTER JOIN dbo.tbl_ana_Characteristics AS Characteristic
ON ExchangeArticleCode_CharacteristicCode.TNCAR_TVALOR = Characteristic.ID_ERP
WHERE Characteristic.[IS_ACTIVE] = 1;
-
Thanks a lot for the hint about use of temp table. I will keep in mind in future, can be very usefullSkary– Skary2023年09月19日 10:37:58 +00:00Commented Sep 19, 2023 at 10:37
The condition WHERE [TVALOR] IS NOT NULL AND [TVALOR] != ''
in the unpivoted table is much more selective than SQLServer estimates.
In the fast plan, it estimates that it will select 216056 rows out of 423640, but it actually produces only 11944 rows.
With the LEFT JOINs, Sqlserver starts from ExchangeArticleCode_CharacteristicCode, which is the left table in the LEFT JOIN, producing 11944 rows, then matches them to the other big tables, but that doesn't increases the number of rows, because apparently those rows don't match more than one article or characteristic.
When you switch to INNER JOIN
(or add a NOT NULL check, which tells the optimizer it can treat the LEFT JOIN as an INNER JOIN) Sqlserver can reorder the joins to what it thinks it will be most efficient.
It assumes that unpivoting the ANART00F table will multiplicate the rows, so it postpones it to the very end.
It correctly assumes that the join between ANART00F and the Articles table will be efficient (it produces 21182 rows), but then it joins them with the Characteristic table. Since the join condition requires the unpivoted columns, this is actually a full join without conditions and produces 115 millions rows The result is then upivoted to become 2 billion rows. Only then, sqlserver appplies the where conditions, reducing those 2 billions down to 11934 (while sqlserver expected them to reduce the result set to 165 millions).
To correct this, you can try to update statistics for all involved tables, but I suspect that the specific values of the records you are trying to select and the unpivoting cannot be guessed properly even by updated statistics.
Another solution you can try is to use SET FORCEPLAN
to force the query optimizer to join the tables in the order you specified
SET FORCEPLAN ON;
-- your query here
SET FORCEPLAN OFF;
Explore related questions
See similar questions with these tags.