T-SQL INNER JOIN Optimisation

Question 1

I have read many articles, tried different techniques, and experimented somewhat at random to optimise this.

My problem is that I have some big tables (over one million rows) and some small ones (a few hundred rows), with 8 inner joins involved.

What reduced the execution time from over 2 minutes to 30 seconds seemed very strange to me, and I was not able to find out why it happens.

When I select columns from the tables, I cast them. What I did was cast each column to the smallest possible type.

For example:

nvarchar(4000) to nvarchar(50 or 25)
bigint to int
int to tinyint
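A minimal sketch of what such casts in the select list might look like. The table and column names (dbo.BigTable, dbo.SmallTable, Description, Quantity, StatusCode, SmallTableId) are hypothetical, since the original query is not shown:

```sql
-- Hypothetical schema: all table and column names are illustrative only.
SELECT
    CAST(b.Description AS nvarchar(50)) AS Description,  -- column declared as nvarchar(4000)
    CAST(b.Quantity    AS int)          AS Quantity,     -- column declared as bigint
    CAST(s.StatusCode  AS tinyint)      AS StatusCode    -- column declared as int
FROM dbo.BigTable AS b
INNER JOIN dbo.SmallTable AS s
    ON s.SmallTableId = b.SmallTableId;
```

Note that narrowing casts are not free of risk: casting a string to a shorter nvarchar silently truncates longer values, and casting a number to a type that cannot hold it raises an arithmetic overflow error.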

The result was an execution time more than 1 minute and 30 seconds shorter. Why does this happen?

For example, if my value is a string of length 10, both nvarchar(4000) and nvarchar(50) will hold it as a 10-character value (or something close to that). So why do things go better when I reduce the target type of the cast?

One more thing: I ran a lot of tests to check which function is better, cast or convert (string to string, string to number, number to string), but was not able to determine which works better. Sometimes convert gives me a few seconds better execution time, but not enough to draw a conclusion. Has anyone tried something like this and succeeded?

Thanks in advance for your time.

Question 2

Do you have explicit foreign keys between tables? I assume the CASTs are on these columns...

Question 3

@Martin Smith You are right. I have compared the execution plans and they were the same. Maybe the difference that I get (a few seconds) is the result of traffic to the server. Anyway, I have read a lot of new things about how to optimize my view. There is one solution: make it an indexed view. I was amazed by the big difference in performance that I got in my test. Perfect. Unfortunately, in my real case (the one I have written about in this thread) I am not able to create an indexed view because of the limitations. So I am waiting for this year's SQL Server release and hoping it will make this easier.
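For readers unfamiliar with the term, an indexed view could be sketched roughly as follows. The view, table, and column names are hypothetical, and the sketch assumes BigTableId is unique in the view's result (for example, it is the primary key of dbo.BigTable and each row matches at most one row in dbo.SmallTable):

```sql
-- Hypothetical names throughout; not the asker's actual view.
-- An indexed view must be schema-bound and must use two-part object names.
CREATE VIEW dbo.vBigSmall
WITH SCHEMABINDING
AS
SELECT b.BigTableId,
       b.Quantity,
       s.StatusCode
FROM dbo.BigTable AS b
INNER JOIN dbo.SmallTable AS s
    ON s.SmallTableId = b.SmallTableId;
GO

-- The first index on a view must be a unique clustered index;
-- creating it materialises the view's rows.
CREATE UNIQUE CLUSTERED INDEX IX_vBigSmall_BigTableId
    ON dbo.vBigSmall (BigTableId);
```

The limitations mentioned above are real: among other restrictions, indexed views cannot contain outer joins, which may be why this approach was not available in the asker's case.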

Question 4

The reason the performance is better is that the smaller data types have a much smaller working set - see the execution plan (http://www.red-gate.com/our-company/about/book-store/assets/sql-server-execution-plans.pdf)

I expect you get the largest benefit from the nvarchar(4000) to nvarchar(50) - that's a reduction of 80x - and nvarchar(4000) can use up to 8K! of space. For a key, that is not a good idea.

In addition, the fact that there are no foreign keys probably means you don't have a very good indexing strategy either. If you did have indexes (even for ridiculously large columns), you would probably find they could outperform the cast since it probably wouldn't spool as much.

In general, you don't want any operations on your keys in the join if at all possible, especially for large data sets.
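To illustrate the last point, here is a rough contrast between a join that applies an operation to its key and one that does not. All table and column names are hypothetical:

```sql
-- Hypothetical tables and columns, for illustration only.

-- Join with an operation on the key: the CAST is evaluated for every row
-- and can prevent the optimizer from seeking on an index over a.LongKey.
SELECT a.SomeColumn, b.OtherColumn
FROM dbo.TableA AS a
INNER JOIN dbo.TableB AS b
    ON CAST(a.LongKey AS nvarchar(50)) = b.ShortKey;

-- Join on keys declared with the same type, with no expression on either side.
SELECT a.SomeColumn, b.OtherColumn
FROM dbo.TableA AS a
INNER JOIN dbo.TableB AS b
    ON a.KeyId = b.KeyId;
```

Whether the second form is actually faster depends on the indexes available; comparing the two execution plans is the way to confirm it.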

Question 5

To find the reason for the time difference, you should compare the two query plans (with and without the casts, or with casts and converts) and their relative costs; you will be able to identify the phase in which the cost changes.
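One way to make that comparison, sketched with hypothetical table and column names, is to run both variants in the same session with the statistics options switched on:

```sql
-- dbo.BigTable and dbo.SmallTable are hypothetical stand-ins for the real tables.
SET STATISTICS IO ON;    -- logical/physical reads per table
SET STATISTICS TIME ON;  -- parse, compile, and execution times
SET STATISTICS XML ON;   -- returns the actual execution plan for each statement

-- Variant A: without casts
SELECT b.Description, s.StatusCode
FROM dbo.BigTable AS b
INNER JOIN dbo.SmallTable AS s
    ON s.SmallTableId = b.SmallTableId;

-- Variant B: with casts
SELECT CAST(b.Description AS nvarchar(50)) AS Description,
       CAST(s.StatusCode  AS tinyint)      AS StatusCode
FROM dbo.BigTable AS b
INNER JOIN dbo.SmallTable AS s
    ON s.SmallTableId = b.SmallTableId;

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

In SQL Server Management Studio, the "Include Actual Execution Plan" option gives the same plans graphically, which makes operator-by-operator cost differences easier to spot.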

Question 6

If you need to join an nvarchar(4000) to an nvarchar(10) and millions of rows are involved, I'd use a persisted computed column where you do a LEFT(long_column,10). Depending on the query, you can even index that computed column. I'll bet your join will perform much better.
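A minimal sketch of that suggestion, assuming a hypothetical table dbo.BigTable that holds long_column and a hypothetical dbo.SmallTable with an nvarchar(10) column short_key:

```sql
-- Table, column, and index names other than long_column are hypothetical.

-- Persisted computed column holding the first 10 characters of long_column.
ALTER TABLE dbo.BigTable
    ADD long_column_short AS LEFT(long_column, 10) PERSISTED;

-- Optional: index the computed column so the join can seek on it.
CREATE NONCLUSTERED INDEX IX_BigTable_long_column_short
    ON dbo.BigTable (long_column_short);

-- Join on the short column instead of casting long_column in the query.
SELECT b.long_column, s.short_key
FROM dbo.BigTable AS b
INNER JOIN dbo.SmallTable AS s
    ON s.short_key = b.long_column_short;
```

This only gives the same result as joining on the full column if the first 10 characters are enough to identify a row.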

Question 7

True, but it's good to remember that a persisted computed column will basically make a copy of the original column on disk.

Question 8

@Diego, other than some sort of redesign, there is not much else you can do.

Question 9

You can also consider the following point for inner joins.

Please check the difference between these two queries:

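-- Query 1: filter T1 in a derived table, then join
SELECT T1.ColumnName1,
       T2.ColumnName2
FROM   (SELECT ID,           -- ID must be selected so the join can use it
               ColumnName1
        FROM   T1
        WHERE  ID = 10) T1
       INNER JOIN T2
               ON T2.ID = T1.ID;

-- Query 2: join first, then filter in the outer WHERE
SELECT T1.ColumnName1,
       T2.ColumnName2
FROM   T1
       INNER JOIN T2
               ON T2.ID = T1.ID
WHERE  T1.ID = 10;           -- qualified, since ID exists in both tables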

You can also add a nonclustered index to avoid scanning the complete table.

But this is normally handy when the query returns a single record, as in my subquery example :)
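For the T1 and T2 tables in the example above, such indexes might look like the sketch below. The index names are made up, and the indexes are only worth adding if ID is not already the clustered key of each table:

```sql
-- Index names are hypothetical.
-- Keyed on the join/filter column, with the selected column included
-- so the query can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_T1_ID
    ON T1 (ID)
    INCLUDE (ColumnName1);

CREATE NONCLUSTERED INDEX IX_T2_ID
    ON T2 (ID)
    INCLUDE (ColumnName2);
```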

Question 10

I read about this when I was trying to find a way to optimize my joins. Some said there is no difference; others said that moving the where after the join will definitely help. Personally, in my case, I moved all where clauses after the last join and the result was the same.

Question 11

This rewrite makes no difference other than adding a layer of obfuscation. The QO will easily transform one to the other. The point about NCIs seems somewhat generic - how are you relating this to the question?

Accepted answer by Cade Roux, 2012-01-26 13:59:29Z (the text appears under Question 4 above).
