Optimize a SQL subquery containing multiple inner joins and aggregate functions

Question 1

I have a select statement which is in fact a subquery within a larger select statement built up programmatically. The problem is that if I elect to include this subquery, it acts as a bottleneck and the whole query becomes painfully slow.

An example of the data is as follows:

Payment
Receipt_no | Person | Payment_date | Type | Reversed
2          | John   | 01/02/2001   | PA   |
1          | John   | 01/02/2001   | GX   |
3          | David  | 15/04/2003   | PA   |
6          | Mike   | 26/07/2002   | PA   | R
5          | John   | 01/01/2001   | PA   |
4          | Mike   | 13/05/2000   | GX   |
8          | Mike   | 27/11/2004   | PA   |
7          | David  | 05/12/2003   | PA   | R
9          | David  | 15/04/2003   | PA   |

The subquery is as follows:

select Payment.Person,
       Payment.amount
from Payment
inner join (select min([min_Receipt].Person) 'Person',
                   min([min_Receipt].Receipt_no) 'Receipt_no'
            from Payment [min_Receipt]
            inner join (select min(Person) 'Person',
                               min(Payment_date) 'Payment_date'
                        from Payment
                        where Payment.reversed != 'R' and Payment.Type != 'GX'
                        group by Payment.Person) [min_date]
                on [min_date].Person = [min_Receipt].Person
               and [min_date].Payment_date = [min_Receipt].Payment_date
            where [min_Receipt].reversed != 'R' and [min_Receipt].Type != 'GX'
            group by [min_Receipt].Person) [1stPayment]
    on [1stPayment].Receipt_no = Payment.Receipt_no

This retrieves the first payment of each person, ordered by Payment_date (ascending) and then Receipt_no (ascending), where Type is not 'GX' and Reversed is not 'R', as follows:

Payment
Receipt_no | Person | Payment_date
5          | John   | 01/01/2001
3          | David  | 15/04/2003
8          | Mike   | 27/11/2004

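The sample table leaves Reversed blank on most rows. If those blanks are stored as NULL (an assumption; the post does not say), the original filter reversed != 'R' drops those rows too, because comparing NULL with 'R' yields unknown rather than true (SQL Server behaves this way under its default ANSI_NULLS ON setting). Query 1 below guards against this with "or reversed is null". A minimal sketch, using Python's built-in sqlite3 module as a stand-in for SQL Server and ISO dates so that text comparison sorts chronologically:

```python
import sqlite3

# Sample rows from the question; dd/mm/yyyy dates rewritten as ISO yyyy-mm-dd.
# Assumption: a blank Reversed value is stored as NULL.
ROWS = [
    (2, "John", "2001-02-01", "PA", None),
    (1, "John", "2001-02-01", "GX", None),
    (3, "David", "2003-04-15", "PA", None),
    (6, "Mike", "2002-07-26", "PA", "R"),
    (5, "John", "2001-01-01", "PA", None),
    (4, "Mike", "2000-05-13", "GX", None),
    (8, "Mike", "2004-11-27", "PA", None),
    (7, "David", "2003-12-05", "PA", "R"),
    (9, "David", "2003-04-15", "PA", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (receipt_no INTEGER PRIMARY KEY, person TEXT,"
    " payment_date TEXT, type TEXT, reversed TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", ROWS)

# Filter as written in the original subquery: NULL != 'R' is unknown, so NULL rows fail it.
plain = conn.execute(
    "SELECT COUNT(*) FROM payment WHERE reversed != 'R' AND type != 'GX'"
).fetchone()[0]

# NULL-safe filter, as used in Query 1.
null_safe = conn.execute(
    "SELECT COUNT(*) FROM payment"
    " WHERE (reversed <> 'R' OR reversed IS NULL) AND type <> 'GX'"
).fetchone()[0]

print(plain, null_safe)  # 0 5
```

If the blanks are stored as empty strings instead, both filters keep them and the two counts match.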
~~I am unable to move the subquery out to a temporary table as temporary tables are simply not supported within the programming language used by my application.~~

Edit: Incorrect statement. Temporary tables are supported, and therefore this is a valid option.

Following a post on StackOverflow, the query was rewritten as follows.

Query 1.

select min(a.Person) 'Person',
       min(a.receipt_no) 'receipt_no'
from
    Payment a
where
    a.type <> 'GX' and (a.reversed not in ('R') or a.reversed is null)
    and a.payment_date =
        (select min(payment_date) from Payment i
         where i.Person = a.Person and i.type <> 'GX'
           and (i.reversed not in ('R') or i.reversed is null))
group by a.Person
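A quick way to check Query 1's logic is to run it against the sample rows. This sketch uses Python's built-in sqlite3 module as a stand-in for SQL Server, stores dates as ISO yyyy-mm-dd text so they sort chronologically, and assumes blank Reversed values are NULL. It refers to the table only through the alias a, since SQL Server rejects Payment.Person in the select list once Payment has been aliased.

```python
import sqlite3

# Sample rows from the question (ISO dates; blank Reversed assumed to be NULL).
ROWS = [
    (2, "John", "2001-02-01", "PA", None),
    (1, "John", "2001-02-01", "GX", None),
    (3, "David", "2003-04-15", "PA", None),
    (6, "Mike", "2002-07-26", "PA", "R"),
    (5, "John", "2001-01-01", "PA", None),
    (4, "Mike", "2000-05-13", "GX", None),
    (8, "Mike", "2004-11-27", "PA", None),
    (7, "David", "2003-12-05", "PA", "R"),
    (9, "David", "2003-04-15", "PA", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (receipt_no INTEGER PRIMARY KEY, person TEXT,"
    " payment_date TEXT, type TEXT, reversed TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", ROWS)

# Query 1: the earliest qualifying date per person comes from a correlated MIN;
# MIN(receipt_no) ... GROUP BY then breaks ties on that date.
QUERY_1 = """
SELECT a.person, MIN(a.receipt_no) AS receipt_no
FROM payment a
WHERE a.type <> 'GX' AND (a.reversed <> 'R' OR a.reversed IS NULL)
  AND a.payment_date = (
      SELECT MIN(i.payment_date) FROM payment i
      WHERE i.person = a.person AND i.type <> 'GX'
        AND (i.reversed <> 'R' OR i.reversed IS NULL))
GROUP BY a.person
"""

first = sorted(conn.execute(QUERY_1).fetchall())
print(first)  # [('David', 3), ('John', 5), ('Mike', 8)]
```

The result matches the expected first payments listed above, including the tie between receipts 3 and 9 for David on 15/04/2003.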

I added this as a subquery within my much larger query; however, it still ran very slowly. So I tried rewriting the query whilst avoiding aggregate functions and came up with the following.

Query 2.

SELECT
    receipt_no,
    person,
    payment_date,
    amount
FROM
    payment a
WHERE
    receipt_no IN
    (SELECT TOP 1 i.receipt_no
     FROM payment i
     WHERE (i.reversed NOT IN ('R') OR i.reversed IS NULL)
       AND i.type <> 'GX'
       AND i.person = a.person
     ORDER BY i.payment_date ASC, i.receipt_no ASC)
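The same check can be run for Query 2. SQLite has no TOP, so this sketch uses LIMIT 1 as the equivalent, and query_2 is a hypothetical helper that takes the sort direction for payment_date. With ASC the query returns each person's first payment; with DESC it returns the latest qualifying payment instead (receipt 2 rather than 5 for John). As before, blank Reversed values are assumed to be NULL.

```python
import sqlite3

# Sample rows from the question (ISO dates; blank Reversed assumed to be NULL).
ROWS = [
    (2, "John", "2001-02-01", "PA", None),
    (1, "John", "2001-02-01", "GX", None),
    (3, "David", "2003-04-15", "PA", None),
    (6, "Mike", "2002-07-26", "PA", "R"),
    (5, "John", "2001-01-01", "PA", None),
    (4, "Mike", "2000-05-13", "GX", None),
    (8, "Mike", "2004-11-27", "PA", None),
    (7, "David", "2003-12-05", "PA", "R"),
    (9, "David", "2003-04-15", "PA", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (receipt_no INTEGER PRIMARY KEY, person TEXT,"
    " payment_date TEXT, type TEXT, reversed TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", ROWS)


def query_2(direction):
    """Run Query 2 with LIMIT 1 in place of TOP 1, sorting payment_date by direction."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be ASC or DESC")
    sql = f"""
        SELECT a.receipt_no, a.person
        FROM payment a
        WHERE a.receipt_no IN (
            SELECT i.receipt_no
            FROM payment i
            WHERE (i.reversed <> 'R' OR i.reversed IS NULL)
              AND i.type <> 'GX'
              AND i.person = a.person
            ORDER BY i.payment_date {direction}, i.receipt_no ASC
            LIMIT 1)
        ORDER BY a.person
    """
    return conn.execute(sql).fetchall()


print(query_2("ASC"))   # [(3, 'David'), (5, 'John'), (8, 'Mike')]
print(query_2("DESC"))  # [(3, 'David'), (2, 'John'), (8, 'Mike')]
```

Only the ascending form matches the "first payment" requirement described above.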

I wouldn't necessarily expect this to be more efficient. In fact, if I run the two queries side by side on my larger data set, Query 1 completes in a matter of milliseconds, whereas Query 2 takes several seconds.

However, if I then add them as subqueries within a much larger query, the larger query takes hours to complete using Query 1 but completes in 40 seconds using Query 2.

I can only attribute this to the use of aggregate functions in one and not the other.

Question 2

What is the database type and version? Also, have you looked into RANK() or equivalent?

Question 3

The database I'm working with is Visual DataFlex 14.0 with a SQL Server 2008 R2 back end. However, any SQL commands I use would have to be backward compatible with at least SQL Server 2005, and preferably SQL Server 2000 if possible.

Question 4

I'd never used RANK() before, but I can definitely see myself using it again. Very useful, thank you. I've added my rewritten query using RANK() above.

Question 5

FYI, RANK() is not available in SQL Server 2000. :(
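The RANK() rewrite mentioned above is not included in this copy of the post. As an illustration only, not the author's query: ROW_NUMBER(), which like RANK() was introduced in SQL Server 2005 (so neither is available on SQL Server 2000), can number each person's qualifying payments by Payment_date and then Receipt_no, keeping row 1. The sketch runs the SQL through Python's sqlite3, which needs SQLite 3.25 or later for window functions; blank Reversed values are assumed to be NULL.

```python
import sqlite3

# Sample rows from the question (ISO dates; blank Reversed assumed to be NULL).
ROWS = [
    (2, "John", "2001-02-01", "PA", None),
    (1, "John", "2001-02-01", "GX", None),
    (3, "David", "2003-04-15", "PA", None),
    (6, "Mike", "2002-07-26", "PA", "R"),
    (5, "John", "2001-01-01", "PA", None),
    (4, "Mike", "2000-05-13", "GX", None),
    (8, "Mike", "2004-11-27", "PA", None),
    (7, "David", "2003-12-05", "PA", "R"),
    (9, "David", "2003-04-15", "PA", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (receipt_no INTEGER PRIMARY KEY, person TEXT,"
    " payment_date TEXT, type TEXT, reversed TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", ROWS)

# Number each person's qualifying payments (earliest date first, then lowest
# receipt number) and keep only the first one per person.
FIRST_PAYMENT = """
SELECT receipt_no, person, payment_date
FROM (
    SELECT receipt_no, person, payment_date,
           ROW_NUMBER() OVER (PARTITION BY person
                              ORDER BY payment_date, receipt_no) AS rn
    FROM payment
    WHERE type <> 'GX' AND (reversed <> 'R' OR reversed IS NULL)
) AS ranked
WHERE rn = 1
ORDER BY person
"""

rows = conn.execute(FIRST_PAYMENT).fetchall()
print(rows)
# [(3, 'David', '2003-04-15'), (5, 'John', '2001-01-01'), (8, 'Mike', '2004-11-27')]
```

Because Receipt_no is unique, RANK() would give the same result here; the two only differ when rows tie on every ORDER BY column.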

Question 6

Setting the DATE_CORRELATION_OPTIMIZATION option to true under Database > Properties > Options may also improve the overall speed without the need to rewrite the subquery.

Answer 1 · Alexander · 2012-11-08 18:05:29Z

I see that in your question you said:

"I am unable to move the subquery out to a temporary table as temporary tables are simply not supported within the programming language used by my application."

But, have you considered calling a stored procedure instead? Is this even an option, considering the limitations with the programming language?

If this is a viable option, you could simply have the results of your subquery inserted into a temp table transparently & encapsulate all the logic in the stored procedure.
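A minimal sketch of the temp-table idea, again with Python's sqlite3 standing in for SQL Server. In T-SQL the first step would typically be a SELECT ... INTO #first_payment inside the stored procedure; SQLite has no stored procedures, so only the materialize-then-join part is shown. The names first_payment and ix_first_payment are hypothetical, and blank Reversed values are assumed to be NULL. The first payments are computed once into an indexed temp table, which the larger query then joins like any other table instead of embedding the nested subquery.

```python
import sqlite3

# Sample rows from the question (ISO dates; blank Reversed assumed to be NULL).
ROWS = [
    (2, "John", "2001-02-01", "PA", None),
    (1, "John", "2001-02-01", "GX", None),
    (3, "David", "2003-04-15", "PA", None),
    (6, "Mike", "2002-07-26", "PA", "R"),
    (5, "John", "2001-01-01", "PA", None),
    (4, "Mike", "2000-05-13", "GX", None),
    (8, "Mike", "2004-11-27", "PA", None),
    (7, "David", "2003-12-05", "PA", "R"),
    (9, "David", "2003-04-15", "PA", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (receipt_no INTEGER PRIMARY KEY, person TEXT,"
    " payment_date TEXT, type TEXT, reversed TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", ROWS)

conn.executescript("""
-- Step 1: materialize each person's first qualifying receipt once (Query 1 logic).
CREATE TEMP TABLE first_payment AS
SELECT a.person, MIN(a.receipt_no) AS receipt_no
FROM payment a
WHERE a.type <> 'GX' AND (a.reversed <> 'R' OR a.reversed IS NULL)
  AND a.payment_date = (
      SELECT MIN(i.payment_date) FROM payment i
      WHERE i.person = a.person AND i.type <> 'GX'
        AND (i.reversed <> 'R' OR i.reversed IS NULL))
GROUP BY a.person;

-- Step 2: index the temp table on the join column.
CREATE INDEX ix_first_payment ON first_payment (receipt_no);
""")

# Step 3: the larger query joins to the temp table rather than embedding the subquery.
joined = conn.execute("""
    SELECT p.receipt_no, p.person, p.payment_date
    FROM payment p
    JOIN first_payment f ON f.receipt_no = p.receipt_no
    ORDER BY p.person
""").fetchall()
print(joined)
```

Like a SQL Server #temp table, a SQLite TEMP table is private to the connection that created it and disappears when that connection closes.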

Edit

I got to thinking about this some more, and perhaps the columns that you're using in your JOIN condition are of different collations. While this will usually result in a specific error message, there may be some implicit collation conversion occurring instead (see: MSDN: Collation Precedence (Transact-SQL)) between the sub-query & the data being joined.
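To illustrate why a collation mismatch can hurt performance without raising an error, here is a sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3. SQLite's collation rules are not SQL Server's collation precedence rules, and ix_payment_person is a hypothetical index name, so this only shows the general effect: an index built with one collation can serve an equality lookup only when the comparison uses that same collation; otherwise the engine falls back to a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (receipt_no INTEGER PRIMARY KEY, person TEXT, payment_date TEXT)"
)
# The index uses the column's default BINARY collation.
conn.execute("CREATE INDEX ix_payment_person ON payment (person)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [(5, "John", "2001-01-01"), (3, "David", "2003-04-15"), (8, "Mike", "2004-11-27")],
)


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Comparison uses the column's BINARY collation, so the index can be searched.
same = plan("SELECT * FROM payment WHERE person = 'John'")
# An explicit NOCASE collation on the comparison means the BINARY index no longer applies.
mixed = plan("SELECT * FROM payment WHERE person = 'John' COLLATE NOCASE")

print(same)   # a SEARCH step using ix_payment_person
print(mixed)  # a SCAN of payment instead
```

The exact wording of the plan lines varies between SQLite versions, but the SEARCH-versus-SCAN distinction is the part that matters.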

Here are a few links about collation that might be useful to you:

Also, you may be able to trick your programming language into using a temp table with syntax like this:

SELECT *
 FROM tempdb..#MyTempTable

Just keep in mind that sometimes the temp database has a different collation than the data you're working with too, in which case you'll need to explicitly convert the data to/from each collation.

Completely agree with Alexander... also try adding a few indexes to speed it up a bit more.

2 people completely agree, but this answer has neither upvotes nor edits to improve whatever could be better explained? I'm baffled.

The statement I made with regard to my programming language not supporting temporary tables is incorrect, and I have marked it as such. I'm not sure where I drew this conclusion from. Therefore, using a temporary table is a valid option. From what I can tell, there are no differences in collation between the columns included in the joins, though the information you posted on collation within SQL made for interesting reading, most of which I was unaware of.
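On the suggestion to add indexes: as a hedged sketch, a composite index on (person, payment_date, receipt_no) lines up with Query 1's correlated lookup, which filters on person and takes the minimum payment_date. Here ix_payment_person_date is a hypothetical name, and SQLite's EXPLAIN QUERY PLAN (via Python's sqlite3) stands in for SQL Server's execution plan; whether SQL Server chooses an equivalent index on the real data can only be confirmed from its actual execution plan.

```python
import sqlite3

# Sample rows from the question (ISO dates; blank Reversed assumed to be NULL).
ROWS = [
    (2, "John", "2001-02-01", "PA", None),
    (1, "John", "2001-02-01", "GX", None),
    (3, "David", "2003-04-15", "PA", None),
    (6, "Mike", "2002-07-26", "PA", "R"),
    (5, "John", "2001-01-01", "PA", None),
    (4, "Mike", "2000-05-13", "GX", None),
    (8, "Mike", "2004-11-27", "PA", None),
    (7, "David", "2003-12-05", "PA", "R"),
    (9, "David", "2003-04-15", "PA", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (receipt_no INTEGER PRIMARY KEY, person TEXT,"
    " payment_date TEXT, type TEXT, reversed TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", ROWS)

# Composite index matching the correlated lookup: equality on person, then payment_date.
conn.execute(
    "CREATE INDEX ix_payment_person_date ON payment (person, payment_date, receipt_no)"
)

QUERY_1 = """
SELECT a.person, MIN(a.receipt_no) AS receipt_no
FROM payment a
WHERE a.type <> 'GX' AND (a.reversed <> 'R' OR a.reversed IS NULL)
  AND a.payment_date = (
      SELECT MIN(i.payment_date) FROM payment i
      WHERE i.person = a.person AND i.type <> 'GX'
        AND (i.reversed <> 'R' OR i.reversed IS NULL))
GROUP BY a.person
"""

details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY_1)]
for line in details:
    print(line)

# True if some step seeks into the new index rather than scanning the table.
uses_index = any(
    d.startswith("SEARCH") and "ix_payment_person_date" in d for d in details
)
print(uses_index)
```

Leading the index with person turns each execution of the correlated subquery into a seek on one person's rows instead of a pass over the whole table.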