Correlated Subquery in SQL Server 2014

Question 1

I am running a correlated subquery to find out the listing of vendors (by Vendor Name) that are in different cities, states, i.e. we want to know the vendors that do not have a common city and state with other vendors. It seemed like a self-join was the thing to do.

Only hints if possible, please.

The Vendors table is:

Vendors(VendorID P, VendorCity, VendorState, VendorName,...)

This is what I have:

Select VendorName, VendorCity, VendorState from Vendors AS V1 where
 VendorCity, VendorState NOT IN (Select VendorCity, VendorState FROM
 Vendors AS V2 where V2.VendorID <> V1.VendorID)

This is the error message I get:

Msg 4145, Level 15, State 1, Line 2 An expression of non-boolean type specified in a context where a condition is expected, near ','.

I don't see why there is a reference to Boolean types, since this is not an EXISTS or other related query.

Question 2

The syntax for IN(...) is:

test_expression [ NOT ] IN 
 ( subquery | expression [ ,...n ] )

With:

test_expression

Is any valid expression.

subquery

Is a subquery that has a result set of one column. This column must have the same data type as test_expression.

expression[ ,... n ]

Is a list of expressions to test for a match. All expressions must be of the same type as test_expression.

This means that only 1 column is allowed on both side. Even a list of expressions is considered as 1 dummy table with 1 column for expression values similar to:

SELECT exp FROM (values(exp1), (exp2), ...) as X(exp)

With 2 columns you can use a subquery with EXISTS:

Select VendorName, VendorCity, VendorState 
FROM Vendors AS V1 
WHERE NOT EXISTS (
 SELECT 1 
 FROM Vendors AS V2 
 WHERE V2.VendorID <> V1.VendorID
 AND VendorCity = VendorCity
 AND VendorState = VendorState
)

Overall this subquery always return 0 row or 1 or more rows with 1 because it only needs to know if there is 1 or more rows with the same City and State. The value is not important because it only looks at the number of row (=> if 1 or more rows EXISTS), hence the SELECT 1. The test has already been done in the inner WHERE clause.

I guess that VendorId is the PK. An index on VendorCity and VendorState would help.

Since you are learning and if you don't already know it, you can also look at the usage of APPLY (CROSS APPLY and OUTER APPLY). I let you try it and play with it.

Question 3

As stated in the other answer, SQL Server does not currently allow your desired syntax. Other products do and there is a Connect item request here:

Add support for ANSI standard row value constructors.

It is possible to still use NOT IN though

SELECT VendorName,
 VendorCity,
 VendorState
FROM Vendors AS V1
WHERE 1 NOT IN (SELECT 1
 FROM Vendors AS V2
 WHERE V2.VendorCity = V1.VendorCity
 AND V2.VendorState = V1.VendorState
 AND V2.VendorID <> V1.VendorID);

This gives a similar plan, with an anti semi join operator, as NOT EXISTS

Question 4

We could also (ab)use GROUP BY with a HAVING COUNT = 1

Question 5

Use CONCAT(...):

Select VendorName, VendorCity, VendorState 
from Vendors AS V1 
where CONCAT( V1.VendorCity,V1.VendorState ) NOT IN (
 Select CONCAT( V2.VendorCity,V2.VendorState ) 
 FROM Vendors AS V2
 where V2.VendorID <> V1.VendorID
)

To address a comment left by Martin Smith:

This has at least a theoretical problem in that (ABC,DEF) will match (AB,CDEF) - I'm not sure if any real world city / state pairs exist that could cause this issue to come up though. Also it can be less efficient dependent on indexes available.

The ABC | DEF anagrams wouldn't be a problem in a real scenario since I don't think there are such combinations like ABCD | EF, for example. The problem with the indexes could be easily solved by creating an index containing only "VendorCity, VendorState" columns.

Alternatively, he could create his own hashtable for these 2 columns and store the generated hash as an extra field in the same table.

I would love to see a benchmark comparing the performance of the given solutions (including mine).

score 6 · Accepted Answer · 2016-03-05 07:16:39Z

The syntax for IN(...) is:

test_expression [ NOT ] IN 
 ( subquery | expression [ ,...n ] )

With:

test_expression

Is any valid expression.

subquery

Is a subquery that has a result set of one column. This column must have the same data type as test_expression.

expression[ ,... n ]

Is a list of expressions to test for a match. All expressions must be of the same type as test_expression.

This means that only 1 column is allowed on both side. Even a list of expressions is considered as 1 dummy table with 1 column for expression values similar to:

SELECT exp FROM (values(exp1), (exp2), ...) as X(exp)

With 2 columns you can use a subquery with EXISTS:

Select VendorName, VendorCity, VendorState 
FROM Vendors AS V1 
WHERE NOT EXISTS (
 SELECT 1 
 FROM Vendors AS V2 
 WHERE V2.VendorID <> V1.VendorID
 AND VendorCity = VendorCity
 AND VendorState = VendorState
)

Overall this subquery always return 0 row or 1 or more rows with 1 because it only needs to know if there is 1 or more rows with the same City and State. The value is not important because it only looks at the number of row (=> if 1 or more rows EXISTS), hence the SELECT 1. The test has already been done in the inner WHERE clause.

I guess that VendorId is the PK. An index on VendorCity and VendorState would help.

Since you are learning and if you don't already know it, you can also look at the usage of APPLY (CROSS APPLY and OUTER APPLY). I let you try it and play with it.

Stack Exchange Network

Correlated Subquery in SQL Server 2014

3 Answers 3

test_expression

subquery

expression[ ,... n ]

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Correlated Subquery in SQL Server 2014

3 Answers 3

test_expression

subquery

expression[ ,... n ]

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions