I am writing a correlated subquery to list the vendors (by vendor name) whose city and state are not shared with any other vendor, i.e. we want the vendors that have no city and state in common with another vendor. A self-join seemed like the right approach.
Only hints if possible, please.
The Vendors table is:
Vendors(VendorID P, VendorCity, VendorState, VendorName,...)
This is what I have:
Select VendorName, VendorCity, VendorState from Vendors AS V1 where
VendorCity, VendorState NOT IN (Select VendorCity, VendorState FROM
Vendors AS V2 where V2.VendorID <> V1.VendorID)
This is the error message I get:
Msg 4145, Level 15, State 1, Line 2 An expression of non-boolean type specified in a context where a condition is expected, near ','.
I don't see why there is a reference to Boolean types, since this is not an EXISTS or other related query.
3 Answers
The syntax for IN(...) is:
test_expression [ NOT ] IN
( subquery | expression [ ,...n ] )
With:
test_expression
Is any valid expression.
subquery
Is a subquery that has a result set of one column. This column must have the same data type as test_expression.
expression[ ,... n ]
Is a list of expressions to test for a match. All expressions must be of the same type as test_expression.
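This means that only one column is allowed on each side. It is also why the error mentions a non-boolean type: after WHERE the parser reads VendorCity and then hits the comma at the point where it expects a complete condition. Even a list of expressions is treated as a dummy table with a single column holding the expression values, similar to: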
SELECT exp FROM (values(exp1), (exp2), ...) as X(exp)
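As a minimal sketch of the two forms the syntax does accept, both of the queries below compare a single column against a single-column set. The literal values 'CA', 'NY' and 'Springfield' are made up for illustration and are not from the question.

```sql
-- Form 1: a list of expressions, all comparable to the one test expression.
SELECT VendorName
FROM Vendors
WHERE VendorState IN ('CA', 'NY');          -- hypothetical state codes

-- Form 2: a subquery that returns exactly one column.
SELECT VendorName
FROM Vendors
WHERE VendorState IN (SELECT VendorState
                      FROM Vendors
                      WHERE VendorCity = 'Springfield');  -- hypothetical city
```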
With 2 columns you can use a subquery with EXISTS:
SELECT VendorName, VendorCity, VendorState
FROM Vendors AS V1
WHERE NOT EXISTS (
SELECT 1
FROM Vendors AS V2
WHERE V2.VendorID <> V1.VendorID
-- qualify both sides; unqualified names would both resolve to V2
AND V2.VendorCity = V1.VendorCity
AND V2.VendorState = V1.VendorState
)
The subquery returns either no rows or one or more rows containing the constant 1; all that matters is whether at least one other vendor has the same city and state.
The selected value is irrelevant because EXISTS only checks whether any row is returned, hence the SELECT 1. The actual comparison is done in the inner WHERE clause.
I guess that VendorId is the PK. An index on VendorCity and VendorState would help.
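A sketch of such an index might look like the following. The index name is hypothetical, and including VendorName is an assumption made so that the query can be answered from the index alone; whether it actually helps depends on the table size and the real plan.

```sql
-- Hypothetical index name; key columns match the correlated predicates.
CREATE NONCLUSTERED INDEX IX_Vendors_City_State
ON Vendors (VendorCity, VendorState)
INCLUDE (VendorName);   -- assumption: covers the SELECT list
```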
Since you are learning, and in case you don't already know it, you can also look at APPLY (CROSS APPLY and OUTER APPLY). I'll let you try it and experiment with it.
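For readers who want a starting point, one possible OUTER APPLY formulation is sketched below. It is only one way to write it, and the alias Dup is made up here. It relies on VendorID being non-nullable (assumed above to be the primary key), so a NULL in Dup.VendorID can only mean that no matching vendor was found.

```sql
SELECT V1.VendorName, V1.VendorCity, V1.VendorState
FROM Vendors AS V1
OUTER APPLY (
    -- Look for at most one other vendor in the same city and state.
    SELECT TOP (1) V2.VendorID
    FROM Vendors AS V2
    WHERE V2.VendorID <> V1.VendorID
      AND V2.VendorCity = V1.VendorCity
      AND V2.VendorState = V1.VendorState
) AS Dup
WHERE Dup.VendorID IS NULL;   -- keep vendors with no such match
```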
As stated in the other answer, SQL Server does not currently allow your desired syntax. Other products do and there is a Connect item request here:
Add support for ANSI standard row value constructors.
It is still possible to use NOT IN, though:
SELECT VendorName,
VendorCity,
VendorState
FROM Vendors AS V1
WHERE 1 NOT IN (SELECT 1
FROM Vendors AS V2
WHERE V2.VendorCity = V1.VendorCity
AND V2.VendorState = V1.VendorState
AND V2.VendorID <> V1.VendorID);
This gives a plan similar to the NOT EXISTS one, with an anti semi join operator.
Comment (ypercubeᵀᴹ, Mar 5, 2016 at 13:13): We could also (ab)use GROUP BY with a HAVING COUNT = 1.
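The comment does not spell the query out, so the following is only a guess at what was meant. Because each surviving group contains exactly one row, MIN(VendorName) simply returns that row's name; the aggregate is there only to satisfy the GROUP BY rules.

```sql
SELECT MIN(VendorName) AS VendorName,  -- the single name in a one-row group
       VendorCity,
       VendorState
FROM Vendors
GROUP BY VendorCity, VendorState
HAVING COUNT(*) = 1;                   -- city/state pairs used by one vendor only
```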
Use CONCAT(...):
Select VendorName, VendorCity, VendorState
from Vendors AS V1
where CONCAT( V1.VendorCity,V1.VendorState ) NOT IN (
Select CONCAT( V2.VendorCity,V2.VendorState )
FROM Vendors AS V2
where V2.VendorID <> V1.VendorID
)
To address a comment left by Martin Smith:
"This has at least a theoretical problem in that (ABC,DEF) will match (AB,CDEF). I'm not sure if any real-world city/state pairs exist that could cause this issue to come up, though. Also, it can be less efficient depending on the indexes available."
The ABC | DEF collisions wouldn't be a problem in a real scenario, since I don't think combinations such as ABCD | EF occur among real city/state pairs. The problem with the indexes could easily be solved by creating an index containing only the VendorCity and VendorState columns.
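The collision described in the comment, and one way to rule it out, can be illustrated with constants. The '|' separator is an assumption: it only works if that character never appears in the city or state values.

```sql
SELECT
    -- 'ABC' + 'DEF' and 'AB' + 'CDEF' both produce 'ABCDEF'
    CASE WHEN CONCAT('ABC', 'DEF') = CONCAT('AB', 'CDEF')
         THEN 'collide' ELSE 'distinct' END AS NoDelimiter,
    -- 'ABC|DEF' and 'AB|CDEF' differ once a separator is added
    CASE WHEN CONCAT('ABC', '|', 'DEF') = CONCAT('AB', '|', 'CDEF')
         THEN 'collide' ELSE 'distinct' END AS WithDelimiter;
```

The same separator could be added to both CONCAT calls in the query above.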
Alternatively, the asker could compute a hash of these two columns and store the generated hash as an extra column in the same table.
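One way this might be done is with a persisted computed column, sketched below. The column name CityStateHash, the index name, the choice of SHA2_256 and the '|' separator (added so that pairs such as ABC/DEF and AB/CDEF hash differently) are all assumptions, not something taken from the question. The GO lines are batch separators for SSMS or sqlcmd.

```sql
-- Hypothetical column: a 32-byte hash of the delimited city/state pair.
ALTER TABLE Vendors
ADD CityStateHash AS
    CAST(HASHBYTES('SHA2_256', CONCAT(VendorCity, '|', VendorState)) AS BINARY(32))
    PERSISTED;
GO

-- Hypothetical index on the hash column.
CREATE NONCLUSTERED INDEX IX_Vendors_CityStateHash
ON Vendors (CityStateHash);
GO

-- Vendors whose hash is not shared by any other vendor.
SELECT V1.VendorName, V1.VendorCity, V1.VendorState
FROM Vendors AS V1
WHERE NOT EXISTS (SELECT 1
                  FROM Vendors AS V2
                  WHERE V2.CityStateHash = V1.CityStateHash
                    AND V2.VendorID <> V1.VendorID);
```

Whether this beats a plain index on the two columns is not obvious and would need to be measured.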
I would love to see a benchmark comparing the performance of the given solutions (including mine).