In SQL Server, I have two tables: houses and their images.
I need a list of 20 houses, each with the first of its images (only one). I tried:
SELECT top 20 h.id, h.name, im.id, im.name
FROM image im
INNER JOIN house h ON im.house_id = h.id
WHERE 1=1 AND im.id=(SELECT TOP (1) im2.id FROM image im2 WHERE im.id=im2.id ORDER BY image_code)
but that runs very slowly. Is there any way to improve this query?
EDIT:
With the query:
SELECT h.id, h.name, im.id, im.name -- What you want to select
FROM _house h, _image im -- Tables in join
WHERE h.id = im.house_id -- The join (equivalent to inner join)
GROUP BY h.id -- This compresses all entries with the
-- same h.id into a single row
HAVING im.id = min(im.id) -- This is how we select across a group
-- (thus compressing the image table per house)
I'm getting an error message:
'_image.id' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.
Then I changed it to:
SELECT h.id, h.name, im.id, im.name -- What you want to select
FROM _house h, _image im -- Tables in join
WHERE h.id = im.house_id -- The join (equivalent to inner join)
GROUP BY h.id,im.id, h.name, im.name -- This compresses all entries with the
-- same h.id into a single row
HAVING im.id = min(im.id)
And then I get this result:
[image: query result in which houses appear more than once]
How can I remove the repeated values?
EDIT2:
If anybody wants to test the queries, this is the script that creates the tables and the data I'm using now (the real data is about 1 million rows):
CREATE TABLE _house(
[id] [int] NOT NULL,
[name] [varchar](50) NULL
)
CREATE TABLE _image(
[id] [int] NULL,
[name] [varchar](50) NULL,
[house_id] [int] NULL
)
insert into _house (id, name) values (1,'house1');
insert into _house (id, name) values (2,'house2');
insert into _image (id, name, house_id) values (31,'img1',1);
insert into _image (id, name, house_id) values (32,'img2',2);
insert into _image (id, name, house_id) values (33,'img3',2);
insert into _image (id, name, house_id) values (34,'img4',2);
In SQL Server 2005 or newer you could use ranking functions to fetch the top N rows per match. – Andriy M, Nov 25, 2011 at 14:29
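The ranking-function idea from the comment could look roughly like the sketch below. It assumes that "first image" means the image with the smallest id for each house, and that the 20 houses are picked by ascending h.id; neither is stated in the question. The names ranked, rn, image_id and image_name are made up for illustration.

```sql
-- Sketch only: number each house's images and keep the first one.
-- Assumption: "first" = smallest _image.id per house.
-- Assumption: the 20 houses wanted are those with the lowest h.id.
WITH ranked AS (
    SELECT im.id, im.name, im.house_id,
           ROW_NUMBER() OVER (PARTITION BY im.house_id
                              ORDER BY im.id) AS rn
    FROM _image im
)
SELECT TOP (20) h.id, h.name, r.id AS image_id, r.name AS image_name
FROM _house h
INNER JOIN ranked r ON r.house_id = h.id AND r.rn = 1
ORDER BY h.id;
```

Changing the ORDER BY inside OVER (...) changes which image counts as the first one for each house.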
3 Answers
I don't know if there is a faster way, but I would use sub-queries. For example:
select top 20 h.id, h.name, im.mid, i.name
from _house h
join
(
select min(id) as mid,house_id from _image
group by house_id
) im on im.house_id=h.id
join _image i on i.id=im.mid
Depending on the context it might be faster to generate a temporary table with just one image per house.
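A minimal sketch of the temporary-table variant, in case it is useful. The temp table name #first_image and the ORDER BY h.id are assumptions for illustration; whether this is actually faster would have to be measured on the real data.

```sql
-- Sketch only: #first_image is a hypothetical temp table name.
-- Step 1: materialise one image id per house.
SELECT house_id, MIN(id) AS mid
INTO #first_image
FROM _image
GROUP BY house_id;

-- Step 2: join the houses to their single image.
SELECT TOP (20) h.id, h.name, i.id AS image_id, i.name AS image_name
FROM _house h
INNER JOIN #first_image fi ON fi.house_id = h.id
INNER JOIN _image i ON i.id = fi.mid
ORDER BY h.id;  -- assumption: any stable ordering of houses is acceptable

DROP TABLE #first_image;
```

This mainly pays off when the one-image-per-house set is reused by several queries, since the grouping is then done only once.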
You should be using the GROUP BY clause:
SELECT h.id, h.name, im.id, im.name -- What you want to select
FROM house h,image im -- Tables in join
WHERE h.id = im.house_id -- The join (equivalent to inner join)
GROUP BY h.id -- This compresses all entries with the
-- same h.id into a single row
HAVING min(im.id) -- This is how we select across a group
-- (thus compressing the image table per house)
LIMIT 20; -- Selecting the first n values is very
-- DB specific on mysql use the limit clause
-- But I see in your DB it is `top 20`
Note:
According to this page: http://developer.mimer.com/validator/parser200x/index.tml#parser
The having clause is more standard when specified like this (though I can't test this).
HAVING im.id = min(im.id)
Edit (Based on question Edit).
Your problem is this line:
GROUP BY h.id, im.id, h.name, im.name
This means that only rows that are identical across all four values are compressed together. You need to keep the original GROUP BY clause (and fix another part of the query).
GROUP BY h.id
I can't test this as I only have MySQL available and you seem to be using an MS product (and my original query worked on MySQL). But based on the error message:
'_image.id' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.
We don't want to add anything to the GROUP BY clause. Thus the error message indicates that we need to use an aggregate function (probably in the SELECT).
Try changing the select:
SELECT h.id, h.name, min(im.id), im.name
                     ^^^^^^^^^^
I am sure that if you play around with this you should be able to get it working. Sorry I cannot be more exact, but that would require using the same product as you.
Your use of the HAVING and LIMIT clauses is not valid in SQL Server 2008 (and so presumably also not valid in earlier versions). You might need to join onto a sub-select that does the grouping instead. – Brian Reichle, Sep 25, 2011 at 0:25
I can accept that LIMIT is not valid SQL (it's an extension). The HAVING clause is standard SQL, though the different implementations I have seen have different aggregate functions (though I have not seen one that did not support min()). A quick check here developer.mimer.com/validator/parser200x/index.tml#parser indicates a minor error. – Loki Astari, Sep 25, 2011 at 5:02
My issue was not with the existence of the HAVING clause, but with it having a non-boolean expression. I did try im.id = min(im.id) before commenting but, in SQL Server, the HAVING clause expects all referenced fields to appear in an aggregate function or in the GROUP BY clause. Pity, something like this would be very useful :) – Brian Reichle, Sep 25, 2011 at 6:11
It doesn't work; the problem is that with the new query I get repeated values. – user674887, Sep 26, 2011 at 12:50
@user674887: You are going to have to be more specific. Are you getting more than 1 row for each h.id? If so, I would look at your table data to make sure that it is unique (i.e. do you have a key with spaces in it?). – Loki Astari, Sep 26, 2011 at 18:21
GROUP BY x collapses all the rows with the same value of x into one row. Your query FROM _house h, _image im ... GROUP BY h.id is not right because it does not say what to do with _image.
FROM _house h, _image im ... GROUP BY h.id, im.id, h.name, im.name is not what you want because that keeps every possible combination of h.id, im.id, h.name, and im.name; but you do not want all possible im rows, only the rows where im.id is the minimum value.
You want to collapse all rows of _image with the same house_id, or GROUP BY house_id. Then for each of these rows you want the minimum id:
SELECT house_id, Min(id) FROM _image GROUP BY house_id
That gives you the minimum _image.id for each house_id. Now if you want to find the _house.name that has this minimum id, you have to join the house_id against _house.id. You could put the previous query into a temporary table and join against that, but I believe SQL Server allows joining against a subselect:
SELECT h.id, h.name, mi.minImageId
FROM _house h
JOIN (SELECT house_id, Min(id) AS minImageId
FROM _image GROUP BY house_id) mi ON mi.house_id = h.id
I gave Min(id) a name because we are going to need it later. You want to find the name of the _image row with the minimum id for each row in your GROUP BY subselect. You do not want to put that in your GROUP BY subselect because that will, again, include every possible name. You only want the name of the _image row with the minimum id, which we now know and have named minImageId. Joining _image against the subselect on that value should give you what you want:
SELECT h.id, h.name, mi.minImageId, i.name
FROM _house h
JOIN (SELECT house_id, Min(id) AS minImageId
FROM _image GROUP BY house_id) mi ON mi.house_id = h.id
JOIN _image i ON i.id = mi.minImageId
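To get back to the question's requirement of only 20 houses, the same query could be capped with TOP. This is only a sketch: the ORDER BY h.id is an assumption, since the question does not say which 20 houses are wanted.

```sql
-- Sketch only: same joins as above, limited to 20 houses.
-- Assumption: the houses with the lowest ids are the ones wanted.
SELECT TOP (20) h.id, h.name, mi.minImageId, i.name
FROM _house h
JOIN (SELECT house_id, Min(id) AS minImageId
      FROM _image GROUP BY house_id) mi ON mi.house_id = h.id
JOIN _image i ON i.id = mi.minImageId
ORDER BY h.id;
```

Against the sample data in the question this returns two rows: (1, house1, 31, img1) and (2, house2, 32, img2).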
The last query doesn't work, but I upvoted because the explanations were very interesting. Thanks – user674887, Oct 3, 2011 at 10:56
@user674887, it would help if you told me what "doesn't work". – Dour High Arch, Oct 3, 2011 at 17:56
You're right. The problem is the comma on the second line: "FROM _house h," causes "Incorrect syntax near the keyword 'join'." When I removed it, I got the same performance as the accepted answer. Thanks – user674887, Oct 4, 2011 at 6:27
There's a redundant comma in your last code snippet, the one after _house h. – Andriy M, Nov 25, 2011 at 14:22
Thanks guys. Now that I've fixed it, I see it is exactly the same as what @Mike had already posted... – Dour High Arch, Jan 18, 2012 at 0:33