Inner join with first result

Question 1

In SQL Server, there are two tables: houses and their images.

I need a list of 20 houses, each with only the first of its images. I tried:

SELECT top 20 h.id, h.name, im.id, im.name
 FROM image im 
 INNER JOIN house h ON im.house_id = h.id
 WHERE 1=1 AND im.id=(SELECT TOP (1) im2.id FROM image im2 WHERE im.id=im2.id ORDER BY image_code)

but that runs very slowly. Is there any way to improve this query?

EDIT:

With the query:

SELECT h.id, h.name, im.id, im.name -- What you want to select
FROM _house h, _image im -- Tables in join
WHERE h.id = im.house_id -- The join (equivalent to inner join)
GROUP BY h.id -- This compresses all entries with the
 -- same h.id into a single row 
HAVING im.id = min(im.id) -- This is how we select across a group
 -- (thus compressing the image table per house)

I get an error message:

Column '_image.id' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.

Then I changed it to:

SELECT h.id, h.name, im.id, im.name -- What you want to select
FROM _house h, _image im -- Tables in join
WHERE h.id = im.house_id -- The join (equivalent to inner join)
GROUP BY h.id,im.id, h.name, im.name -- This compresses all entries with the
 -- same h.id into a single row 
HAVING im.id = min(im.id)

And then I get a result with repeated values (the screenshot is not reproduced here).

How can I take out the repeated values?

EDIT2:

If anybody wants to test the queries, this is the script that creates the tables and the data I'm using now (the real data is about 1 million rows):

CREATE TABLE _house(
 [id] [int] NOT NULL,
 [name] [varchar](50) NULL
) 
CREATE TABLE _image(
 [id] [int] NULL,
 [name] [varchar](50) NULL,
 [house_id] [int] NULL
) 
insert into _house (id, name) values (1,'house1');
insert into _house (id, name) values (2,'house2');
insert into _image (id, name, house_id) values (31,'img1',1);
insert into _image (id, name, house_id) values (32,'img2',2);
insert into _image (id, name, house_id) values (33,'img3',2);
insert into _image (id, name, house_id) values (34,'img4',2);

Question 2

In SQL Server 2005 or newer, you could use ranking functions such as ROW_NUMBER() to fetch the top N rows per group (here, the first image per house).
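
As a rough sketch of what that could look like against the _house and _image tables from the question's script, assuming "first image" means the lowest _image.id per house (the CTE name ranked and the aliases rn, image_id and image_name are made up for illustration):

```sql
-- Number each house's images by id, then keep only the first one.
-- Assumption: "first" = lowest _image.id within a house.
WITH ranked AS (
    SELECT im.id,
           im.name,
           im.house_id,
           ROW_NUMBER() OVER (PARTITION BY im.house_id
                              ORDER BY im.id) AS rn
    FROM _image im
)
SELECT TOP 20 h.id, h.name, r.id AS image_id, r.name AS image_name
FROM _house h
JOIN ranked r ON r.house_id = h.id
WHERE r.rn = 1      -- rn <= N would give the top N images per house
ORDER BY h.id;      -- makes "TOP 20" deterministic
```

With the sample data from the question, this should return house1 with img1 (id 31) and house2 with img2 (id 32).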

Question 3

I don't know if there is a faster way, but I would use sub-queries. For example:

select top 20 h.id, h.name, im.mid, i.name
from _house h
join
(
select min(id) as mid,house_id from _image
group by house_id
) im on im.house_id=h.id
join _image i on i.id=im.mid

Depending on the context it might be faster to generate a temporary table with just one image per house.
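
A minimal sketch of that temporary-table variant, assuming the _house and _image tables from the question and taking the lowest image id as "the" image for each house. The names #first_image, first_image_id and ix_first_image are invented here, and whether this actually beats the single query is something to measure on the real data:

```sql
-- Step 1: materialise one image id per house.
SELECT house_id, MIN(id) AS first_image_id
INTO #first_image
FROM _image
WHERE house_id IS NOT NULL   -- house_id is nullable in the sample schema
GROUP BY house_id;

-- Step 2 (optional): index the temp table for the join below.
CREATE UNIQUE CLUSTERED INDEX ix_first_image ON #first_image (house_id);

-- Step 3: join houses to their single image.
SELECT TOP 20 h.id, h.name, i.id AS image_id, i.name AS image_name
FROM _house h
JOIN #first_image f ON f.house_id = h.id
JOIN _image i ON i.id = f.first_image_id
ORDER BY h.id;

DROP TABLE #first_image;
```

This mainly pays off when the one-image-per-house list is reused by several queries in the same session.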

Question 4

You should be using the GROUP BY clause:

SELECT h.id, h.name, im.id, im.name -- What you want to select
FROM house h,image im -- Tables in join
WHERE h.id = im.house_id -- The join (equivalent to inner join)
GROUP BY h.id -- This compresses all entries with the
 -- same h.id into a single row 
HAVING min(im.id) -- This is how we select across a group
 -- (thus compressing the image table per house)
LIMIT 20; -- Selecting the first n values is very
 -- DB specific on mysql use the limit clause
 -- But I see in your DB it is `top 20`

Note:

According to this page: http://developer.mimer.com/validator/parser200x/index.tml#parser

The having clause is more standard when specified like this (though I can't test this).

HAVING im.id = min(im.id)

Edit (Based on question Edit).

Your problem is this line:

GROUP BY h.id, im.id, h.name, im.name

This means that only lines with the same values in all four columns are compressed together, so every distinct combination of the four values keeps its own line. You need to maintain the original GROUP BY clause (and fix another part of the query).

GROUP BY h.id

I can't test this as I only have MySQL available and you seem to be using an MS product (and my original query worked on MySQL). But based on the error message:

*Column '_image.id' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.*

We don't want to add anything to the GROUP BY clause. Thus the error message indicates that we need to use aggregate functions (probably in the SELECT).

Try changing the select:

SELECT h.id, h.name, min(im.id), im.name 
 ^^^^^^^^^^

I am sure that if you play around with this you will be able to get it working. Sorry I cannot be more exact, but that would require using the same product as you.
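
For what it's worth, one form of this idea that SQL Server should accept is to aggregate im.id in the SELECT and list every non-aggregated column in the GROUP BY. A minimal sketch, assuming the _house and _image tables from the question (first_image_id is an alias made up here):

```sql
-- One row per house with the lowest image id.
-- Every selected column is either aggregated or listed in GROUP BY,
-- which is what the SQL Server error message asks for.
SELECT h.id, h.name, MIN(im.id) AS first_image_id
FROM _house h
JOIN _image im ON im.house_id = h.id
GROUP BY h.id, h.name;
```

This returns one row per house but not the image name; getting im.name still requires joining back to _image on that minimum id.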

Question 5

Your use of the HAVING and LIMIT clauses is not valid in SQL Server 2008 (and so presumably not valid in earlier versions either). You might need to join onto a sub-select that does the grouping instead.

Question 6

I can accept that LIMIT is not valid SQL (it's an extension). The HAVING clause is standard SQL, though the implementations I have seen support different aggregate functions (I have not seen one that did not support min()). A quick check at developer.mimer.com/validator/parser200x/index.tml#parser indicates a minor error.

Question 7

My issue was not with the existence of the HAVING clause, but with it having a non-boolean expression. I did try im.id = min(im.id) before commenting but, in SQL Server, the HAVING clause expects all referenced fields to appear in an aggregate function or in the GROUP BY clause. Pity, something like this would be very useful :)

Question 8

It doesn't work. The problem is that with the new query I get repeated values.

Question 9

@user674887: You are going to have to be more specific. Are you getting more than one row for each h.id? If so, I would look at your table data to make sure that it is unique (i.e., do you have a key with spaces in it?).

Question 10

GROUP BY x collapses all the rows with the same value of x into one row. Your query FROM _house h, _image im ... GROUP BY h.id is not right because it does not say what to do with _image.

FROM _house h, _image im ... GROUP BY h.id, im.id, h.name, im.name is not what you want because that keeps every possible combination of h.id, im.id, h.name, and im.name; but you do not want all possible im rows, only the rows where im.id is the minimum value.

You want to collapse all rows of _image with the same house_id, or GROUP BY house_id. Then for each of these rows you want the minimum id:

SELECT house_id, Min(id) FROM _image GROUP BY house_id

That gives you the minimum _image.id for each house_id. Now if you want to find the _house.name that has this minimum id, you have to join the house_id against _house.id. You could put the previous query into a temporary table and join against that, but I believe SQL Server allows joining against a subselect:

SELECT h.id, h.name, mi.minImageId
FROM _house h
 JOIN (SELECT house_id, Min(id) AS minImageId
 FROM _image GROUP BY house_id) mi ON mi.house_id = h.id

I gave Min(id) a name because we are going to need it later. You want to find the name of the _image row with the minimum id for each row in your GROUP BY subselect. You do not want to put that name in the GROUP BY subselect because that will, again, include every possible name. You only want the name of the _image row with the minimum id, which we now know and have named minImageId. Joining _image against the subselect on that id should give you what you want:

SELECT h.id, h.name, mi.minImageId, i.name
FROM _house h
 JOIN (SELECT house_id, Min(id) AS minImageId
 FROM _image GROUP BY house_id) mi ON mi.house_id = h.id
 JOIN _image i ON i.id = mi.minImageId
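
Another way to say "for each house, take the image with the lowest id" in SQL Server 2005 and later is CROSS APPLY with TOP 1. This is only a sketch and was not one of the queries tested in this thread; fi, image_id and image_name are illustrative names, and whether it performs better than the GROUP BY join depends on the data and available indexes:

```sql
-- For every house, run the inner query once and keep its first row.
-- Houses without any image are dropped (OUTER APPLY would keep them).
SELECT TOP 20 h.id, h.name, fi.id AS image_id, fi.name AS image_name
FROM _house h
CROSS APPLY (
    SELECT TOP 1 i.id, i.name
    FROM _image i
    WHERE i.house_id = h.id
    ORDER BY i.id            -- "first" = lowest image id (assumption)
) fi
ORDER BY h.id;
```

Changing TOP 1 to TOP N in the inner query would return up to N images per house.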

Question 11

The last query doesn't work, but I upvoted because the explanations were very interesting. Thanks

Question 12

@user674887, it would help if you told me what "doesn't work".

Question 13

You're right. The problem is the comma on the second line, "FROM _house h,", which causes "Incorrect syntax near the keyword 'join'.". When I removed it, I got the same performance as the accepted answer. Thanks

Question 14

There's a redundant comma in your last code snippet, the one after _house h.

Question 15

Thanks guys. Now that I've fixed it, I see it is exactly the same as what @Mike had already posted...

Mike Polen · Accepted Answer · 2011-09-27 15:16:36Z

