Here's a simple example schema:
CREATE TABLE parents (
parent_id INT,
type text null,
PRIMARY KEY (parent_id)
);
CREATE TABLE children (
child_id INT,
parent_id INT,
name text,
PRIMARY KEY (child_id)
);
INSERT INTO parents (parent_id, type) VALUES (1, null);
INSERT INTO children (child_id, parent_id, name) VALUES (1, 1, 'foo');
This query works:
SELECT child_id
FROM children
JOIN parents USING (parent_id)
GROUP BY child_id, parent_id
If I change that to SELECT child_id, name
it still works, as it should because name
is functionally dependent on child_id
and child_id
is in the GROUP BY clause.
Now if I change it to SELECT child_id, type
I think it should still work, because type
is functionally dependent on parent_id
and parent_id
is in the GROUP BY clause. But instead I get an error message:
column "parents.type" must appear in the GROUP BY clause or be used in an aggregate function
What's the difference? Does it have anything to do with the JOIN?
1 Answer 1
I found out what is the difference.
When I write
SELECT child_id, type
FROM children
JOIN parents USING (parent_id)
GROUP BY child_id, parent_id
Postgres seems to assume I mean GROUP BY child_id, children.parent_id
and even though those two are guaranteed to be identical by the JOIN, they don't seem to count as functionally dependent. If instead I write
SELECT child_id, type
FROM children
JOIN parents USING (parent_id)
GROUP BY child_id, parents.parent_id
Postgres recognizes the functional dependency and allows type
to be in the SELECT expression.
parents.type
to select clause.parent_id
is already there and the same doesn't happen with thename
column.name
column to the SELECT but not to the GROUP BY works just fine.