With a table like:
first_name last_name
------------------------
jack frost
john wayne
betty white
null jackson
And this query:
select first_name, COUNT(first_name) from people;
MySQL and Postgres won't even run it. They somehow know that this requires a GROUP BY.
SQLite produces a result with a single row: jack 3
How do MySQL and Postgres know that this requires a GROUP BY? Why is GROUP BY even required?
Why is SQLite's result a single row? I would have expected the output to be something like:
jack 3
john 3
betty 3
null 3
3 Answers 3
SQLite does not adhere to the standard in this regard. See https://www.sqlite.org/quirks.html#aggregate_queries_can_contain_non_aggregate_result_columns_that_are_not_in_the_group_by_clause
SQLite refers to first_name outside of the aggregate as a "bare" column, see section 2.5. in https://www.sqlite.org/lang_select.html#resultset
You can think of your query as if it looks like:
SELECT first_name, COUNT(first_name)
FROM tbl
GROUP BY () -- empty set
This means that your aggregate function applies to all rows in the result set, i.e.
jack, count({jack, john, betty, null})
john, count({jack, john, betty, null})
betty,count({jack, john, betty, null})
null, count({jack, john, betty, null})
null
is not taken into consideration by count
, so we end up with:
jack, 3
john, 3
betty,3
null, 3
Since an aggregate function (in this case COUNT) is supposed to aggregate per group, we should get 1 row in the result (we only have 1 group, the group for the empty set). Therefore, one row is randomly picked, say
john, 3
One might ask why not deviate further from the standard and allow a 4-row result? It's just a guess, but I suspect that the intention is to fix the first deviation eventually (probably via a setting similar to MySQL). I therefore suspect that they have no intention to add more fuel to the fire, when they will eventually try to fix the root cause.
If your intention was to count all rows (excluding nulls) for each first_name, you can use a window function:
select first_name, count(first_name) over () from tbl;
-
1@Lennart-SlavaUkraini excellent explanation. There is more documentation you might want to use (or add a link to it in the answer): SQLite SELECT sections 2.4 and especially 2.5ypercubeᵀᴹ– ypercubeᵀᴹ2022年11月16日 17:00:15 +00:00Commented Nov 16, 2022 at 17:00
This is not how the COUNT() function is supposed to be used.
According to the SQLite Documentation, COUNT() is an aggregate function just like MIN(), MAX(), SUM(), AVERAGE(), and GROUP_CONCAT().
Without the GROUP BY clause, COUNT() would aggregate the entire table, treating it as a group.
The latest version of MySQL and PostgreSQL would not allow for that. Very old versions of MySQL would.
In your case, what SQLite evidently did was
- give you the first non-null value it saw for first_name
- perform COUNT() aggregation of all non-null values of first_name
Without a
GROUP BY
,COUNT
(orSUM
, etc) scans the entire table and delivers and summarizes the tally in a single row.Since there will be only one row in your query, which
first_name
should it show? (This quandary leads to the "invalidity" of the SQL.)COUNT(x)
checksx
for beingNOT NULL
. This is usually not needed, so say simplyCOUNT(*)
.Here's a sample of a 'good' query:
SELECT first_name, COUNT(*) AS "num people with that first_name" FROM tbl GROUP BY first_name;
To get
jack 3
, do something likeSELECT ANY_VALUE(first_name), SUM(IF(first_name IS NULL, 0, 1)) FROM tbl
There are various other ways to phrase the above. But note that every column has an aggregate and there is no
GROUP BY
-- hence 1 row summarizing the entire table.