SQLite - How does Count work without GROUP BY?

Question 1

With a table like:

first_name last_name
------------------------
jack frost
john wayne
betty white
null jackson

And this query:

select first_name, COUNT(first_name) from people;

MySQL and Postgres won't even run it. They somehow know that this requires a GROUP BY.

SQLite produces a result with a single row: jack 3

How do MySQL and Postgres know that this requires a GROUP BY? Why is GROUP BY even required?

Why is SQLite's result a single row? I would have expected the output to be something like:

jack 3
john 3
betty 3
null 3

Question 2

SQLite does not adhere to the standard in this regard. See https://www.sqlite.org/quirks.html#aggregate_queries_can_contain_non_aggregate_result_columns_that_are_not_in_the_group_by_clause

SQLite refers to first_name outside of the aggregate as a "bare" column, see section 2.5. in https://www.sqlite.org/lang_select.html#resultset

You can think of your query as if it looks like:

SELECT first_name, COUNT(first_name)
FROM tbl
GROUP BY () -- empty set

This means that your aggregate function applies to all rows in the result set, i.e.

jack, count({jack, john, betty, null})
john, count({jack, john, betty, null})
betty,count({jack, john, betty, null})
null, count({jack, john, betty, null})

null is not taken into consideration by count, so we end up with:

jack, 3
john, 3
betty,3
null, 3

Since an aggregate function (in this case COUNT) is supposed to aggregate per group, we should get 1 row in the result (we only have 1 group, the group for the empty set). Therefore, one row is randomly picked, say

john, 3

One might ask why not deviate further from the standard and allow a 4-row result? It's just a guess, but I suspect that the intention is to fix the first deviation eventually (probably via a setting similar to MySQL). I therefore suspect that they have no intention to add more fuel to the fire, when they will eventually try to fix the root cause.

If your intention was to count all rows (excluding nulls) for each first_name, you can use a window function:

select first_name, count(first_name) over () from tbl;

Fiddle

Question 3

@Lennart-SlavaUkraini excellent explanation. There is more documentation you might want to use (or add a link to it in the answer): SQLite SELECT sections 2.4 and especially 2.5

Question 4

This is not how the COUNT() function is supposed to be used.

According to the SQLite Documentation, COUNT() is an aggregate function just like MIN(), MAX(), SUM(), AVERAGE(), and GROUP_CONCAT().

Without the GROUP BY clause, COUNT() would aggregate the entire table, treating it as a group.

The latest version of MySQL and PostgreSQL would not allow for that. Very old versions of MySQL would.

In your case, what SQLite evidently did was

give you the first non-null value it saw for first_name
perform COUNT() aggregation of all non-null values of first_name

Question 5

Without a GROUP BY, COUNT (or SUM, etc) scans the entire table and delivers and summarizes the tally in a single row.
Since there will be only one row in your query, which first_name should it show? (This quandary leads to the "invalidity" of the SQL.)
COUNT(x) checks x for being NOT NULL. This is usually not needed, so say simply COUNT(*).

Here's a sample of a 'good' query:

SELECT first_name,
 COUNT(*) AS "num people with that first_name"
 FROM tbl
 GROUP BY first_name;

To get jack 3, do something like

SELECT ANY_VALUE(first_name),
 SUM(IF(first_name IS NULL, 0, 1))
 FROM tbl

There are various other ways to phrase the above. But note that every column has an aggregate and there is no GROUP BY -- hence 1 row summarizing the entire table.

score 14 · Accepted Answer · 2022-11-15 20:39:15Z

SQLite does not adhere to the standard in this regard. See https://www.sqlite.org/quirks.html#aggregate_queries_can_contain_non_aggregate_result_columns_that_are_not_in_the_group_by_clause

SQLite refers to first_name outside of the aggregate as a "bare" column, see section 2.5. in https://www.sqlite.org/lang_select.html#resultset

You can think of your query as if it looks like:

SELECT first_name, COUNT(first_name)
FROM tbl
GROUP BY () -- empty set

This means that your aggregate function applies to all rows in the result set, i.e.

jack, count({jack, john, betty, null})
john, count({jack, john, betty, null})
betty,count({jack, john, betty, null})
null, count({jack, john, betty, null})

null is not taken into consideration by count, so we end up with:

jack, 3
john, 3
betty,3
null, 3

Since an aggregate function (in this case COUNT) is supposed to aggregate per group, we should get 1 row in the result (we only have 1 group, the group for the empty set). Therefore, one row is randomly picked, say

john, 3

One might ask why not deviate further from the standard and allow a 4-row result? It's just a guess, but I suspect that the intention is to fix the first deviation eventually (probably via a setting similar to MySQL). I therefore suspect that they have no intention to add more fuel to the fire, when they will eventually try to fix the root cause.

If your intention was to count all rows (excluding nulls) for each first_name, you can use a window function:

select first_name, count(first_name) over () from tbl;

Fiddle

@Lennart-SlavaUkraini excellent explanation. There is more documentation you might want to use (or add a link to it in the answer): SQLite SELECT sections 2.4 and especially 2.5

Stack Exchange Network

SQLite - How does Count work without GROUP BY?

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

SQLite - How does Count work without GROUP BY?

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions