In PostgreSQL, is there a type-safe first() aggregate function?

Question 1

I'm looking for a first() aggregate function.
Here I found something that almost works:

CREATE OR REPLACE FUNCTION public.first_agg (anyelement, anyelement)
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
 SELECT $1;
$$;
 
-- And then wrap an aggregate around it
CREATE AGGREGATE public.first (
 sfunc = public.first_agg,
 basetype = anyelement,
 stype = anyelement
);

The problem is that when a varchar(n) column passes through the first() function, it's converted into simple varchar (without size modifier). Trying to return the query in a function as RETURNS SETOF anyelement, I get the following error:

ERROR: structure of query does not match function result type
SQL state: 42804
Detail: Returned type character varying does not match expected type character varying(40) in column 2.
Context: PL/pgSQL function vsr_table_at_time(anyelement,timestamp without time zone) line 31 at RETURN QUERY

In the same wiki page there is a link to a C Version of the function that would replace the above. I don't know how to install it, but I wonder if this version could solve my problem.

Meanwhile, is there a way I can change the above function so it returns the exact same type of the input column?

Question 2

`DISTINCT ON()`

Just as a side note, this is precisely what DISTINCT ON() does (not to be confused with DISTINCT). From the documentation:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

So if you were to write,

SELECT myFirstAgg(z)
FROM foo
GROUP BY x,y;

It's effectively

SELECT DISTINCT ON(x,y) z
FROM foo;
-- ORDER BY z;

In that it takes the first z. There are two important differences:

You can also select other columns at no cost of further aggregation:

SELECT DISTINCT ON(x,y) z, k, r, t, v
FROM foo;
-- ORDER BY z, k, r, t, v;

Because there is no GROUP BY, you cannot use (real) aggregates with it:

CREATE TABLE foo AS
SELECT * FROM ( VALUES
 (1,2,3),
 (1,2,4),
 (1,2,5)
) AS t(x,y,z);

-- Fails, as you should expect: z is neither grouped nor aggregated.
SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;

-- Would not fail, given a first() aggregate such as myFirstAgg.
SELECT myFirstAgg(z), sum(z)
FROM foo
GROUP BY x,y;

Don't forget `ORDER BY`

Also, while I didn't bold it then, I will now:

Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

Always use an ORDER BY with DISTINCT ON.
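As a minimal sketch of that advice, assuming the foo(x, y, z) table created earlier in this answer: PostgreSQL requires the leading ORDER BY expressions to match the DISTINCT ON expressions, so the grouping columns come first and the column that decides which row counts as "first" follows them.

```sql
-- Assumes the foo(x, y, z) table created earlier in this answer.
-- x, y repeat the DISTINCT ON expressions; z then decides which row
-- is kept within each (x, y) group.
SELECT DISTINCT ON (x, y) x, y, z
FROM foo
ORDER BY x, y, z;
```

Ordering by z alone is rejected with "SELECT DISTINCT ON expressions must match initial ORDER BY expressions", because the ORDER BY list does not start with x, y.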

Using an Ordered-Set Aggregate Function

I imagine a lot of people are looking for first_value, Ordered-Set Aggregate Functions. Just wanted to throw that out there. It would look like this, if the function existed:

SELECT a, b, first_value() WITHIN GROUP (ORDER BY z) 
FROM foo
GROUP BY a,b;

But, alas, it does not. You can do this instead:

SELECT a, b, percentile_disc(0) WITHIN GROUP (ORDER BY z) 
FROM foo
GROUP BY a,b;

Question 3

The problem with this answer is that it only works if you want ONE aggregate in your select list, which is not implied by the question. If for example you want to select from one table and find several ordered first values, DISTINCT ON will not work in this case. It's not an aggregate function, you are actually filtering the data and so you can only do it once.

Question 4

This works great unless the column allows null values and you do not want the nulls collapsed into a single group. However, I was able to adapt this answer by using a CASE expression in the DISTINCT ON clause, e.g. SELECT DISTINCT ON (CASE WHEN group_id IS NOT NULL THEN group_id ELSE other_unique_key END), which works perfectly.
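A sketch of how that adaptation might look end to end. The items table and its columns are hypothetical, made up for illustration, with id standing in for other_unique_key.

```sql
-- Hypothetical table for illustration: group_id is nullable, id is unique.
CREATE TEMP TABLE items (id int PRIMARY KEY, group_id int, label text);
INSERT INTO items VALUES
  (1, 10, 'a'),
  (2, 10, 'b'),
  (3, NULL, 'c'),
  (4, NULL, 'd');

-- Rows with a group_id collapse to one row per group; rows without one
-- fall back to their own unique id and are therefore all kept.
-- Caveat: this relies on id values never colliding with group_id values.
SELECT DISTINCT ON (CASE WHEN group_id IS NOT NULL THEN group_id ELSE id END)
       id, group_id, label
FROM items
ORDER BY CASE WHEN group_id IS NOT NULL THEN group_id ELSE id END, id;
```

With the sample rows this keeps one row for group 10 and both rows whose group_id is NULL. The ORDER BY repeats the same CASE expression so that it matches the DISTINCT ON expression.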

Question 5

The percentile_disc(0) WITHIN GROUP (ORDER BY z) worked perfectly for me even for string/jsonb columns where I just wanted the first or any value (just a sample) for that column. min/max work on text columns but not on json/jsonb. For my use case, mode() WITHIN GROUP (ORDER BY z) also worked (to get the most frequent value).
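A small sketch of the two ordered-set aggregates mentioned in this comment, applied to a jsonb column. The events table is hypothetical. Both aggregates need a type with a default sort ordering, which jsonb has and plain json does not.

```sql
-- Hypothetical table for illustration.
CREATE TEMP TABLE events (grp int, payload jsonb);
INSERT INTO events VALUES
  (1, '{"k": 1}'),
  (1, '{"k": 1}'),
  (1, '{"k": 2}'),
  (2, '{"k": 3}');

SELECT grp,
       -- first value in jsonb sort order, i.e. a sample value per group
       percentile_disc(0) WITHIN GROUP (ORDER BY payload) AS sample_payload,
       -- most frequent value per group
       mode() WITHIN GROUP (ORDER BY payload)            AS most_frequent_payload
FROM events
GROUP BY grp;
```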

Question 6

Uncommenting the ORDER BY z gives an error: "SELECT DISTINCT ON expressions must match initial ORDER BY expressions"

Question 7

Yay, I've found an easy way to handle your case using some features in PostgreSQL 9.4+.

Let's see this example:

select (array_agg(val ORDER BY i))[1] as first_value_orderby_i,
 (array_agg(val ORDER BY i DESC))[1] as last_value_orderby_i,
 (array_agg(val))[1] as first_value_all,
 (array_agg(val))[array_length(array_agg(val),1)] as last_value_all
 FROM (
 SELECT i, random() as val
 FROM generate_series(1,100) s(i)
 ORDER BY random()
 ) tmp_tbl;

I hope this helps in your case.

Question 8

The problem with this solution is that it doesn't work with DOMAIN data types, along with a few other small exceptions. It is also much more complex and time-consuming, since it builds an array of the entire data set. The simple solution would be to create a custom aggregate, but so far I haven't found an ideal solution even with that. Window functions are also a poor fit, since they can't be used the same way as aggregates (with FILTER clauses, or in CROSS JOIN LATERAL).

Question 9

Not a direct answer to your question but you should try the first_value window function. It works like this:

CREATE TABLE test (
 id SERIAL NOT NULL PRIMARY KEY,
 cat TEXT,
 value VARCHAR(2),
 date TIMESTAMP WITH TIME ZONE
);

Then, if you want the first item in each cat (category), you can query like this:

SELECT
 cat,
 first_value(date) OVER (PARTITION BY cat ORDER BY date)
FROM
 test;

or:

SELECT
 cat,
 first_value(date) OVER w
FROM
 test
WINDOW w AS (PARTITION BY cat ORDER BY date);

Question 10

Sorry, I don't think this applies to my use case. first_value is not an aggregate function: it returns every record, each showing the value that ranks first, according to some order (date in your example), among the records sharing a common value (cat in your example). My need is different. I need to aggregate several columns in the same SELECT by choosing the first non-null value of each. That is, it should output a single record for each combination of values in GROUP BY.

Question 11

The above can be made to work by throwing distinct into the mix: select distinct x, first_value(y) over (partition by x), first_value(z) over (partition by x) from .... Probably inefficient but enough for me to get on with prototyping. Definitely something to revisit though!
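A sketch of that DISTINCT workaround, assuming the test table from the first_value answer above. An ORDER BY is added to the window, which the snippet in the comment omits; without it, which row counts as first is not defined.

```sql
-- Assumes the test(id, cat, value, date) table from the first_value answer.
-- Every row of a category carries the same first_value results,
-- so DISTINCT collapses them to one row per category.
SELECT DISTINCT
       cat,
       first_value(value) OVER w AS first_value,
       first_value(date)  OVER w AS first_date
FROM test
WINDOW w AS (PARTITION BY cat ORDER BY date);
```

This still computes the window function for every row before deduplicating, which is why the comment calls it probably inefficient.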

Question 12

Get the first value of a column and order by another in a group by

To get the first value of a column and order by another, you can:

Create a User Defined Type with both columns (the ordering column and the value column) as fields, then
Use the Ordered-Set Aggregate Function PERCENTILE_DISC(0) ordered by the newly created type.

The column to order by should be the first field in the type definition.

Note that, if the ordering column and the value column have the same type you can skip the type creation and use an ARRAY instead.

Example:

Suppose we want to find, for each group id, the initial value of column value based on timestamp time. We can define the type

CREATE TYPE time_value AS (
 time TIMESTAMP, -- ordering column
 value NUMERIC
)

and then aggregate as follows:

SELECT
 id,
 (
 PERCENTILE_DISC(0) WITHIN GROUP (ORDER BY (time, value)::time_value)
 ).value AS first_value
FROM my_table
GROUP BY id

With a value column time_value that has the same type as the ordering column time, we can skip the type creation. The aggregation is then as follows:

SELECT
 id,
 (
 PERCENTILE_DISC(0) WITHIN GROUP (ORDER BY ARRAY[time, time_value])
 )[2] AS first_time_value
FROM my_table
GROUP BY id

Test data:

CREATE TABLE my_table AS SELECT
 FLOOR(2 * RANDOM()) AS id,
 TIMESTAMP '2023-01-01' + time * INTERVAL '1 hour' AS time,
 FLOOR(10 * RANDOM()) AS value,
 TIMESTAMP '2023-01-01' + FLOOR(10 * RANDOM()) * INTERVAL '1 hour' AS time_value
FROM GENERATE_SERIES(1, 10) AS time

Question 13

I'm sure the question is no longer relevant for the original poster, but for those looking for a solution in 2021 there is a simple trick: array_agg

e.g.:

SELECT (array_agg(t.f0) FILTER (WHERE t.f0 IS NOT NULL))[1] AS agg FROM t GROUP BY t.f1;
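Without an ORDER BY inside the aggregate, the order of the array elements is not guaranteed, so "first" is effectively arbitrary. A sketch of an ordered variant follows; the table definition and the ordering column f2 are hypothetical, added only for illustration.

```sql
-- Hypothetical table for illustration: f1 is the group, f2 the ordering column.
CREATE TEMP TABLE t (f1 int, f2 int, f0 text);
INSERT INTO t VALUES
  (1, 1, NULL),
  (1, 2, 'x'),
  (1, 3, 'y'),
  (2, 1, 'z');

-- ORDER BY inside the aggregate fixes the element order;
-- FILTER drops NULLs before the array is built.
SELECT t.f1,
       (array_agg(t.f0 ORDER BY t.f2) FILTER (WHERE t.f0 IS NOT NULL))[1] AS agg
FROM t
GROUP BY t.f1;
```

For the sample rows this yields 'x' for group 1 and 'z' for group 2.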

Evan Carroll · Accepted Answer · 2017-01-03 23:59:01Z

