Is PostgreSQL's any_value
actually arbitrary when using ORDER BY?
The documentation states:
any_value ( anyelement ) → same as input type
Returns an arbitrary value from the non-null input values.
Example queries:
-- nonsensical
WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
SELECT any_value(v ORDER BY v = 3 DESC) FROM vals;
-- more realistic example
SELECT any_value(username ORDER BY user_id DESC) AS a_username FROM users;
I did see that version 17 documentation includes the following clause:
While all aggregates below accept an optional ORDER BY clause (as outlined in Section 4.2.7), the clause has only been added to aggregates whose output is affected by ordering.
But it doesn't mention whether any_value's output is affected by ordering.
Based on some trials, it appears to be deterministic.
(probably because any_value
just returns the first not null value and respects the ORDER BY)
As a follow up question, if it's not arbitrary, would it be "bad" to rely on this behavior?
2 Answers 2
Is the result of any_value()
arbitrary? Yes. Is it random? No. These are not incompatible statements.
Based on some trials, it appears to be deterministic. (probably because
any_value
just returns the first not null value and respects theORDER BY
)
Yes (as of the time of writing).
/*
* Transition function for the ANY_VALUE aggregate
*/
Datum
any_value_transfn(PG_FUNCTION_ARGS)
{
PG_RETURN_DATUM(PG_GETARG_DATUM(0));
}
would it be "bad" to rely on this behavior?
It depends on your definition of "bad". It would certainly be unwise, given that the behaviour might change without further notice, and that the very definition of "any" lacks determinism (in other words, if you want a certain value, don't ask for any value).
-
What happens if the query gets parallelized? I see at the moment ordered aggregation isn't supported in parallel plans, but what if that changes?Charlieface– Charlieface2025年02月20日 10:28:34 +00:00Commented Feb 20 at 10:28
-
I guess just like with any partial aggregation, you'll get
any_value()
out of the set ofany_value()
results of the partials.mustaccio– mustaccio2025年02月20日 12:54:01 +00:00Commented Feb 20 at 12:54 -
I would safely rely on the deterministic behavior. What in the world or outside of it could make us replace this trivial transition function with something more complicated?Laurenz Albe– Laurenz Albe2025年02月21日 09:12:28 +00:00Commented Feb 21 at 9:12
-
If one really wants the first value from an ordered set, it would be better to explicitly use
first_value()
, if only to state the intention for the future code maintainers.mustaccio– mustaccio2025年02月21日 15:13:58 +00:00Commented Feb 21 at 15:13
I think the intent is that the effect is only reliable for aggregates where they actually show the ORDER BY
clause in the function documentation. The rest will accept it, but its effect is not documented and should not be relied on.
In most cases ORDER BY
is irrelevant, since the functions are associative and commutative. This isn't true for ANY_VALUE()
, but since the clause isn't shown in the syntax, I don't think you should depend on it behaving in any specific way.
Notice that all the functions that mention it return some kind of array, and this clause specifies the order of the items in the array. ANY_VALUE()
doesn't return a collection, so order is not significant.