3

Is PostgreSQL's any_value actually arbitrary when using ORDER BY?

The documentation states:

any_value ( anyelement ) → same as input type

Returns an arbitrary value from the non-null input values.

Example queries:

-- nonsensical
WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
SELECT any_value(v ORDER BY v = 3 DESC) FROM vals;
-- more realistic example
SELECT any_value(username ORDER BY user_id DESC) AS a_username FROM users;

I did see that version 17 documentation includes the following clause:

While all aggregates below accept an optional ORDER BY clause (as outlined in Section 4.2.7), the clause has only been added to aggregates whose output is affected by ordering.

But it doesn't mention whether any_value's output is affected by ordering.

Based on some trials, it appears to be deterministic. (probably because any_value just returns the first not null value and respects the ORDER BY)


As a follow up question, if it's not arbitrary, would it be "bad" to rely on this behavior?

asked Feb 20 at 1:42
0

2 Answers 2

9

Is the result of any_value() arbitrary? Yes. Is it random? No. These are not incompatible statements.

Based on some trials, it appears to be deterministic. (probably because any_value just returns the first not null value and respects the ORDER BY)

Yes (as of the time of writing).

/*
 * Transition function for the ANY_VALUE aggregate
 */
Datum
any_value_transfn(PG_FUNCTION_ARGS)
{
 PG_RETURN_DATUM(PG_GETARG_DATUM(0));
}

would it be "bad" to rely on this behavior?

It depends on your definition of "bad". It would certainly be unwise, given that the behaviour might change without further notice, and that the very definition of "any" lacks determinism (in other words, if you want a certain value, don't ask for any value).

answered Feb 20 at 2:55
4
  • What happens if the query gets parallelized? I see at the moment ordered aggregation isn't supported in parallel plans, but what if that changes? Commented Feb 20 at 10:28
  • I guess just like with any partial aggregation, you'll get any_value() out of the set of any_value() results of the partials. Commented Feb 20 at 12:54
  • I would safely rely on the deterministic behavior. What in the world or outside of it could make us replace this trivial transition function with something more complicated? Commented Feb 21 at 9:12
  • If one really wants the first value from an ordered set, it would be better to explicitly use first_value(), if only to state the intention for the future code maintainers. Commented Feb 21 at 15:13
2

I think the intent is that the effect is only reliable for aggregates where they actually show the ORDER BY clause in the function documentation. The rest will accept it, but its effect is not documented and should not be relied on.

In most cases ORDER BY is irrelevant, since the functions are associative and commutative. This isn't true for ANY_VALUE(), but since the clause isn't shown in the syntax, I don't think you should depend on it behaving in any specific way.

Notice that all the functions that mention it return some kind of array, and this clause specifies the order of the items in the array. ANY_VALUE() doesn't return a collection, so order is not significant.

answered Feb 20 at 15:49

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.