How to remove duplicates from any array and preserve ordering in PostgreSQL?

Question 1

I'm looking for a way to eliminate duplicates from a PostgreSQL array while preserving the ordering of its elements. What I currently have are the following functions:

create function array_unique( anyarray ) 
 returns anyarray immutable strict language sql as $$
 select array( select distinct unnest( 1ドル ) ); $$;
create function array_unique_sorted( anyarray ) 
 returns anyarray immutable strict language sql as $$
 select array( select distinct unnest( 1ドル ) order by 1 ); $$;
/* ### TAINT there ought to be a simpler, declarative solution */
create function array_unique_stable( text[] )
 returns text[] immutable strict parallel safe language plpgsql as $$
 declare
 R text[] = '{}';
 ¶element text;
 begin
 foreach ¶element in array 1ドル loop
 if not array[ ¶element ] && R then
 R := R || array[ ¶element ];
 end if;
 end loop;
 return R; end; $$;

In the above, array_unique takes an array of any type and returns a copy with all duplicates removed; their relative ordering is arbitrary. array_unique_sorted is like array_unique, but the elements are sorted relative to each other; this is sometimes useful as all arrays with the same set of of distinct elements will compare equal after being normalized by this function.

array_unique_stable already does what I'm looking for: given an array (which in this exampe must be a text[] array), it scans elements from left to right; whenever it encounters a previously unseen element, it adds that one to the result. Thus, only the first occurrence of each value is kept.

However, the implementation has some drawbacks: First, there seems to be no way to write it so it accepts the pseudo-type anyarray.

Second, while the first two functions are written in SQL, they may presumably be inlined, array_unique_stable is written in PL/pgSQL, and so it cannot be inlined.

Third, it bugs me that I couldn't come up with a solution in pure SQL...

Question 2

If you need to do something like that, using an array was probably the wrong choice.

Question 3

I do not think so. Using arrays, sorting things, keeping elements distinct are different and very frequent requirements; any subset of the three and all three together should be achievable in straightforward ways, IMHO

Question 4

This can indeed be done using pure SQL:

create function array_unique_stable(p_input anyarray)
 returns anyarray immutable strict parallel safe 
 language sql
as 
$$
select array_agg(t order by x)
from (
 select distinct on (t) t,x
 from unnest(p_input) with ordinality as p(t,x)
 order by t,x
) t2;
$$

The unnest(p_input) with ordinality will return the original index of the element in the array which is then used to aggregate them back in the outer query.

select array_unique_stable(array['a','x','x','b']) as text_array, 
 array_unique_stable(array[10,1,1,5,8,8]) as int_array

returns

text_array | int_array 
-----------+-----------
{a,x,b} | {10,1,5,8}

and

Question 5

Thanks indeed! I realize I was soooo close before I threw my code away and rewrote it in PL/pgSQL... FWIW there's an essentially identical answer over at stackoverflow.com/a/42399297/7568091 which I have since found; in other news, there are lots of incorrect answers to similar questions here on DBA and on over on the general SE, so I guess it's good to have another correct answer here.

user1822user1822 · Accepted Answer · 2018-07-06 05:41:39Z

This can indeed be done using pure SQL:

create function array_unique_stable(p_input anyarray)
 returns anyarray immutable strict parallel safe 
 language sql
as 
$$
select array_agg(t order by x)
from (
 select distinct on (t) t,x
 from unnest(p_input) with ordinality as p(t,x)
 order by t,x
) t2;
$$

The unnest(p_input) with ordinality will return the original index of the element in the array which is then used to aggregate them back in the outer query.

select array_unique_stable(array['a','x','x','b']) as text_array, 
 array_unique_stable(array[10,1,1,5,8,8]) as int_array

returns

text_array | int_array 
-----------+-----------
{a,x,b} | {10,1,5,8}

and

Thanks indeed! I realize I was soooo close before I threw my code away and rewrote it in PL/pgSQL... FWIW there's an essentially identical answer over at stackoverflow.com/a/42399297/7568091 which I have since found; in other news, there are lots of incorrect answers to similar questions here on DBA and on over on the general SE, so I guess it's good to have another correct answer here.

Stack Exchange Network

How to remove duplicates from any array and preserve ordering in PostgreSQL?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to remove duplicates from any array and preserve ordering in PostgreSQL?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions