Given a table t:
id | name
------------
1 | abcfug
1 | deffug
1 | hijfug
2 | etc
How can I do something like:
select string_agg(strip_lcs(name), ', ') from t where id = 1
returning:
abc, def, hij
NB: I wrote an aggregate function that returns the longest common suffix (lcs), if that helps:
CREATE FUNCTION lcs_iterate(_state TEXT, _value TEXT)
RETURNS TEXT
AS
$$
SELECT RIGHT($2, s - 1)
FROM generate_series(1, LEAST(LENGTH($1), LENGTH($2))) s
WHERE RIGHT($1, s) <> RIGHT($2, s)
UNION ALL
SELECT LEAST($1, $2)
LIMIT 1;
$$
LANGUAGE 'sql';
CREATE AGGREGATE lcs(TEXT) (SFUNC = lcs_iterate, STYPE = TEXT);
- Your Postgres version? Min / max / avg length of string? Min / max / avg length of suffix? Min / max / avg number of rows per group? There are many ways to solve this; the tricky part is to make it fast. See: dba.stackexchange.com/a/43444/3684 – Erwin Brandstetter, Nov 23, 2018 at 15:03
- @erwin Currently 9.1, and these are FQDNs, typically around 25 chars. It is a small table and the query runs infrequently, so performance is not a huge deal; clear, easy-to-understand code is preferable. Will study your link, thanks. – Sam, Nov 24, 2018 at 21:53
- Can you provide the actual examples? – Evan Carroll, Nov 26, 2018 at 6:02
- Funny that not even the lcs_iterate function works for me. – Gunther Schadow, Feb 11, 2022 at 23:32
2 Answers
Your aggregate function is smart and fast, but there is a bug. If one string matches the tail of another completely, the UNION ALL part kicks in to return LEAST($1, $2). That must instead be something like CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END. Test with 'match' and 'aa_match'. (See fiddle below.)
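To see why LEAST is the wrong fallback, the two expressions can be compared directly on the suggested test strings. This is only a sketch: the inline VALUES list and the column names (a, b, original_fallback, fixed_fallback) are made up for illustration and stand in for $1 and $2 of the state function.

```sql
-- a and b play the roles of $1 (state) and $2 (new value).
-- Both argument orders are tried, since the aggregate may see rows in any order.
SELECT a
     , b
     , least(a, b)                                      AS original_fallback
     , CASE WHEN length(a) > length(b) THEN b ELSE a END AS fixed_fallback
FROM  (VALUES ('match'::text, 'aa_match'::text)
            , ('aa_match'   , 'match')) AS v(a, b);

-- original_fallback is 'aa_match' in both rows (it sorts before 'match'),
-- which is not a suffix of 'match' at all.
-- fixed_fallback is 'match' in both rows: the shorter string, which is the
-- actual common suffix when one string is the tail of the other.
```

LEAST compares the strings by sort order, so it only returns the right answer when the shorter string happens to sort first.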
Also, make the function IMMUTABLE STRICT:
CREATE OR REPLACE FUNCTION lcs_iterate(_state text, _value text)
RETURNS text AS
$func$
SELECT right($2, s - 1)
FROM generate_series(1, least(length($1), length($2))) s
WHERE right($1, s) <> right($2, s)
UNION ALL
SELECT CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END -- !
LIMIT 1;
$func$ LANGUAGE sql IMMUTABLE STRICT; -- !
NULL values are ignored, and an empty string leads to a zero-length common suffix. You may want to treat these special cases differently ...
Since we only need the length of the common suffix, a very simple FINALFUNC returns just that:
CREATE AGGREGATE lcs_len(text) (
SFUNC = lcs_iterate
, STYPE = text
, FINALFUNC = length -- !
);
Then your query can look like:
SELECT string_agg(trunc, ', ') AS truncated_names
FROM (
SELECT left(name, -lcs_len(name) OVER ()) AS trunc
FROM tbl
WHERE id = 1
) sub;
... using the custom aggregate as a window function.
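When the result should cover several ids at once, the window needs a PARTITION BY id so that each group gets its own suffix. The script below is a minimal end-to-end sketch under a few assumptions: the temp table lcs_demo and the two example.com host names are hypothetical sample data, the function and aggregate definitions are repeated only so the script runs on its own, and the script drops and recreates any existing lcs_len(text). It also uses left(name, length(name) - ...) instead of a negative length, because left(name, -n) returns an empty string when n is 0, that is, when a group has no common suffix.

```sql
-- State function with the corrected fallback branch.
CREATE OR REPLACE FUNCTION lcs_iterate(_state text, _value text)
  RETURNS text AS
$func$
SELECT right($2, s - 1)
FROM   generate_series(1, least(length($1), length($2))) s
WHERE  right($1, s) <> right($2, s)
UNION  ALL
SELECT CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END
LIMIT  1;
$func$ LANGUAGE sql IMMUTABLE STRICT;

-- Aggregate returning the length of the longest common suffix.
DROP AGGREGATE IF EXISTS lcs_len(text);
CREATE AGGREGATE lcs_len(text) (
   SFUNC     = lcs_iterate
 , STYPE     = text
 , FINALFUNC = length
);

-- Hypothetical sample data: the rows from the question plus a second group.
CREATE TEMP TABLE lcs_demo (id int, name text);
INSERT INTO lcs_demo VALUES
   (1, 'abcfug'), (1, 'deffug'), (1, 'hijfug')
 , (2, 'foo.example.com'), (2, 'bar.example.com');

-- One output row per id; each group is stripped of its own common suffix.
SELECT id
     , string_agg(trunc, ', ' ORDER BY trunc) AS truncated_names
FROM  (
   SELECT id
        , left(name, length(name) - lcs_len(name) OVER (PARTITION BY id)) AS trunc
   FROM   lcs_demo
   ) sub
GROUP  BY id
ORDER  BY id;

--  id | truncated_names
-- ----+-----------------
--   1 | abc, def, hij
--   2 | bar, foo
```

A group with only one row ends up with an empty string, because a single name is its own longest common suffix.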
db<>fiddle here
I also tested with Postgres 9.4, and it should work with your outdated Postgres 9.1, but that's too old for me to test. Consider upgrading to a current version.
- Thank you @Erwin. NB I did finalfunc = length, rather than defining lcs_final_len; I did -lcs_len(host) OVER (), rather than lcs_len(name) OVER (PARTITION BY id) * - 1. – Sam, Nov 26, 2018 at 8:59
- Yes, a custom function for finalfunc was overkill for the simple case. And yes, PARTITION is only needed for multiple IDs in the result (as demonstrated in the fiddle). I added your improvements. – Erwin Brandstetter, Nov 26, 2018 at 13:28
Given that you have an aggregate that finds the longest common suffix, you can use it as a window function and cut that many characters off the end of each name:
WITH x AS(
SELECT left(name,length(name)-length(lcs(name) over ())) AS s
FROM t WHERE id = 1
)
SELECT string_agg(s,', ') FROM x;
- Thank you for your solution; it was very useful, and it was difficult to choose which answer to accept. – Sam, Nov 26, 2018 at 8:52