
Let's say I have a text field with this value in Postgres:

'bar$foo$john$doe$xxx'

I'd like to replace just the last occurrence of the dollar ($) character with another character, for example '-'. After the replacement, the contents of the field should be:

'bar$foo$john$doe-xxx'
Evan Carroll
asked Nov 29, 2021 at 11:06

3 Answers


Introduction:

This problem involves a bit of lateral thinking. The last occurrence of any character is also the first occurrence of that character in the string when it is reversed! All of the solutions (bar one) use this approach.

For all of the 5 solutions presented, we have the following (a fiddle with all the code below is available here. An individual fiddle with each solution separately is included with each solution below):

CREATE TABLE test
(
 id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
 t_field TEXT
);

The PRIMARY KEY is only required by the 5th solution. Then we run the following queries to populate the table - the first record is the OP's own data - the rest is randomly generated!

OP:

INSERT INTO test (t_field)
VALUES ('bar$foo$john$doe$xxx'); -- OP's own data

Random data:

INSERT INTO test (t_field)
SELECT 
 LEFT(MD5(RANDOM()::TEXT), FLOOR(RANDOM() * (5 - 3 + 1) + 3)::INT) || '$' ||
 LEFT(MD5(RANDOM()::TEXT), FLOOR(RANDOM() * (5 - 3 + 1) + 3)::INT) || '$' ||
 LEFT(MD5(RANDOM()::TEXT), FLOOR(RANDOM() * (5 - 3 + 1) + 3)::INT) || '$' ||
 LEFT(MD5(RANDOM()::TEXT), FLOOR(RANDOM() * (5 - 3 + 1) + 3)::INT) || '$' ||
 LEFT(MD5(RANDOM()::TEXT), FLOOR(RANDOM() * (5 - 3 + 1) + 3)::INT)
FROM 
 GENERATE_SERIES(1, 29999); --<<== Vary here
 
--
-- For this fiddle, we only have 30,000 (29,999 + the OP's original datum) records 
-- (although relative magnitudes appear good), the individual fiddles use
-- 300,000. 
--
-- 300,000 appears large enough to give reliable consistent results and small 
-- enough so that the fiddle doesn't fail too often - rarely fails on 30k.
--
--
-- You can vary this number, but please consider using the individual fiddles for
-- large numbers of records so as not to hit the db<>fiddle server too hard!
--
-- The home test VM used 10,000,000 records - 16 GB RAM, 1 CPU, SSD
--

The solutions are presented in order of performance. They were tested on both db<>fiddle and on a home VM (16GB RAM, SSD) with 10M records in the test table - don't try 10M on the fiddle! Each method is given a factor showing how much longer it took than the fastest on the VM.

In all cases, the desired result of bar$foo$john$doe-xxx is obtained for the OP's original data, and the test queries (with LIMIT 2) show that they are behaving as expected - i.e. replacing the last dollar ($) sign with a hyphen (-). You may vary this limit on the fiddle to check.

1: PostgreSQL string functions (see the manual), uses OVERLAY(), STRPOS() and REVERSE() (individual fiddle):

SELECT
 t_field, 
 OVERLAY(t_field PLACING '-' 
 FROM 
 LENGTH(t_field) + 1 - STRPOS(REVERSE(t_field), '$')
 ) AS result
FROM test;
  • Performance: time for 10M records = 8034.787 ms.
  • Comparison with fastest = 1.0 x
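The position arithmetic in solution 1 can be traced on the OP's string alone. This is only a sketch that runs against a literal rather than the test table; the column aliases (len, reversed, pos_in_rev, pos_in_s) are illustrative names, not part of the original answer.

```sql
-- Trace of solution 1 on the OP's literal (no table needed).
SELECT
  LENGTH(s)                               AS len,        -- 20
  REVERSE(s)                              AS reversed,   -- xxx$eod$nhoj$oof$rab
  STRPOS(REVERSE(s), '$')                 AS pos_in_rev, -- 4: first $ in the reversed string
  LENGTH(s) + 1 - STRPOS(REVERSE(s), '$') AS pos_in_s,   -- 17: last $ in the original string
  OVERLAY(s PLACING '-'
          FROM LENGTH(s) + 1 - STRPOS(REVERSE(s), '$')) AS result -- bar$foo$john$doe-xxx
FROM (VALUES ('bar$foo$john$doe$xxx')) AS v(s);
```

Note that STRPOS() returns 0 when the string contains no $, so the computed position is then one past the end of the string and no longer points at an existing character. Wrapping the expression in CASE WHEN STRPOS(t_field, '$') > 0 THEN ... ELSE t_field END is one way to leave such rows unchanged.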

2: Reverse() and the regex function REGEXP_REPLACE() (individual fiddle):

SELECT 
 REVERSE(REGEXP_REPLACE(REVERSE(t_field), '\$', '-'))
FROM
 test;

What is being done (from the inside out) is:

  • REVERSE() the string,

  • run REGEXP_REPLACE('xxx', '\$', '-') on the reversed string.

    Note that this will only replace the first instance of $ because the 'g' (global) flag isn't present - if the code read ... , '-', 'g'), then all of the dollars would be replaced - and you can do that anyway with the (much cheaper) REPLACE() function.

    Note also that $ is a regex meta-character - i.e. it has special functionality within regular expressions (it anchors the match at the end of the string) and therefore it has to be escaped with the backslash (\) character when it is to be matched literally.

  • then, the final step is to reverse our edited string back to its original order and we have the result!
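The effect of the 'g' flag described in the steps above can be seen side by side on the reversed OP string. This sketch assumes the default setting standard_conforming_strings = on, so that '\$' reaches the regex engine as a backslash followed by a dollar sign; the column aliases are illustrative only.

```sql
-- First-match vs. global replacement on the reversed string.
SELECT
  REGEXP_REPLACE(r, '\$', '-')      AS first_only, -- xxx-eod$nhoj$oof$rab
  REGEXP_REPLACE(r, '\$', '-', 'g') AS all_regex,  -- xxx-eod-nhoj-oof-rab
  REPLACE(r, '$', '-')              AS all_plain   -- xxx-eod-nhoj-oof-rab (no regex involved)
FROM (VALUES (REVERSE('bar$foo$john$doe$xxx'))) AS v(r);
```

Applying REVERSE() to the first_only column gives the desired bar$foo$john$doe-xxx.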

It is worth bearing in mind that regular expressions are incredibly powerful. Unfortunately (paraphrasing), with great power comes great complexity. Regexes can become convoluted and difficult to understand very easily - but they are well worth dipping into - they can turn pages of code into one-liners in the hands of an adept!

It is always worthwhile trying to find a solution with the non-regex functionality first (cf. solution 1), but regexes have their place and, in this case, one works reasonably well! The site linked to above is a good place to start exploring them.

  • Performance: time for 10M records = 14298.643 ms.
  • Comparison with fastest = 1.77 x

3: Alternative regex with REGEXP_REPLACE() (doesn't use REVERSE() - see Evan Carroll's answer (individual fiddle)):

SELECT
 t_field,
 REGEXP_REPLACE(t_field, '(.*)\$', '\1-' )
FROM test
LIMIT 2;
  • Performance: time for 10M records = 16316.768 ms.

  • Comparison with fastest = 2.03 x

4: Alternative string function only, uses SUBSTRING(), POSITION() and LENGTH() (individual fiddle):

SELECT
 t_field,
 REVERSE(
 SUBSTRING(REVERSE(t_field) FROM 1 FOR POSITION('$' IN REVERSE(t_field)) - 1)
 || '-' ||
 SUBSTRING(REVERSE(t_field) FROM POSITION('$' IN REVERSE(t_field)) + 1 FOR (LENGTH(REVERSE(t_field)))))
 FROM test
LIMIT 2;
  • Performance: time for 10M records = 16316.768 ms.
  • Comparison with fastest = 2.34 x
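The pieces that solution 4 glues together can be inspected individually on the OP's string. This is a sketch on a literal, with a CTE named s and aliases (r, p, head, tail) chosen purely for illustration; it also omits the FOR length on the second SUBSTRING(), which then simply runs to the end of the string.

```sql
-- Break solution 4 into its intermediate values.
WITH s AS (
  SELECT REVERSE('bar$foo$john$doe$xxx') AS r
)
SELECT
  r,                                                       -- xxx$eod$nhoj$oof$rab
  POSITION('$' IN r)                              AS p,    -- 4
  SUBSTRING(r FROM 1 FOR POSITION('$' IN r) - 1)  AS head, -- xxx
  SUBSTRING(r FROM POSITION('$' IN r) + 1)        AS tail, -- eod$nhoj$oof$rab
  REVERSE(
    SUBSTRING(r FROM 1 FOR POSITION('$' IN r) - 1)
    || '-' ||
    SUBSTRING(r FROM POSITION('$' IN r) + 1)
  )                                               AS result -- bar$foo$john$doe-xxx
FROM s;
```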

5: ARRAY (manual) - v. slow but demonstrates STRING_TO_ARRAY(), UNNEST() and WITH ORDINALITY 1 (individual fiddle)

1: See these posts (1, 2 & 3) by Erwin Brandstetter on WITH ORDINALITY

The individual fiddle shows a number of approaches together with performance analysis and some discussion. Included for completeness only and not, in this case, as a realistic choice.

Although, in this particular case, the ARRAY technique is not very performant (by virtue of having subqueries), much of the backend code of the server uses ARRAYs and they can frequently be the optimum method to tackle various problems. It is well worth getting to know this little-known corner of PostgreSQL.

The first thing is to do this:

SELECT
 UNNEST
 (STRING_TO_ARRAY(REVERSE((SELECT t.t_field 
 FROM test t
 WHERE t.id = 1
 )), '$'));

Result (The OP's record - note xxx comes first because of the REVERSE()):

str
xxx
eod
nhoj
oof
rab

The string is split into fields by the $ character.

Then:

SELECT
 t.t_field,
 t.id, x.elem, x.num
FROM test t
LEFT JOIN LATERAL
 UNNEST(STRING_TO_ARRAY(REVERSE((SELECT t_field 
 FROM test
 WHERE test.id = t.id
 )), '$'))
 WITH ORDINALITY AS x (elem, num) ON TRUE
 LIMIT 5;

Result:

 t_field id elem num
bar$foo$john$doe$xxx 1 xxx 1
bar$foo$john$doe$xxx 1 eod 2
bar$foo$john$doe$xxx 1 nhoj 3
bar$foo$john$doe$xxx 1 oof 4
bar$foo$john$doe$xxx 1 rab 5

The reason that we need WITH ORDINALITY is that, without it, we can't distinguish between the 1st element of the string (i.e. the one that interests us) and the others (elem, num).

Then, we do this:

SELECT
 (SELECT t_field FROM test WHERE test.id = tab.id),
 REVERSE(
 (STRING_TO_ARRAY((SELECT REVERSE(t_field) FROM test WHERE test.id = tab.id), '$'))[1]
 || '-' || 
 STRING_AGG(elem, '$' ORDER BY num)) -- ORDER BY pins the element order, which STRING_AGG() alone does not guarantee
FROM
(
 SELECT
 t.id, x.elem, x.num
 FROM test t
 LEFT JOIN LATERAL
 UNNEST(STRING_TO_ARRAY(REVERSE((SELECT t_field 
 FROM test
 WHERE test.id = t.id
 )), '$'))
 WITH ORDINALITY AS x (elem, num) ON TRUE
) AS tab
WHERE tab.num > 1
GROUP BY tab.id
LIMIT 2;

Result:

 t_field                  result
bar$foo$john$doe$xxx      bar$foo$john$doe-xxx
7a29f$d06f$20e$21f$1b1    7a29f$d06f$20e$21f-1b1   -- will vary by fiddle run!

This aggregates the elements of the reversed string back together using $ as the separator, but EXCLUDING the first element (WHERE tab.num > 1). In its place we put the first element itself, obtained through the array reference [1], followed by the hyphen (|| '-' ||). This gives xxx- followed by the other elements of the reversed string, with $ separating them.

We then just simply apply REVERSE() to the whole construct to give the desired result!

  • Performance: time for 10M records = 80715.198 ms.
  • Comparison with fastest = 10.04 x
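The aggregation step of solution 5 can be traced on the OP's string alone, without the table or the correlated subqueries. This is a sketch on a literal; the ORDER BY inside STRING_AGG() is an addition made here to make the element order explicit, and is not part of the query that was timed.

```sql
-- Solution 5 reduced to a single literal.
SELECT
  REVERSE(
    (STRING_TO_ARRAY(REVERSE('bar$foo$john$doe$xxx'), '$'))[1]  -- first element: xxx
    || '-' ||
    STRING_AGG(x.elem, '$' ORDER BY x.num)                      -- the rest: eod$nhoj$oof$rab
  ) AS result                                                   -- bar$foo$john$doe-xxx
FROM UNNEST(STRING_TO_ARRAY(REVERSE('bar$foo$john$doe$xxx'), '$'))
     WITH ORDINALITY AS x (elem, num)
WHERE x.num > 1;  -- skip the first element, it is handled by the [1] reference
```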

There is a solution possible that doesn't use WITH ORDINALITY (ROW_NUMBER() instead) - see discussion in the individual fiddle.

Performance

The performance figures on 10M records on a home VM are shown with each query - the db<>fiddle (30,000 records) results mirror them fairly closely in terms of relative magnitudes.

So, in this scenario, use string-based methods if possible. Regular expressions can help reduce the SLOC, but they can be slower - it's up to the DBA/Dev to make that choice between speed and complexity.

answered Nov 29, 2021 at 11:50

Using regexp_replace

You can do this with a regex and a capture paren using regexp_replace

SELECT regexp_replace(t.x, '(.*)\$', '\1-' )
FROM ( VALUES ('bar$foo$john$doe$xxx') ) AS t(x);

This replaces the last $ with -. The end result is,

bar$foo$john$doe-xxx

Here is how it works: the pattern

  • captures everything before the last $ into \1 and saves it,
  • grabs but does not capture the last $,
  • leaves everything else untouched.

The replacement then restores the capture with \1 and adds a -, leaving the rest of the string intact.
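The reason the capture stops at the last $ rather than the first is that .* is greedy: it takes as much as it can while still allowing the following \$ to match. As a small sketch of the difference, the non-greedy form .*? stops at the first $ instead (the column aliases are illustrative only):

```sql
-- Greedy vs. non-greedy capture on the same input.
SELECT
  regexp_replace(s, '(.*)\$',  '\1-') AS greedy,     -- bar$foo$john$doe-xxx
  regexp_replace(s, '(.*?)\$', '\1-') AS non_greedy  -- bar-foo$john$doe$xxx
FROM ( VALUES ('bar$foo$john$doe$xxx') ) AS v(s);
```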

answered Nov 29, 2021 at 17:24
  • Hi Evan - I've been looking at regexes recently and I thought I was getting on fine - until I stumbled onto your gem here! :-) Why does the regex '(.*)\$' pick up the last dollar sign and not, say, the first (or the second...). I mean, '.*' picks up zero or more (arbitrary) chars in a group - so why does this, in particular, extract the last $? Unless there's some special knowledge of the PostgreSQL source code, I'm not following the logic here. Commented May 2, 2024 at 14:38
  • Something like '(.*)\$[^\$]+$' makes more sense - i.e. capture characters, then a $ literal and then everything till the end of the line, except if that contains a $, i.e. it will skip all $'s that aren't the final one? Make sense? Or am I completely bamboozled? I mean, if you look here, it doesn't appear to be easy? Commented May 2, 2024 at 14:39
  • Final comment! :-) Check this out - I'm getting there... It's quite complex - no wonder regexes are such a propeller-head's wet-dream! Commented May 2, 2024 at 16:42
  • @Vérace because .* is greedy on a backtracking regex. if you want not-greedy you do .*? Commented May 2, 2024 at 16:57
  • This keeps happening - I get so far, I think I've understood stuff and then, wallop, another side of regexes emerges that I hadn't grokked! This '(.*)*?\$' will replace the first $ sign with a - in the regex. Are all regexes backtracking in a sense? Nope, that's not it. I'll read/code/struggle on and eventually something will stick - thanks for your input! Commented May 2, 2024 at 17:07

Yet another Option for you.
Admittedly, this one may or may not be relevant to your situation, but I'll mention it anyway, if only for completeness ...

Properly Normalise your Data to have only one value per field.
That way, changing any one value becomes child's play. It's just a regular update statement.

Caveat: If you never query your table using any part of this field's value, then this approach is [probably] overkill.

answered Nov 30, 2021 at 9:51
