I am trying to write a regex that replaces '[', ']' and '].' with '_', and replaces ']' with '' when it is the last character. I am struggling to come up with a regex that works in all cases, because Snowflake does not support lookahead and lookbehind.

My question is: has anyone managed to do this before? Or is it impossible, and I should just give up?

E.g.:

  • look_behind[0] --> look_behind_0
  • look_behind[0]_positive[1] --> look_behind_0_positive_1
nbk
asked Apr 14, 2024 at 23:20
  • Can you show us your attempts? Commented Apr 15, 2024 at 5:51
  • Hi, and welcome to dba.se! You have in your two record sample, the datum look_behind[0]_positive[1] which you want converted to look_behind_0_positive_1. You only have one underscore after the 0 - should that be two underscores or should it be only one? This changes the question greatly! Commented Apr 15, 2024 at 22:15

1 Answer

As far as I can see, you have no need to use a regex at all!

Using a search engine, I found that the functions LEFT() (manual for all string functions), RIGHT() and TRANSLATE() have the same signatures (mostly) in Snowflake (1, 2 & 3) as in PostgreSQL. All of the code below is available on the fiddle here.

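Create a table and populate it with your sample data plus extra records (the CREATE TABLE statement is assumed here; only the column name str is fixed by the queries below):

CREATE TABLE tab (str TEXT NOT NULL); -- assumed DDL

INSERT INTO tab VALUES 
('look_behind[0]'),
('look_behind[0]_positive[1]'),
('No_[brackets]_[at]_end'), -- two added records for testing
('[]][][_sderfrx÷=/_[]ddf]');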

Then, we do this:

--
-- Works where the string <underscore><left-bracket> (_[) OR the string 
-- <right-bracket><underscore> (]_) are to be replaced by two underscores.
--
-- If the last character of the string is <right-bracket> (]), it is replaced
-- by the empty string.
--
-- It is worth noting that the order of the replacements is important as
-- the operations are not commutative - i.e. x R y <> y R x (not necessarily!).
--
--
SELECT
 str,
 LENGTH(str) AS len,
 TRANSLATE
 (
 CASE
 WHEN (RIGHT(str, 1) = ']') THEN LEFT(str, -1) -- LEFT with -1 drops the last character
 ELSE str
 END,
 '[]',
 '__' -- two underscores - one for each bracket (left & right)
 ) AS f_str,
 REGEXP_COUNT
 (
 str,
 '[\[\]]', -- a | inside a bracket expression is a literal pipe, so it is omitted
 1, -- default anyway - not required
 'i' -- not relevant for non-alpha characters
 ) AS f_str_chars_cnt
FROM
 tab;

Result:

str len f_str f_str_chars_cnt
look_behind[0] 14 look_behind_0 2
look_behind[0]_positive[1] 26 look_behind_0__positive_1 4
No_[brackets]_[at]_end 22 No__brackets___at__end 4
[]][][_sderfrx÷=/_[]ddf] 24 _______sderfrx÷=/___ddf 9
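
The negative-length form of LEFT() is PostgreSQL behaviour, and I have not verified that Snowflake treats a negative length the same way. A minimal sketch that avoids relying on it, assuming only that LENGTH(), LEFT(), RIGHT() and TRANSLATE() behave as in PostgreSQL for non-negative arguments, computes the length explicitly:

```sql
-- Sketch: strip a trailing ] using an explicit length instead of a
-- negative argument, then map both bracket characters to underscores.
SELECT
  str,
  TRANSLATE(
    CASE
      WHEN RIGHT(str, 1) = ']' THEN LEFT(str, LENGTH(str) - 1)
      ELSE str
    END,
    '[]',
    '__'
  ) AS f_str
FROM
  tab;
```

On the sample rows this should give the same f_str values as the result above.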

You could also do this:

--
-- Works where the string <underscore><left-bracket> (_[) OR the string 
-- <right-bracket><underscore> (]_) are to be replaced by one underscore. 
--
-- If the last character of the string is <right-bracket> (]), it is replaced
-- by the empty string.
--
-- It is worth noting that the order of the replacements is important as
-- the operations are not commutative - i.e. x R y <> y R x (not necessarily!).
--
--
SELECT
 str,
 TRANSLATE
 (
 CASE
 WHEN
 RIGHT (REPLACE(REPLACE(str, '_[', '_'), ']_', '_'), 1) = ']' THEN 
 LEFT (REPLACE(REPLACE(str, '_[', '_'), ']_', '_'), -1)
 ELSE
 REPLACE(REPLACE(str, '_[', '_'), ']_', '_') -- collapse the pairs even when there is no trailing ]
 END,
 '[]',
 '__'
 )
FROM 
 tab;

Result (the reader can count the underscores!):

str translate
look_behind[0] look_behind_0
look_behind[0]_positive[1] look_behind_0_positive_1
No_[brackets]_[at]_end No_brackets_at_end
[]][][_sderfrx÷=/_[]ddf] _______sderfrx÷=/__ddf
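
Since the nested REPLACE() expression is needed in more than one place, it can be computed once in a CTE. This is only a sketch: the names collapsed and s are mine (hypothetical, not in the fiddle), and LENGTH(s) - 1 is used in place of a negative length. The replacements are applied to every row, whether or not it ends in a right bracket:

```sql
-- Sketch: compute the collapsed string once, then strip a trailing ]
-- and translate any remaining brackets to underscores.
WITH collapsed AS (
  SELECT
    str,
    REPLACE(REPLACE(str, '_[', '_'), ']_', '_') AS s  -- "_[" and "]_" become "_"
  FROM
    tab
)
SELECT
  str,
  TRANSLATE(
    CASE
      WHEN RIGHT(s, 1) = ']' THEN LEFT(s, LENGTH(s) - 1)
      ELSE s
    END,
    '[]',
    '__'
  ) AS f_str
FROM
  collapsed;
```

For the two rows from the question this yields look_behind_0 and look_behind_0_positive_1.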

Regular expressions are generally more expensive than simple string functions. If you really want one, you can do the following (REGEXP_REPLACE() has a similar signature in PostgreSQL and Snowflake):

-- If the last character of the string is <right-bracket> (]), it is replaced
-- by the empty string.
--
-- Works where the string <underscore><left-bracket> (_[) AND the string 
-- <right-bracket><underscore> (]_) are to be replaced by two underscores (__).
--
-- The order of the replacements for the desired outcome in terms of what character/
-- string is replaced first is left up to the reader.
-- 
SELECT
 REGEXP_REPLACE
 (
 REGEXP_REPLACE
 (
 str, 
 '(.*)]$', -- capture everything before a trailing ]
 '\1' -- keep only the captured group, dropping the trailing ]
 ),
 '[\[\]]',
 '_', 
 'g' -- "global" flag to replace all occurrences
 ) 
FROM
 tab;

Result: same as first (TRANSLATE()) snippet above.
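
A slightly simpler sketch of the same idea matches only the trailing bracket and replaces it with the empty string, so no capture group or backreference is needed. This is written for PostgreSQL with its default string-literal settings; how backslashes must be escaped in Snowflake string literals is something I have not checked here:

```sql
-- Sketch: inner call removes a single trailing ], outer call turns
-- every remaining [ or ] into an underscore.
SELECT
  str,
  REGEXP_REPLACE(
    REGEXP_REPLACE(str, '\]$', ''),  -- no 'g' needed: $ can only match once
    '[\[\]]',
    '_',
    'g'                              -- replace all occurrences
  ) AS f_str
FROM
  tab;
```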

Performance analysis (see end of fiddle):

The entire results of the fiddle aren't included here, but the timings are interesting:

TRANSLATE() - Execution Time: 0.037 ms
REGEXP_REPLACE() - Execution Time: 0.100 ms
REPLACE(...REPLACE() - Execution Time: 0.027 ms

The two string function solutions take roughly 27% (REPLACE()) and 37% (TRANSLATE()) of the time taken by the regular expression solution. This is why it's always worth checking whether an ordinary string function will suffice in cases such as this.
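
To reproduce this kind of comparison on more than four rows, something like the following could be used. It assumes the timings come from PostgreSQL's EXPLAIN (ANALYZE), whose output ends with an "Execution Time" line; tab_big is a hypothetical table that is not part of the fiddle:

```sql
-- Hypothetical larger test table built from generated strings.
CREATE TABLE tab_big AS
SELECT
  'look_behind[' || i || ']_positive[' || (i + 1) || ']' AS str
FROM
  generate_series(1, 100000) AS g(i);

-- String-function version.
EXPLAIN (ANALYZE)
SELECT
  TRANSLATE(
    CASE
      WHEN RIGHT(str, 1) = ']' THEN LEFT(str, LENGTH(str) - 1)
      ELSE str
    END,
    '[]',
    '__'
  )
FROM
  tab_big;

-- Regular-expression version.
EXPLAIN (ANALYZE)
SELECT
  REGEXP_REPLACE(REGEXP_REPLACE(str, '\]$', ''), '[\[\]]', '_', 'g')
FROM
  tab_big;
```

Comparing the "Execution Time" lines of the two plans over several runs gives a steadier picture than a single sub-millisecond measurement.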

answered Apr 15, 2024 at 11:25
