I am trying to write a regex that can replace '[', ']' and '].' with '_' and in cases where ']' is the last character it should be replaced with '' but I am struggling to come up with a regex for it that works in all cases because Snowflake does not support lookahead & lookbehind.
My question is, has anyone tried/achieved to do this before? or is it impossible and I should just give up??
Eg:
look_behind[0] --> look_behind_0
look_behind[0]_positive[1] --> look_behind_0_positive_1
1 Answer 1
As far as I can see, you have no need to use a regex at all!
Using a search engine, I found that the functions LEFT()
(manual for all string functions), RIGHT()
and TRANSLATE()
have the same signatures (mostly) in Snowflake (1
, 2
& 3
) as in PostgreSQL. All of the code below is available on the fiddle here.
Populate with your sample data with extra records:
INSERT INTO tab VALUES
('look_behind[0]'),
('look_behind[0]_positive[1]'),
('No_[brackets]_[at]_end'), -- two added records for testing
('[]][][_sderfrx÷=/_[]ddf]');
Then, we do this:
--
-- Works where the string <underscore><left-bracket> (_[) OR the string
-- <right-bracket><underscore> (]_) are to be replaced by two underscores.
--
-- If the last character of the string is <right-bracket> (]), it is replaced
-- by the empty string.
--
-- It is worth noting that the order of the replacements is important as
-- the operations are not commutative - i.e. x R y <> y R x (not necessarily!).
--
--
SELECT
str,
LENGTH(str) AS len,
TRANSLATE
(
CASE
WHEN (RIGHT(str, 1) = ']') THEN RIGHT(str, -1)
ELSE str
END,
'[]',
'__' -- two underscores - one for each bracket (left & right)
) AS f_str,
REGEXP_COUNT
(
str,
'[\[|\]]',
1, -- default anyway - not required
'i' -- not relevant for non-alpha characters
) AS f_str_chars_cnt
FROM
tab;
Result:
str len f_str f_str_chars_cnt
look_behind[0] 14 look_behind_0 2
look_behind[0]_positive[1] 26 look_behind_0__positive_1 4
No_[brackets]_[at]_end 22 No__brackets___at__end 4
[]][][_sderfrx÷=/_[]ddf] 24 _______sderfrx÷=/___ddf 9
You could also do this:
--
-- Works where the string <underscore><left-bracket> (_[) OR the string
-- <right-bracket><underscore> (]_) are to be replaced by one underscore.
--
-- If the last character of the string is <right-bracket> (]), it is replaced
-- by the empty string.
--
-- It is worth noting that the order of the replacements is important as
-- the operations are not commutative - i.e. x R y <> y R x (not necessarily!).
--
--
SELECT
str,
TRANSLATE
(
CASE
WHEN
RIGHT (REPLACE(REPLACE(str, '_[', '_'), ']_', '_'), 1) = ']' THEN
LEFT (REPLACE(REPLACE(str, '_[', '_'), ']_', '_'), -1)
ELSE
str
END,
'[]',
'__'
)
FROM
tab;
Result (the reader can count the underscores!):
str translate
look_behind[0] look_behind_0
look_behind[0]_positive[1] look_behind_0_positive_1
No_[brackets]_[at]_end No__brackets___at__end
[]][][_sderfrx÷=/_[]ddf] _______sderfrx÷=/__ddf
Regexes are expensive relative to simple string functions. If you really want a regular expression, you can do the following (REGEXP_REPLACE()
also has a similar signature in PostgreSQL as in Snowflake):
-- If the last character of the string is <right-bracket> (]), it is replaced
-- by the empty string.
--
-- Works where the string <underscore><left-bracket> (_[) AND the string
-- <right-bracket><underscore> (]_) are to be replaced by two underscores (__).
--
-- The order of the replacements for the desired outcome in terms of what character/
-- string is replaced first is left up to the reader.
--
SELECT
REGEXP_REPLACE
(
REGEXP_REPLACE
(
str,
'(.*)]$', -- replace with pattern
'1円'
),
'[\[\]]',
'_',
'g' -- "global" flag to replace all occurrences
)
FROM
tab;
Result: same as first (TRANSLATE()
) snippet above.
Performance analysis (see end of fiddle):
The entire results of the fiddle aren't included here, but just to see the times is interesting.
TRANSLATE() - Execution Time: 0.037 ms
REGEXP_REPLACE() - Execution Time: 0.100 ms
REPLACE(...REPLACE() - Execution Time: 0.027 ms
The time taken for the two string function solutions is roughly 33% of the time taken for the regular expression solution. This is why it's always worth checking to see if an ordinary string function will suffice in cases such as this.
Explore related questions
See similar questions with these tags.
look_behind[0]_positive[1]
which you want converted tolook_behind_0_positive_1
. You only have one underscore after the0
- should that be two underscores or should it be only one? This changes the question greatly!