I am using MySQL 8 and have a Regex that needs to return a Regex group from a table column of string data.
Table:
CREATE TABLE `table_name`(
id_column VARCHAR(300)
);
INSERT INTO `table_name` VALUES
('47099085T'),
('D73654109H'),
('8.30781719e-05'),
('0113:11:19%.&checktime(2000,54,17,':').042828927Z');
SQL:
SELECT REGEXP_REPLACE(id_column,
'[^\\.0-9]([0-9]{8}[TRWAGMYFPDXBNJZSQVHLCKEtrwagmyfpdxbnjzsqvhlcke])', '1ドル') as id_output
FROM `table_name`
Regex expected output here (regex101.com)
The Regex contains a group, whose REGEX replace is used to return to the SELECT.
> "47099085T" // matches and returns value in quotes.
> D"73654109H" // matches and returns value in quotes.
> 8.30781719e-05 // unmatched
> .&checktime(0113,11,19,':')%.&checktime(2000,54,17,':').042828927Z // unmatched
The MySQL manual states:
REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])
Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.
Expected outcome:
id_output |
---|
47099085T |
73654109H |
Actual outcome:
id_output |
---|
47099085T |
73654109H |
8.30781719e-05 |
.&checktime(0113,11,19,':')%.&checktime(2000,54,17,':').042828927Z |
So for the two rows that do not match the sub group, I am expecting the return on the REGEX_REPLACE to be NULL, because there isn't any match. But it instead returns the whole column's data.
Why?
How can I edit the REGEXP to only return matching outcomes?
-
Literal dot char does not need in quoting in "not in" list. db-fiddle.com/f/5WAdXxGmwcdCDWH7Bbk85G/4Akina– Akina2023年01月20日 16:18:37 +00:00Commented Jan 20, 2023 at 16:18
-
2If the pattern doesn't match, nothing is replaced. If nothing is replaced, the output is the same as the input. Simple.hobbs– hobbs2023年01月21日 14:04:53 +00:00Commented Jan 21, 2023 at 14:04
2 Answers 2
SELECT id_column,
REGEXP_REPLACE(id_column, '([^\\d.])(\\d{8}[TRWAGMYFPDXBNJZSQVHLCKEtrwagmyfpdxbnjzsqvhlcke])', '2ドル')
FROM `table_name`
WHERE id_column REGEXP '(^|[^\\d.])(\\d{8}[TRWAGMYFPDXBNJZSQVHLCKEtrwagmyfpdxbnjzsqvhlcke])'
-
Ahh that's brilliant, that works as intended. Thank you!Mm-Art-In– Mm-Art-In2023年01月20日 16:21:38 +00:00Commented Jan 20, 2023 at 16:21
-
Great, You corrected the expressionRolandoMySQLDBA– RolandoMySQLDBA2023年01月20日 16:23:19 +00:00Commented Jan 20, 2023 at 16:23
-
4Don't just post code. Explain what you did, so you actually answer the question.Barmar– Barmar2023年01月21日 16:32:00 +00:00Commented Jan 21, 2023 at 16:32
The REPLACE function pretty much does the same thing
Basically, both REPLACE and REGEXP_REPLACE would simply return the original string if there was nothing to replace in the original string.
Just imagine your query written as an UPDATE
UPDATE table_name SET id_column =
REGEXP_REPLACE(id_column,'[^\\.0-9]([0-9]{8}[TRWAGMYFPDXBNJZSQVHLCKEtrwagmyfpdxbnjzsqvhlcke])', '1ドル') ;
That would be quite destructive if you want REGEXP_REPLACE to return nothing. In light of this, it is a very good thing that both REPLACE and REGEXP_REPLACE would return the original string.
SUGGESTION : Use a WHERE Clause
SELECT REGEXP_REPLACE(id_column,
'[^\\.0-9]([0-9]{8}[TRWAGMYFPDXBNJZSQVHLCKEtrwagmyfpdxbnjzsqvhlcke])', '1ドル')
as id_output FROM `table_name` WHERE id_column <>
REGEXP_REPLACE(id_column,
'[^\\.0-9]([0-9]{8}[TRWAGMYFPDXBNJZSQVHLCKEtrwagmyfpdxbnjzsqvhlcke])', '1ドル');
The db-fiddle Query is even better WHERE cluase
SELECT id_column
FROM `table_name`
WHERE id_column REGEXP
'[^\\.0-9][0-9]{8}[TRWAGMYFPDXBNJZSQVHLCKEtrwagmyfpdxbnjzsqvhlcke]'l
What about the MySQL Documentation ???
Your citation of the MySQL Documentation points to the parameters of REGEXP_REPLACE being NULL. It makes no mention of the return value of REGEXP_REPLACE.
UPDATE 2023年01月20日 10:50 EST
REGEXP_REPLACE should not return the same value in any instance. The only thing I can think of is if you are running an older version of MySQL 8.0. There is a character set bug mentioned on that documentation page:
Prior to MySQL 8.0.17, the result returned by this function used the UTF-16 character set; in MySQL 8.0.17 and later, the character set and collation of the expression searched for matches is used. (Bug #94203, Bug #29308212)
You might need to experiment with REGEXP_LIKE in the WHERE clause and see (I am not a fan of REGEXP stuff).
-
I tried the WHERE clause but the problem is some times the column value is an exact match for the Regex groupMm-Art-In– Mm-Art-In2023年01月20日 15:41:42 +00:00Commented Jan 20, 2023 at 15:41
-
Thanks for the update; my query comes from the DB Fiddle, rather than usage on my own servers -- the DB fiddle only says it uses MySQL 8.0 but doesn't specify exact versioning.Mm-Art-In– Mm-Art-In2023年01月20日 16:06:17 +00:00Commented Jan 20, 2023 at 16:06
-
I updated the db-fiddle with some hash of return values. Please have a lookRolandoMySQLDBA– RolandoMySQLDBA2023年01月20日 16:18:20 +00:00Commented Jan 20, 2023 at 16:18