Why does casting this result of REGEXP_SUBSTR() to a DECIMAL fail?
SELECT
REGEXP_SUBSTR('Cost (-$14.18)', '(?<=Cost [(]-[$])[0-9.]+') AS _extracted,
CAST(REGEXP_SUBSTR('Cost (-$14.18)', '(?<=Cost [(]-[$])[0-9.]+') AS DECIMAL(8,2)) AS cost_1,
CAST((SELECT _extracted) AS DECIMAL(8,2)) AS cost_2,
CAST((SELECT _extracted) * 1 AS DECIMAL(8,2)) AS cost_3,
CAST('14.18' AS DECIMAL(8,2)) AS cost_4;
+------------+--------+--------+--------+--------+
| _extracted | cost_1 | cost_2 | cost_3 | cost_4 |
+------------+--------+--------+--------+--------+
| 14.18      |  14.00 |  14.00 |  14.18 |  14.18 |
+------------+--------+--------+--------+--------+
Casting a plain string, as in cost_4, seems to work. Multiplying the REGEXP_SUBSTR() result by 1 also appears to work. But simply casting the result, as I've done with cost_1 and cost_2, fails to produce the correct fixed-point version of _extracted.
Oddly, in my application, using the backreference as I've done for cost_2 actually produces the correct result. I was unable to reproduce this elsewhere, but thought it worth mentioning.
2 Answers
This has been a long-standing issue with MySQL, with people reporting this very issue as a bug since 2011. I have found that the problem depends almost entirely on the collation being used within the REGEXP_SUBSTR() function.
For instance, if you cast the result of REGEXP_SUBSTR() as a CHAR(100), your decimals remain intact:
mysql> SELECT CAST(CAST(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+') AS CHAR(100)) AS DECIMAL(8,2)) AS result;
result
-----
14.18
The result returned by REGEXP_SUBSTR() used a UTF-16 character set before MySQL 8.0.17. Versions after this supposedly use the character set configured by the client (see bug #94203, reported by Rick James), but this does not appear to be accurate. My SQL client is configured to use UTF-8 everywhere, and running your initial query in my client produces exactly the same results as you shared in the question.
However, if I CONVERT( ... USING 'UTF8'):
SELECT CAST(CONVERT(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+') USING 'UTF8') AS DECIMAL(8,2)) AS result;
result
-----
14.18
Surprise, surprise. A correct number.
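To see which character set the function result actually carries on a particular server, a diagnostic along these lines may help. It is only a sketch: the column aliases are arbitrary labels chosen for illustration, and the values returned will depend on the MySQL version and connection settings.

```sql
-- Diagnostic sketch: show the character set and collation attached to the
-- REGEXP_SUBSTR() result, next to the connection charset and server version.
-- The aliases (server_version, result_charset, ...) are illustrative names only.
SELECT
  VERSION()                                                  AS server_version,
  @@character_set_connection                                 AS connection_charset,
  CHARSET(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+'))         AS result_charset,
  COLLATION(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+'))       AS result_collation;
```

If result_charset differs from connection_charset, that would be consistent with the behaviour described above.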
Generally in this situation I do the very same thing that you did for cost_3: I multiply the returned value by 1, then cast it to the desired type. You can save a step by casting as FLOAT, but this sometimes has precision implications.
It is not a great answer, but it is one that can be used across multiple versions of MySQL.
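As a rough sketch of how the two workarounds might look against real rows rather than a literal, assume a hypothetical scratch table named cost_notes with a single note column (both names are made up for this example). It uses utf8mb4 as the target character set, which is an assumption on my part in place of the 'UTF8' spelling used above.

```sql
-- Hypothetical scratch table, for illustration only.
CREATE TEMPORARY TABLE cost_notes (note VARCHAR(100));
INSERT INTO cost_notes (note) VALUES ('Cost (-$14.18)'), ('Cost (-$9.69)');

SELECT
  note,
  -- Workaround 1: multiply by 1 to force a numeric value, then cast to fixed point.
  CAST(REGEXP_SUBSTR(note, '[0-9.]+') * 1 AS DECIMAL(8,2)) AS cost_via_multiply,
  -- Workaround 2: convert the character set explicitly, then cast.
  -- utf8mb4 is an assumed choice here, not something taken from the bug report.
  CAST(CONVERT(REGEXP_SUBSTR(note, '[0-9.]+') USING utf8mb4) AS DECIMAL(8,2)) AS cost_via_convert
FROM cost_notes;
```

Note that the multiplication goes through a floating-point value before the final cast, so the explicit conversion may be the safer choice where exactness matters.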
- That's about as thorough an explanation as I could have hoped for. Presumably this also sheds light on why my backreference may have worked in some cases, but I'm still scratching my head about that one. And yes, I was encountering float precision issues, which is exactly what led me to this point. Thanks! – You Old Fool, May 19, 2021 at 9:41
Not CAST. Use:
FORMAT(expression, 2) -- for displaying with 2 decimal places
ROUND(expression, 2) -- for further computation
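A small sketch of the difference between the two, using an arbitrary literal rather than the REGEXP_SUBSTR() result (whether either function sidesteps the casting problem on that result is not something this example shows):

```sql
-- FORMAT() returns a string intended for display, including a thousands separator.
-- ROUND() returns a number that can be used in further arithmetic.
-- The literal 1234.567 and the aliases are arbitrary choices for illustration.
SELECT
  FORMAT(1234.567, 2) AS formatted,
  ROUND(1234.567, 2)  AS rounded;
```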
- but, why does this fix the issue? – Commented May 19, 2021 at 20:07
- FORMAT(x, 2) returns float, not a fixed point number. This leads to precision issues and results like 18.18 + 9.69 = 27.869999999999997, which is why I'm using CAST() to begin with. – You Old Fool, May 19, 2021 at 20:33
- @billynoah - I understood your Question to be talking about display, not storage or further computation. – Rick James, May 20, 2021 at 5:45
- In that case I've already accomplished the display aspect in step one, where REGEXP_SUBSTR() is used to extract the number from the string. You can infer that decimal computation is required by the fact that I'm casting as a decimal. – You Old Fool, May 20, 2021 at 11:39