I have the following MySQL 8.0 stored function:
DELIMITER $$
CREATE FUNCTION maybe_utf8_decode(str text charset utf8mb4, rowid INT)
RETURNS text CHARSET utf8mb4 DETERMINISTIC
BEGIN
declare str_converted text charset utf8mb4;
declare max_error_count int default @@max_error_count;
set @@max_error_count = 0;
set @@max_error_count = max_error_count;
set str_converted = convert(cast(convert(str using latin1) as binary) using utf8mb4);
if @@warning_count > 0 then
INSERT INTO a_warnings (id) VALUES (rowid);
return str;
else
return str_converted;
end if;
END$$
DELIMITER ;
It gets executed like this:
UPDATE products SET description2 = maybe_utf8_decode(description,id);
I got this function from here: https://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/. The purpose is to fix our database by correcting "double encoded" product descriptions. The problem is that a plain CONVERT by itself truncates many of my product descriptions when it encounters an error or a warning. The function above isn't working because it doesn't recognize warnings/errors when convert(cast(convert(str using latin1) as binary) using utf8mb4) is attempted. My plan was: if a warning/error appears, skip the conversion and save the product id instead, so I can review those rows manually. Otherwise, if there are no errors/warnings, I would like the function to convert the string.
What would be the proper way to write this function, so that it skips strings when errors/warnings appear and stores the rowid in a separate table?
Example of warning that I want to skip. When I run:
convert(cast(convert(description2 using latin1) as binary) using utf8mb4)
I'll occasionally get a warning like the one below, followed by my string getting truncated:
Warning: #1300 Invalid utf8mb4 character string: 'A04C69'
I would like my function to roll back that conversion and return the original string when a warning like that happens (to prevent my data from being truncated). So the problem is: how do I detect inside my function that a warning like this has happened? How do I get @@warning_count to work the way I intend?
2 Answers
Use a user-defined variable.
CREATE FUNCTION maybe_utf8_decode(str text charset utf8mb4, rowid INT)
RETURNS text CHARSET utf8mb4 DETERMINISTIC
BEGIN
-- ...
if @@warning_count > 0 then
INSERT INTO a_warnings (id) VALUES (rowid);
SET @warnings_amount := @warnings_amount + 1; -- increase UDV
return str;
else
return str_converted;
end if;
END
and then
UPDATE products
CROSS JOIN (SELECT @warnings_amount := 0) reset_UDV -- init/clear UDV value
SET description2 = maybe_utf8_decode(description,id);
SELECT @warnings_amount; -- look for warnings amount collected in UDV
I think you misunderstood the question. The problem is that I'm having trouble getting @@warning_count to work in the if/then statement. It seems that the function is unable to tell when a warning/error happens, so it just skips to the else part by default. I know warnings exist, but @@warning_count is not recognizing them for some reason. - peppy, commented May 23, 2023 at 0:13
I had basically the same problem with the same function: the check against @@warning_count just wasn't reliable. I don't know if it was a race condition or what, but that was my best guess.
I changed the function to use a handler for the 1300 error code instead and that resolved the issue:
DELIMITER $$
CREATE FUNCTION maybe_utf8_decode(str longtext charset utf8mb4)
RETURNS longtext charset utf8mb4 DETERMINISTIC
BEGIN
declare str_converted longtext charset utf8mb4;
declare exit handler for 1300 return str;
set str_converted = convert(binary convert(str using latin1) using utf8mb4);
return str_converted;
END$$
DELIMITER ;
This doesn't have the "rowid" parameter or the saving of problem rows into another table that your original has, but both could be worked into the exit handler as well.
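For what it's worth, here is a rough, untested sketch of how the rowid logging might be folded into that exit handler. It assumes the a_warnings table and the (description, id) call signature from the question. It keeps DETERMINISTIC only to mirror the original declaration, even though a function that inserts rows is not strictly deterministic. Treat it as a starting point rather than a verified fix.

```sql
DELIMITER $$
CREATE FUNCTION maybe_utf8_decode(str longtext charset utf8mb4, rowid INT)
RETURNS longtext charset utf8mb4 DETERMINISTIC
BEGIN
  declare str_converted longtext charset utf8mb4;
  -- Fires on MySQL error code 1300 (invalid character string):
  -- log the row id for manual review (a_warnings is the table from the question),
  -- then return the input unchanged instead of a truncated conversion.
  declare exit handler for 1300
  begin
    INSERT INTO a_warnings (id) VALUES (rowid);
    return str;
  end;
  set str_converted = convert(cast(convert(str using latin1) as binary) using utf8mb4);
  return str_converted;
END$$
DELIMITER ;
```

It would then be called the same way as in the question:

```sql
UPDATE products SET description2 = maybe_utf8_decode(description, id);
SELECT id FROM a_warnings; -- rows that were skipped and need manual review
```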