I am trying to import a lot of data into a MariaDB table. I generated a large SQL script file which contains a lot of special characters (mainly μ, °, and NBSP).
I would like to run my script to put the data into SQL and preserve these characters. Each time I try to run the script, I either get "Incorrect string value" errors, or the characters appear as "?" when I query the data (See below). Is there a way that I can get this script to work and import these special characters?
I'll provide an example sql script at the bottom and explain how I have tried to run it.
I tried running my script in the Linux terminal, mysql -e "source Test.sql". This stopped after the first error, so I'll use examples from the mysql terminal since it will show all of the errors I encountered in my small example script.
Notice how I first tried to run my script file in the SQL terminal. I got an "Incorrect string value" for each of the characters mentioned above (\xB0 = °, \xB5 = μ, \xA0 = NBSP). Running the script doesn't work, however, when I copy/paste the script into my mysql session, it runs just fine. This gives me the desired output, but this would be cumbersome to do in my full-scale script.
Any idea how to get this to work by running the script?
#### Trying to run my script from SQL terminal
MariaDB [atlas]> source Test.sql
Query OK, 0 rows affected (0.017 sec)
Query OK, 0 rows affected (0.027 sec)
ERROR 1366 (22007) at line 12 in file: 'Test.sql': Incorrect string value: '\xB0)' for column `atlas`.`TestSQL`.`comment` at row 1
ERROR 1366 (22007) at line 14 in file: 'Test.sql': Incorrect string value: '\xB5)' for column `atlas`.`TestSQL`.`comment` at row 1
ERROR 1366 (22007) at line 16 in file: 'Test.sql': Incorrect string value: '\xA0' for column `atlas`.`TestSQL`.`comment` at row 1
#### Copying the lines from the script into SQL works...
MariaDB [atlas]> INSERT INTO TestSQL (timestamp,comment)
-> VALUES ('2019-03-04 13:11:35','This one contains a deg symbol (°)');
Query OK, 1 row affected (0.005 sec)
MariaDB [atlas]> INSERT INTO TestSQL (timestamp,comment)
-> VALUES ('2020-02-04 13:48:04','This one has a mu (μ)');
Query OK, 1 row affected (0.004 sec)
MariaDB [atlas]> INSERT INTO TestSQL (timestamp,comment)
-> VALUES ('2022-06-12 16:07:50','This one has a line break >
'> And it contains a NBSP > ');
Query OK, 1 row affected (0.005 sec)
This is what I want the output to look like after I run the script
MariaDB [atlas]> select * from TestSQL;
+---------------------+--------------------------------------------------------+
| timestamp | comment |
+---------------------+--------------------------------------------------------+
| 2019年03月04日 13:11:35 | This one contains a deg symbol (°) |
| 2020年02月04日 13:48:04 | This one has a mu (μ) |
| 2022年06月12日 16:07:50 | This one has a line break >
And it contains a NBSP > |
+---------------------+--------------------------------------------------------+
3 rows in set (0.000 sec)
MariaDB [atlas]>
Here is my SQL script TestSQL.sql to duplicate this issue.
DROP TABLE IF EXISTS TestSQL;
--SET CHARACTER SET utf8mb4;
CREATE TABLE IF NOT EXISTS TestSQL (
timestamp timestamp DEFAULT CURRENT_TIMESTAMP,
comment varchar(256) COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (timestamp)
) CHARACTER SET 'utf8mb4';
INSERT INTO TestSQL (timestamp,comment)
VALUES ('2019-03-04 13:11:35','This one contains a deg symbol (°)');
INSERT INTO TestSQL (timestamp,comment)
VALUES ('2020-02-04 13:48:04','This one has a mu (μ)');
INSERT INTO TestSQL (timestamp,comment)
VALUES ('2022-06-12 16:07:50','This one has a line break >
And it contains a NBSP > ');
Other things I've tried:
- In the script above, I used the line "SET CHARACTER SET utf8mb4;". When the script is run with this uncommented, it seems to work, however it has question marks when I query the data:
MariaDB [atlas]> select * from TestSQL;
+---------------------+-------------------------------------------------------+
| timestamp | comment |
+---------------------+-------------------------------------------------------+
| 2019年03月04日 13:11:35 | This one contains a deg symbol (?) |
| 2020年02月04日 13:48:04 | This one has a mu (?) |
| 2022年06月12日 16:07:50 | This one has a line break >
And it contains a NBSP >? |
+---------------------+-------------------------------------------------------+
3 rows in set (0.000 sec)
I get this same result when I use the command in the SQL terminal before running my script.
- I've tried messing around with the
/etc/my.cnffile. I didn't really see any changes here. I tried restarting the mariadb server just in case that had an effect too. Here is my config file for completion:
[client]
database=atlas
default-character-set=utf8mb4
[mysqld]
bind-address=0.0.0.0
max_allowed_packet=64M
collation-server = utf8mb4_unicode_ci
character-set-client=utf8mb4
character-set-server=utf8mb4
init-connect = 'SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
[client-server]
Here is the output of the questions asked in a similar post (How to fix "Incorrect string value" errors?):
MariaDB [atlas]> show variables like '%colla%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8mb4_general_ci |
+----------------------+--------------------+
3 rows in set (0.001 sec)
MariaDB [atlas]> show variables like '%charac%';
+--------------------------+------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mariadb/charsets/ |
+--------------------------+------------------------------+
8 rows in set (0.001 sec)
Edit: Adding this information based on comments.
MariaDB [atlas]> select @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4 | utf8mb4_unicode_ci |
+--------------------------+----------------------+
1 row in set (0.000 sec)
Answer to ysth's comment:
MariaDB [atlas]> select *,char_length(comment),hex(comment) from TestSQL;
+---------------------+-------------------------------------------------------+----------------------+------------------------------------------------------------------------------------------------------------+
| timestamp | comment | char_length(comment) | hex(comment) |
+---------------------+-------------------------------------------------------+----------------------+------------------------------------------------------------------------------------------------------------+
| 2019年03月04日 13:11:35 | This one contains a deg symbol (▒) | 34 | 54686973206F6E6520636F6E7461696E732061206465672073796D626F6C2028B029 |
| 2020年02月04日 13:48:04 | This one has a mu (▒) | 21 | 54686973206F6E65206861732061206D752028B529 |
| 2022年06月12日 16:07:50 | This one has a line break >
And it contains a NBSP > | 53 | 54686973206F6E65206861732061206C696E6520627265616B203E0A416E6420697420636F6E7461696E732061204E425350203E20 |
+---------------------+-------------------------------------------------------+----------------------+------------------------------------------------------------------------------------------------------------+
3 rows in set (0.000 sec)
Information that Rick James requested: Note that I did have to add NBSP to the last query. Somehow, I must have removed the NBSP characters by accident!
MariaDB [atlas]> select *,char_length(comment),hex(comment) from TestSQL;
+---------------------+----------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
| timestamp | comment | char_length(comment) | hex(comment) |
+---------------------+----------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
| 2019年03月04日 13:11:35 | This one contains a deg symbol (°) | 34 | 54686973206F6E6520636F6E7461696E732061206465672073796D626F6C2028B029 |
| 2020年02月04日 13:48:04 | This one has a mu (μ) | 21 | 54686973206F6E65206861732061206D752028B529 |
| 2022年06月12日 16:07:50 | This one has a line break >
And it contains a NBSP > | 54 | 54686973206F6E65206861732061206C696E6520627265616B203E0A416E6420697420636F6E7461696E732061204E425350203EA0A0 |
+---------------------+----------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
3 rows in set (0.000 sec)
MariaDB [atlas]> set names latin1;
Query OK, 0 rows affected (0.000 sec)
MariaDB [atlas]> select *,char_length(comment),hex(comment) from TestSQL;
+---------------------+--------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
| timestamp | comment | char_length(comment) | hex(comment) |
+---------------------+--------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
| 2019年03月04日 13:11:35 | This one contains a deg symbol (▒) | 34 | 54686973206F6E6520636F6E7461696E732061206465672073796D626F6C2028B029 |
| 2020年02月04日 13:48:04 | This one has a mu (▒) | 21 | 54686973206F6E65206861732061206D752028B529 |
| 2022年06月12日 16:07:50 | This one has a line break >
And it contains a NBSP >▒▒ | 54 | 54686973206F6E65206861732061206C696E6520627265616B203E0A416E6420697420636F6E7461696E732061204E425350203EA0A0 |
+---------------------+--------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
3 rows in set (0.000 sec)
MariaDB [atlas]> show create table;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '' at line 1
MariaDB [atlas]> show create table Saves \G
*************************** 1. row ***************************
Table: Saves
Create Table: CREATE TABLE `Saves` (
`expNum` varchar(16) DEFAULT NULL,
`timestamp` timestamp NOT NULL DEFAULT current_timestamp(),
`comment` varchar(256) DEFAULT NULL,
PRIMARY KEY (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.000 sec)
1 Answer 1
Since the bytes in the client are encode in latin1 (0xB5, etc), you must have use latin1, not utf8mb4 for some of those settings. One approach is:
`SET NAMES latin1;
That changes character_set_client/connection/results. The last of those is probably not relevant to the loading, but only to converting back to latin1 during `SELECT.
6 Comments
SELECT HEX(col) ... to see what is stored in the database. Provide that, plus SHOW CREATE TABLE.
select *,char_length(comments),hex(comments) from TestSQL;show? with it unset, what is the full error message you get?