2

I am trying to import a lot of data into a MariaDB table. I generated a large SQL script file which contains a lot of special characters (mainly μ, °, and NBSP).

I would like to run my script to put the data into SQL and preserve these characters. Each time I try to run the script, I either get "Incorrect string value" errors, or the characters appear as "?" when I query the data (See below). Is there a way that I can get this script to work and import these special characters?

I'll provide an example sql script at the bottom and explain how I have tried to run it.

I tried running my script in the Linux terminal, mysql -e "source Test.sql". This stopped after the first error, so I'll use examples from the mysql terminal since it will show all of the errors I encountered in my small example script.

Notice how I first tried to run my script file in the SQL terminal. I got an "Incorrect string value" for each of the characters mentioned above (\xB0 = °, \xB5 = μ, \xA0 = NBSP). Running the script doesn't work, however, when I copy/paste the script into my mysql session, it runs just fine. This gives me the desired output, but this would be cumbersome to do in my full-scale script.

Any idea how to get this to work by running the script?

#### Trying to run my script from SQL terminal
MariaDB [atlas]> source Test.sql
Query OK, 0 rows affected (0.017 sec)
Query OK, 0 rows affected (0.027 sec)
ERROR 1366 (22007) at line 12 in file: 'Test.sql': Incorrect string value: '\xB0)' for column `atlas`.`TestSQL`.`comment` at row 1
ERROR 1366 (22007) at line 14 in file: 'Test.sql': Incorrect string value: '\xB5)' for column `atlas`.`TestSQL`.`comment` at row 1
ERROR 1366 (22007) at line 16 in file: 'Test.sql': Incorrect string value: '\xA0' for column `atlas`.`TestSQL`.`comment` at row 1
#### Copying the lines from the script into SQL works...
MariaDB [atlas]> INSERT INTO TestSQL (timestamp,comment)
 -> VALUES ('2019-03-04 13:11:35','This one contains a deg symbol (°)');
Query OK, 1 row affected (0.005 sec)
MariaDB [atlas]> INSERT INTO TestSQL (timestamp,comment)
 -> VALUES ('2020-02-04 13:48:04','This one has a mu (μ)');
Query OK, 1 row affected (0.004 sec)
MariaDB [atlas]> INSERT INTO TestSQL (timestamp,comment)
 -> VALUES ('2022-06-12 16:07:50','This one has a line break >
 '> And it contains a NBSP > ');
Query OK, 1 row affected (0.005 sec)

This is what I want the output to look like after I run the script

MariaDB [atlas]> select * from TestSQL;
+---------------------+--------------------------------------------------------+
| timestamp | comment |
+---------------------+--------------------------------------------------------+
| 2019年03月04日 13:11:35 | This one contains a deg symbol (°) |
| 2020年02月04日 13:48:04 | This one has a mu (μ) |
| 2022年06月12日 16:07:50 | This one has a line break >
And it contains a NBSP > |
+---------------------+--------------------------------------------------------+
3 rows in set (0.000 sec)
MariaDB [atlas]>

Here is my SQL script TestSQL.sql to duplicate this issue.

DROP TABLE IF EXISTS TestSQL;
--SET CHARACTER SET utf8mb4;
CREATE TABLE IF NOT EXISTS TestSQL (
 timestamp timestamp DEFAULT CURRENT_TIMESTAMP,
 comment varchar(256) COLLATE utf8mb4_unicode_ci,
 PRIMARY KEY (timestamp)
) CHARACTER SET 'utf8mb4';
INSERT INTO TestSQL (timestamp,comment)
 VALUES ('2019-03-04 13:11:35','This one contains a deg symbol (°)');
INSERT INTO TestSQL (timestamp,comment)
 VALUES ('2020-02-04 13:48:04','This one has a mu (μ)');
INSERT INTO TestSQL (timestamp,comment)
 VALUES ('2022-06-12 16:07:50','This one has a line break >
And it contains a NBSP > ');

Other things I've tried:

  1. In the script above, I used the line "SET CHARACTER SET utf8mb4;". When the script is run with this uncommented, it seems to work, however it has question marks when I query the data:
MariaDB [atlas]> select * from TestSQL;
+---------------------+-------------------------------------------------------+
| timestamp | comment |
+---------------------+-------------------------------------------------------+
| 2019年03月04日 13:11:35 | This one contains a deg symbol (?) |
| 2020年02月04日 13:48:04 | This one has a mu (?) |
| 2022年06月12日 16:07:50 | This one has a line break >
And it contains a NBSP >? |
+---------------------+-------------------------------------------------------+
3 rows in set (0.000 sec)

I get this same result when I use the command in the SQL terminal before running my script.

  1. I've tried messing around with the /etc/my.cnf file. I didn't really see any changes here. I tried restarting the mariadb server just in case that had an effect too. Here is my config file for completion:
[client]
database=atlas
default-character-set=utf8mb4
[mysqld]
bind-address=0.0.0.0
max_allowed_packet=64M
collation-server = utf8mb4_unicode_ci
character-set-client=utf8mb4
character-set-server=utf8mb4
init-connect = 'SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
[client-server]

Here is the output of the questions asked in a similar post (How to fix "Incorrect string value" errors?):

MariaDB [atlas]> show variables like '%colla%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8mb4_general_ci |
+----------------------+--------------------+
3 rows in set (0.001 sec)
MariaDB [atlas]> show variables like '%charac%';
+--------------------------+------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mariadb/charsets/ |
+--------------------------+------------------------------+
8 rows in set (0.001 sec)

Edit: Adding this information based on comments.

MariaDB [atlas]> select @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4 | utf8mb4_unicode_ci |
+--------------------------+----------------------+
1 row in set (0.000 sec)

Answer to ysth's comment:

MariaDB [atlas]> select *,char_length(comment),hex(comment) from TestSQL;
+---------------------+-------------------------------------------------------+----------------------+------------------------------------------------------------------------------------------------------------+
| timestamp | comment | char_length(comment) | hex(comment) |
+---------------------+-------------------------------------------------------+----------------------+------------------------------------------------------------------------------------------------------------+
| 2019年03月04日 13:11:35 | This one contains a deg symbol (▒) | 34 | 54686973206F6E6520636F6E7461696E732061206465672073796D626F6C2028B029 |
| 2020年02月04日 13:48:04 | This one has a mu (▒) | 21 | 54686973206F6E65206861732061206D752028B529 |
| 2022年06月12日 16:07:50 | This one has a line break >
And it contains a NBSP > | 53 | 54686973206F6E65206861732061206C696E6520627265616B203E0A416E6420697420636F6E7461696E732061204E425350203E20 |
+---------------------+-------------------------------------------------------+----------------------+------------------------------------------------------------------------------------------------------------+
3 rows in set (0.000 sec)

Information that Rick James requested: Note that I did have to add NBSP to the last query. Somehow, I must have removed the NBSP characters by accident!

MariaDB [atlas]> select *,char_length(comment),hex(comment) from TestSQL;
+---------------------+----------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
| timestamp | comment | char_length(comment) | hex(comment) |
+---------------------+----------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
| 2019年03月04日 13:11:35 | This one contains a deg symbol (°) | 34 | 54686973206F6E6520636F6E7461696E732061206465672073796D626F6C2028B029 |
| 2020年02月04日 13:48:04 | This one has a mu (μ) | 21 | 54686973206F6E65206861732061206D752028B529 |
| 2022年06月12日 16:07:50 | This one has a line break >
And it contains a NBSP >  | 54 | 54686973206F6E65206861732061206C696E6520627265616B203E0A416E6420697420636F6E7461696E732061204E425350203EA0A0 |
+---------------------+----------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
3 rows in set (0.000 sec)
MariaDB [atlas]> set names latin1;
Query OK, 0 rows affected (0.000 sec)
MariaDB [atlas]> select *,char_length(comment),hex(comment) from TestSQL;
+---------------------+--------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
| timestamp | comment | char_length(comment) | hex(comment) |
+---------------------+--------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
| 2019年03月04日 13:11:35 | This one contains a deg symbol (▒) | 34 | 54686973206F6E6520636F6E7461696E732061206465672073796D626F6C2028B029 |
| 2020年02月04日 13:48:04 | This one has a mu (▒) | 21 | 54686973206F6E65206861732061206D752028B529 |
| 2022年06月12日 16:07:50 | This one has a line break >
And it contains a NBSP >▒▒ | 54 | 54686973206F6E65206861732061206C696E6520627265616B203E0A416E6420697420636F6E7461696E732061204E425350203EA0A0 |
+---------------------+--------------------------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------+
3 rows in set (0.000 sec)
MariaDB [atlas]> show create table;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '' at line 1
MariaDB [atlas]> show create table Saves \G
*************************** 1. row ***************************
 Table: Saves
Create Table: CREATE TABLE `Saves` (
 `expNum` varchar(16) DEFAULT NULL,
 `timestamp` timestamp NOT NULL DEFAULT current_timestamp(),
 `comment` varchar(256) DEFAULT NULL,
 PRIMARY KEY (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.000 sec)
asked May 8, 2024 at 18:03
7
  • @easleyfixed Thanks for the suggestion. I removed the string dates and replaced with CURRENT_TIMESTAMP. I had to also remove the primary key from my test table. After this, I saw the same "incorrect string value" errors as before. Commented May 8, 2024 at 19:17
  • Ok just making sure, I think it can handle them passed in such a way but wanted to test to make sure. So it is for sure that string on the end, it seems to not be terminating properly on these characters. Another thing might be if its possible to set the character_set_database to utf8mb4 so its not converting things that way too. And the collation database matching the others would be a good test. See your table above and you will see what I mean, I wonder if its converting those characters due to those settings. Commented May 8, 2024 at 19:27
  • I modified the database that this table is in to use utf8mb4 for the character set and collation and added the output to my original post. I still see the same issue. Commented May 8, 2024 at 19:37
  • Interesting, i wonder also if its use of a reserved word like timestamp, as it is a datatype so maybe its not working due to that? You said "with this uncommented" you mean the grayed out line at the top of your sql file? The only other thing would be character termination like passing ' requires '' per single ' when passed so it terminates the string at the right character. Commented May 8, 2024 at 19:40
  • this all seems like a client issue to me. with SET CHARACTER SET utf8mb4 uncommented, what does select *,char_length(comments),hex(comments) from TestSQL; show? with it unset, what is the full error message you get? Commented May 8, 2024 at 19:54

1 Answer 1

0

Since the bytes in the client are encode in latin1 (0xB5, etc), you must have use latin1, not utf8mb4 for some of those settings. One approach is:

`SET NAMES latin1;

That changes character_set_client/connection/results. The last of those is probably not relevant to the loading, but only to converting back to latin1 during `SELECT.

answered May 8, 2024 at 20:12
Sign up to request clarification or add additional context in comments.

6 Comments

I removed the collation and character set from my script and changed the database back to latin1. I was able to get the script to run successfully, however, when I query the table, I see "▒" in place of the special characters. I've tried using your suggestion in the script and sql terminal before querying. I also confirmed that client/connection/database/results/server are using latin1. Is there anything I'm missing? Thank you
How are you "seeing" it? On a web page? In an app? More:stackoverflow.com/questions/38363566/…
I'm using an ssh session to a Linux server through Putty. The character set being used is "UTF-8". I did just make an interesting discovery. When I queried this information now I saw what I expected. Its been 16+ hours since the last time I queried it (and got the "▒" to appear). What is interesting is that I can run my script, see the "▒", then I restart my sql connection and run the same select query again, and I get the results that I wanted. My sql session is "auto-commiting", but I tried a commit and other tests to verify this is not a factor.
All in all, I did try this all again with the "SET NAMES latin1;" commented out of my script. I get the incorrect string value errors again. So I'll add a bit to your solution. Run "SET NAMES latin1" first (in script or before script), then run sql script, then run "SET NAMES DEFAULT" BEFORE you query the table. This gives the desired behavior. Thank you for your help!
Use SELECT HEX(col) ... to see what is stored in the database. Provide that, plus SHOW CREATE TABLE.
|

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.