I got a strange error message when I tried to save first_name and last_name to Django's auth_user model.
Failing examples
user = User.objects.create_user(username, email, password)
user.first_name = u'Rytis'
user.last_name = u'Slatkevičius'
user.save()
>>> Incorrect string value: '\xC4\x8Dius' for column 'last_name' at row 104
user.first_name = u'Валерий'
user.last_name = u'Богданов'
user.save()
>>> Incorrect string value: '\xD0\x92\xD0\xB0\xD0\xBB...' for column 'first_name' at row 104
user.first_name = u'Krzysztof'
user.last_name = u'Szukiełojć'
user.save()
>>> Incorrect string value: '\xC5\x82oj\xC4\x87' for column 'last_name' at row 104
Successful example
user.first_name = u'Marcin'
user.last_name = u'Król'
user.save()
>>> SUCCEED
MySQL settings
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
Table charset and collation
The auth_user table uses the utf8 charset with the utf8_general_ci collation.
Results of UPDATE command
Updating the above values in the auth_user table with a plain UPDATE command didn't raise any error.
mysql> update auth_user set last_name='Slatkevičiusa' where id=1;
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select last_name from auth_user where id=100;
+---------------+
| last_name     |
+---------------+
| Slatkevi?iusa |
+---------------+
1 row in set (0.00 sec)
PostgreSQL
When I switched the database backend in Django to PostgreSQL, the failing values listed above could be saved to the table. It's strange.
mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
...
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
...
But from http://www.postgresql.org/docs/8.1/interactive/multibyte.html, I found the following:
Name Bytes/Char
UTF8 1-4
Does this mean that a Unicode character can take up to 4 bytes in PostgreSQL but only 3 bytes in MySQL, and that this caused the error above?
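One way to test this hypothesis is to measure how many bytes each character in the names above needs in UTF-8. The sketch below is Python 3 and uses only the standard library; the helper names are made up for illustration. It also checks whether each name can be encoded as Latin-1, which is only a guess at what else might distinguish the failing values, not a diagnosis of this particular database.

```python
# Names taken from the examples in the question.
NAMES = ["Rytis", "Slatkevičius", "Валерий", "Богданов", "Szukiełojć", "Król"]


def max_utf8_bytes(text):
    """Return the size in bytes of the widest character in text, encoded as UTF-8."""
    return max(len(ch.encode("utf-8")) for ch in text)


def fits_latin1(text):
    """Return True if every character in text exists in Latin-1 (ISO 8859-1)."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for name in NAMES:
    print(name, max_utf8_bytes(name), fits_latin1(name))
```

None of these names needs more than 2 bytes per character, so a 3-byte limit on its own would not explain the failures. Of the non-ASCII names, only 'Król' fits in Latin-1, and it is also the only one that was saved successfully.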
It's a MySQL problem, not Django: stackoverflow.com/questions/1168036/… – Vanuan, May 21, 2014 at 20:05
9 Answers
None of these answers solved the problem for me. The root cause is:
You cannot store 4-byte characters in MySQL with the utf8 character set.
MySQL's utf8 has a 3-byte limit per character (yes, it's wack, nicely summed up by a Django developer here).
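To see which characters are affected by the 3-byte limit, here is a minimal Python 3 sketch. The function name chars_outside_bmp is hypothetical; it relies only on the fact that code points above U+FFFF take 4 bytes in UTF-8.

```python
def chars_outside_bmp(text):
    """Return the characters in text whose code point is above U+FFFF.

    These need 4 bytes in UTF-8, so a charset limited to 3 bytes per
    character cannot store them.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


party = "party \U0001F389"  # U+1F389 is an emoji outside the Basic Multilingual Plane

print(chars_outside_bmp("Król"))            # no 4-byte characters here
print(chars_outside_bmp(party))             # only the emoji is reported
print(len("\U0001F389".encode("utf-8")))    # size of the emoji in UTF-8
```

A check like this could be run on incoming text to find out whether the data actually contains 4-byte characters before migrating to utf8mb4.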
To solve this you need to:
- Change your MySQL database, table and columns to use the utf8mb4 character set (only available from MySQL 5.5 onwards)
- Specify the charset in your Django settings file as below:
settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        ...
        'OPTIONS': {'charset': 'utf8mb4'},
    }
}
Note: When recreating your database you may run into the 'Specified key was too long' issue.
The most likely cause is a CharField which has a max_length of 255 and some kind of index on it (e.g. unique). Because utf8mb4 reserves up to 4 bytes per character instead of 3 (33% more space), the same index byte limit covers fewer characters, so you'll need to make these fields about 25% shorter.
In this case, change the max_length from 255 to 191.
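The 191 figure follows from a byte limit on indexed columns. The sketch below assumes the 767-byte index key prefix limit that applies to older InnoDB row formats; the constant and function names are illustrative, and the limit on your server may differ.

```python
# Assumption: 767 bytes is the index key prefix limit of older InnoDB row formats.
INNODB_INDEX_LIMIT_BYTES = 767


def max_indexed_chars(bytes_per_char, limit=INNODB_INDEX_LIMIT_BYTES):
    """Return how many characters fit in an index when each may use bytes_per_char bytes."""
    return limit // bytes_per_char


print(max_indexed_chars(3))  # utf8: up to 3 bytes per character
print(max_indexed_chars(4))  # utf8mb4: up to 4 bytes per character
```

With 3 bytes per character the limit allows 255 characters, and with 4 bytes it allows 191, which matches the max_length values mentioned above.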
Alternatively, you can edit your MySQL configuration to remove this restriction, but not without some Django hackery.
UPDATE: I just ran into this issue again and ended up switching to PostgreSQL because I was unable to reduce my VARCHAR to 191 characters.
7 Comments
- The 'charset': 'utf8mb4' option in Django settings is critical, as @Xerion said. Finally, the index problem is a mess. Remove the index on the column, or make its length no more than 191, or use a TextField instead!
- If you use mysql.connector.django as the database backend, you must also set 'collation': 'utf8mb4_unicode_ci' in OPTIONS.
I had the same problem and resolved it by changing the character set of the column. Even though your database has a default character set of utf-8, I think it's possible for database columns to have a different character set in MySQL. Here's the SQL query I used:
ALTER TABLE database.table MODIFY COLUMN col VARCHAR(255)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
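Before altering a column, it may help to confirm which columns actually use a different character set. The following Python sketch only builds a query against MySQL's information_schema.columns table; the function name and the example schema and table names are placeholders, and running the query is left to whichever database driver you use.

```python
def column_charset_query(schema, table):
    """Build a parameterized query listing the charset and collation of each text column."""
    sql = (
        "SELECT column_name, character_set_name, collation_name "
        "FROM information_schema.columns "
        "WHERE table_schema = %s AND table_name = %s "
        "AND character_set_name IS NOT NULL"  # skips non-text columns
    )
    return sql, (schema, table)


sql, params = column_charset_query("yourdbname", "auth_user")
print(sql)
print(params)
# With MySQLdb this could be run as: cursor.execute(sql, params)
```

Any row whose character_set_name is not the one you expect points at a column that needs the ALTER TABLE statement above.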
If you have this problem, here's a Python script to change all the columns of your MySQL database automatically.
#!/usr/bin/env python
import MySQLdb

# Connection settings: replace these with your own values.
host = "localhost"
passwd = "passwd"
user = "youruser"
dbname = "yourdbname"

db = MySQLdb.connect(host=host, user=user, passwd=passwd, db=dbname)
cursor = db.cursor()
# Set the default charset and collation of the database itself.
cursor.execute("ALTER DATABASE `%s` CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci'" % dbname)
# List every table in the schema.
sql = "SELECT DISTINCT(table_name) FROM information_schema.columns WHERE table_schema = '%s'" % dbname
cursor.execute(sql)
results = cursor.fetchall()
# Convert each table to the database default.
for row in results:
    sql = "ALTER TABLE `%s` convert to character set DEFAULT COLLATE DEFAULT" % (row[0])
    cursor.execute(sql)
db.close()
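The script above converts each table to DEFAULT, which depends on what the database default happens to be. As a hedged alternative, the sketch below generates the per-table statements with an explicit charset so they can be reviewed before being run. It defaults to utf8mb4, as recommended in an earlier answer; the function name and default collation are assumptions for illustration.

```python
def convert_statements(table_names, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Return one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for name in table_names:
        # Quote the identifier and escape any backtick it contains.
        quoted = "`%s`" % name.replace("`", "``")
        statements.append(
            "ALTER TABLE %s CONVERT TO CHARACTER SET %s COLLATE %s"
            % (quoted, charset, collation)
        )
    return statements


for statement in convert_statements(["auth_user", "auth_group"]):
    print(statement)
```

Each generated string could then be passed to cursor.execute() in place of the statement built inside the loop.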
7 Comments
- [...] db.commit() before db.close().
If it's a new project, I'd just drop the database and create a new one with a proper charset:
CREATE DATABASE <dbname> CHARACTER SET utf8;
4 Comments
- [...] --character-set-server=utf8 [...] --character-set-server=utf8mb4 [...]
I just figured out one method to avoid the above errors.
Save to database
user.first_name = u'Rytis'.encode('unicode_escape')
user.last_name = u'Slatkevičius'.encode('unicode_escape')
user.save()
>>> SUCCEED
print user.last_name
>>> Slatkevi\u010dius
print user.last_name.decode('unicode_escape')
>>> Slatkevičius
Is this the only way to save strings like that into a MySQL table, decoding them again before rendering them in templates?
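To make the trade-off of this workaround concrete, here is the same round trip as a Python 3 sketch (the snippet above is Python 2). The variable names are illustrative; the point is that the stored value is an ASCII escape sequence rather than the real text.

```python
name = "Slatkevičius"

# What would end up in the column: pure ASCII with a backslash escape.
stored = name.encode("unicode_escape").decode("ascii")
print(stored)

# The stored value is longer than the name and no longer contains 'č',
# so length limits, searches and sorting all see the escaped form.
print(len(name), len(stored))
print("č" in stored)

# Every reader has to reverse the escaping to get the name back.
restored = stored.encode("ascii").decode("unicode_escape")
print(restored == name)
```

The round trip works, but anything that reads the column directly sees the escaped form instead of the name.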
6 Comments
- [...] .encode('unicode_escape') you're not actually storing unicode characters in the database. You're forcing all the clients to unencode before using them, which means it won't work properly with django.admin or all sorts of other things.
- [...] utf8 character set.
- [...] utf8mb4 that allows more than the Basic Multilingual Plane to be stored. I know, you'd think "UTF8" is all that's needed to store Unicode fully. Well, whaddaya know, it's not. See dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
You can change the collation of your text field to UTF8_general_ci and the problem will be solved.
Note that this cannot be done in Django.
An improvement on @madprops' answer, written as a Django management command:
import MySQLdb
from django.conf import settings
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    def handle(self, *args, **options):
        # Read the connection details from the default Django database.
        host = settings.DATABASES['default']['HOST']
        password = settings.DATABASES['default']['PASSWORD']
        user = settings.DATABASES['default']['USER']
        dbname = settings.DATABASES['default']['NAME']
        db = MySQLdb.connect(host=host, user=user, passwd=password, db=dbname)
        cursor = db.cursor()
        # Set the default charset and collation of the database itself.
        cursor.execute("ALTER DATABASE `%s` CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci'" % dbname)
        # List every table in the schema.
        sql = "SELECT DISTINCT(table_name) FROM information_schema.columns WHERE table_schema = '%s'" % dbname
        cursor.execute(sql)
        results = cursor.fetchall()
        # Convert each table to the database default.
        for row in results:
            print(f'Changing table "{row[0]}"...')
            sql = "ALTER TABLE `%s` convert to character set DEFAULT COLLATE DEFAULT" % (row[0])
            cursor.execute(sql)
        db.close()
Hope this helps anybody but me :)
1 Comment
- sql = "ALTER TABLE `%s` convert to character set DEFAULT COLLATE DEFAULT" % (row[0]) should be changed to sql = "ALTER TABLE `%s` convert to character set 'utf8' COLLATE 'utf8_unicode_ci' " % (row[0]). Thank you for the very best answer.
You aren't trying to save unicode strings, you're trying to save bytestrings in the UTF-8 encoding. Make them actual unicode string literals:
user.last_name = u'Slatkevičius'
or (when you don't have string literals) decode them using the utf-8 encoding:
user.last_name = lastname.decode('utf-8')
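The snippets in this answer are written for Python 2, where u'' literals and .decode() on plain strings matter. As a rough Python 3 equivalent, where the distinction is between bytes and str, a small helper could normalize values before they are assigned to the model. The name to_text is hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b"Slatkevi\xc4\x8dius"  # UTF-8 bytes, like those shown in the error message
print(to_text(raw))
print(to_text("Król"))        # already text, returned unchanged
```

Used as user.last_name = to_text(lastname), this would accept either kind of input.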
Simply alter your table; there's no need to do anything else. Just run this query on the database:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8;
It will definitely work.