
I am trying to insert some Arabic words into the arabic_word column of the hanswehr2 table in my MariaDB database, using the MySQLdb driver.

I was getting a latin-1 encode error. After reading around, I found out that the MySQLdb driver defaults to latin-1 and that I had to explicitly set UTF-8 as the charset in the mariadb.connect() call.

The entire database is set to utf-8.
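The latin-1 encode error can be reproduced without touching the database at all. The following is a minimal Python 3 sketch. The word كتاب ("book") is only an example input, and encodes_in is a hypothetical helper, not part of MySQLdb:

```python
# Python 3 sketch: Arabic letters have no latin-1 representation,
# so any layer that encodes them as latin-1 raises UnicodeEncodeError.
word = "كتاب"  # example Arabic word ("book"), chosen for illustration


def encodes_in(text, codec):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


print(encodes_in(word, "latin-1"))  # False
print(encodes_in(word, "utf-8"))    # True
```

This is why the connection charset has to be changed: the data itself is fine, but latin-1 simply has no code for these characters.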

Code:

import sys
import MySQLdb as mariadb  # assumed: the post uses the MySQLdb driver under the alias mariadb

def insert_into_db(arabic_word, definition):
    try:
        conn = mariadb.connect('localhost', 'root', 'xyz1234passwd', 'hans_wehr', charset='utf-8', use_unicode=True)
        conn.autocommit(True)
        cur = conn.cursor()
        cur.execute("INSERT INTO hanswehr2 (arabic_word, definition) VALUES (%s, %s)", (arabic_word, definition))
    except mariadb.Error, e:
        print e
        sys.exit(1)

However, now I get the following error:

/usr/bin/python2.7 /home/heisenberg/hans_wehr/main.py
Total lines 87672
(2019, "Can't initialize character set utf-8 (path: /usr/share/mysql/charsets/)")
Process finished with exit code 1

I have told the Python MySQL driver to use the utf-8 character set, but it seems to ignore this.

Any input would be highly appreciated.

Brian C.
asked Jun 8, 2016 at 17:58
  • How is it a possible duplicate? The question you referenced is in PHP. Commented Jun 8, 2016 at 18:15
  • Oops, sorry about that. But you should really try 'utf8'; it seems that can help. Check here: stackoverflow.com/a/6203782/4421474 Commented Jun 8, 2016 at 18:17
  • utf-8 (with a hyphen) is not a valid character set name. Use utf8. Commented Jun 9, 2016 at 13:07
  • @AlastairMcCormack Could you write your utf8 comment as an answer in order to help future SO users? Thank you. Commented Jun 10, 2016 at 6:34
  • @KenOkech Many thanks Commented Jun 10, 2016 at 8:44

2 Answers


The charset alias for UTF-8 in MySQL is utf8 (no hyphen).

See https://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html for available charsets.
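Because the Python codec name and the MySQL charset name differ, it can help to translate one into the other explicitly rather than reusing the Python spelling. A minimal sketch; mysql_charset_name is a hypothetical helper, and it only handles UTF-8:

```python
import codecs


def mysql_charset_name(python_codec, four_byte=False):
    """Map a Python spelling of UTF-8 to a MySQL/MariaDB charset name.

    Hypothetical helper: MySQL calls the charset 'utf8' (or 'utf8mb4'),
    never 'utf-8', so the Python spelling cannot be passed through as-is.
    """
    # codecs.lookup() normalises 'utf-8', 'UTF8', 'utf_8' to the name 'utf-8'
    if codecs.lookup(python_codec).name != "utf-8":
        raise ValueError("only UTF-8 is mapped here, got %r" % python_codec)
    # utf8mb4 additionally covers characters outside the Basic Multilingual Plane
    return "utf8mb4" if four_byte else "utf8"


print(mysql_charset_name("utf-8"))                 # utf8
print(mysql_charset_name("UTF8", four_byte=True))  # utf8mb4
```

In the question's code this amounts to passing charset='utf8' (for example via charset=mysql_charset_name('utf-8')) to mariadb.connect() instead of charset='utf-8'.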

Note: if you need to store code points outside the Basic Multilingual Plane (BMP), such as emoji, use utf8mb4 for both the connection charset and the character set of the VARCHAR columns.
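Whether utf8mb4 is actually required can be checked on the data itself: a character needs it exactly when its code point lies above U+FFFF, i.e. when it takes four bytes in UTF-8. A minimal Python 3 sketch; needs_utf8mb4 is a hypothetical helper:

```python
def needs_utf8mb4(text):
    """Return True if text contains a code point outside the BMP.

    Such characters (e.g. most emoji) take four bytes in UTF-8, which
    MySQL's three-byte 'utf8' charset cannot store; they need 'utf8mb4'.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("كتاب"))        # False: Arabic letters are in the BMP
print(needs_utf8mb4("\U0001F600"))  # True: U+1F600 (an emoji) is not
```

For Arabic dictionary entries like the ones in the question, plain utf8 is therefore enough; utf8mb4 only matters if four-byte characters can appear.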

answered Jun 10, 2016 at 8:43


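There are also collations, which work alongside character sets: the character set determines how characters are encoded, while the collation determines how they are compared and sorted for a given language. https://softwareengineering.stackexchange.com/questions/95048/what-is-the-difference-between-collation-and-character-set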
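As a loose analogy in plain Python (this is ordinary string sorting, not MySQL's collation engine), the ordering rule can change while the encoded bytes of each string stay the same:

```python
# Loose analogy only: Python sorting standing in for two collation styles.
words = ["b", "a", "B"]

# Code-point order, roughly what a binary (*_bin) collation does
binary_order = sorted(words)
# Case-insensitive order, roughly what a *_ci collation does
ci_order = sorted(words, key=str.casefold)

print(binary_order)  # ['B', 'a', 'b']
print(ci_order)      # ['a', 'b', 'B']

# "B" and "b" compare equal case-insensitively...
print("B".casefold() == "b".casefold())            # True
# ...yet their encoded bytes still differ: that part is the character set's job
print("B".encode("utf-8") == "b".encode("utf-8"))  # False
```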

I think you need to specify it when creating your database table or in the connection string. Refer to this: store arabic in SQL database
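Specifying the charset and collation at table-creation time could look roughly like the sketch below. create_table_sql is a hypothetical helper, and the column types (VARCHAR(255), TEXT) are illustrative guesses, not taken from the question:

```python
def create_table_sql(table, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build a CREATE TABLE statement that pins the charset and collation.

    Hypothetical helper; the column types below are illustrative guesses.
    """
    # By MySQL's naming convention, a collation's name starts with the name
    # of the charset it belongs to (e.g. utf8mb4_unicode_ci -> utf8mb4).
    if not collation.startswith(charset + "_"):
        raise ValueError("collation %r does not belong to %r" % (collation, charset))
    # The table name is treated as trusted input: identifiers cannot be
    # bound as %s query parameters, so it is interpolated directly.
    return (
        "CREATE TABLE %s ("
        "arabic_word VARCHAR(255) NOT NULL, "
        "definition TEXT"
        ") CHARACTER SET %s COLLATE %s" % (table, charset, collation)
    )


print(create_table_sql("hanswehr2"))
```

The resulting string would then be run once with cur.execute(), and the connection charset should match the table's charset.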

More on charset and collation in the Python MySQL connection: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnection-set-charset-collation.html

answered Jun 8, 2016 at 18:23
