Unicode problems in Python

I have a massive problem with encodings and I do not know how to deal with it. After two days of Google searches I think I have run out of options :)

What I want to do is the following.

  1. Place text in a textbox on a website
  2. Send the text to the backend (written in Python)
  3. Use the text to create:
    a. An image in PIL.
    b. An entry in MySQL.

All of this works smoothly with regular ASCII characters. But when I try Korean, Polish, or Japanese characters, I get garbled characters in both the image and the database. In the examples below I'll use a three-character Polish string, "ąść".

Here's what I have done after Googling.

Inserted the following in .htaccess:

AddCharset UTF-8 .py .css .js .html

My Python file now starts with:

#!/usr/bin/python
# -*- coding: utf-8 -*-
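
For reference, the coding line only tells the interpreter how to decode non-ASCII literals in the source file; it does not change how strings are handed to MySQL or PIL. A minimal sketch of the text versus bytes distinction that cases A to D below turn on, written with escapes so it behaves the same on Python 2.7 and Python 3 (the variable names are illustrative only):

```python
# -*- coding: utf-8 -*-
# Text (code points) and its UTF-8 encoding (bytes) are different objects.
text = u'\u0105\u015b\u0107'        # the three characters "ąść"
utf8_bytes = text.encode('utf-8')   # the byte sequence a UTF-8 channel carries

print(len(text))         # 3 code points
print(len(utf8_bytes))   # 6 bytes: each of these characters needs two bytes
print(repr(utf8_bytes))
```

Every layer that receives the six bytes has to be told they are UTF-8; otherwise it is free to interpret them as six separate single-byte characters.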

All of my MySQL databases use the "utf8_unicode_ci" collation.

Here's an example of what happens. Whenever I submit "ąść" (three Polish characters), it is saved in the database and rendered on the image as:

Ä...ść

Now, a few debugging attempts. I bypass the form entirely and hard-code the variable (value_text1) that normally holds the submitted text, so the same fixed string is used to generate the image and is written to the database:

A) If I go with value_text1 = 'ąść' I get ...ść as a result.

B) If I go with value_text1 = u'ąść' I get the following error message:

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

C) If I go with value_text1 = u'ąść'.encode('UTF-8') I get ...ść as a result.

D) If I go with value_text1 = u'\u0105\u015B\u0107'.encode('UTF-8'), where "\u0105\u015B\u0107" are the Unicode escapes for "ąść", I get ...ść as a result.
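
The results above look like what happens when UTF-8 bytes are read back as a single-byte encoding, and the error in B names 'latin-1' explicitly. A minimal sketch that reproduces both effects without any web server or database, assuming Windows-1252 (`cp1252`) as the misreading codec, which is a guess and not something the setup above confirms:

```python
text = u'\u0105\u015b\u0107'            # "ąść"
utf8_bytes = text.encode('utf-8')

# Misreading: UTF-8 bytes decoded with a single-byte codec give mojibake.
mojibake = utf8_bytes.decode('cp1252')  # assumption: cp1252 is the wrong codec
print(repr(mojibake))

# Case B: these characters do not exist in latin-1, so encoding fails.
try:
    text.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)

# The misreading is reversible as long as no bytes were lost on the way.
repaired = mojibake.encode('cp1252').decode('utf-8')
print(repaired == text)
```

If this is what is going on, the string itself is fine in cases A, C, and D, and the thing to look for is a layer (database connection, font, or output step) that still assumes a single-byte encoding.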

I really have no clue what I'm doing wrong: server settings, Python file settings, a wrong command? I'd appreciate any thoughts. Huge thanks in advance.

  • That might be it... do you know where I could try to search for this configuration? Would that be an Apache thing or .htaccess or something else? Commented Mar 30, 2013 at 20:33
  • Assuming you are using the MySQLdb module, just add charset="utf-8" to your MySQLdb.connect() call :) Commented Mar 30, 2013 at 20:36
  • Ha nice one! This actually worked, thank you! Now need to figure out the PIL issue and I'm set... I'm definitely a step closer now :) Commented Mar 30, 2013 at 23:26
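
The fix from the comments can be sketched as follows, assuming the MySQLdb module as the commenter did. MySQL spells its character set name without a hyphen (`utf8`), so that spelling is used here. The helper name `utf8_connect_kwargs` and the credentials are hypothetical placeholders, not part of the original setup:

```python
def utf8_connect_kwargs(host, user, passwd, db):
    """Build keyword arguments for MySQLdb.connect() (hypothetical helper)."""
    return {
        'host': host,
        'user': user,
        'passwd': passwd,
        'db': db,
        'charset': 'utf8',     # connection character set (MySQL's name for UTF-8)
        'use_unicode': True,   # return text columns as unicode objects
    }

kwargs = utf8_connect_kwargs('localhost', 'user', 'secret', 'mydb')  # placeholders

# With MySQLdb installed and a reachable server:
# import MySQLdb
# conn = MySQLdb.connect(**kwargs)
```

Setting the connection character set matters separately from the table collation: the collation governs how stored text is compared and sorted, while the connection setting governs how bytes are interpreted in transit.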
