
Huh, ok, so I have this massive problem with encodings and I just do not know how to deal with it. After two days of Google searches I think I've just run out of options :)

What I want to do is the following.

  1. Place text in a textbox on a website
  2. Send the text to the backend (written in Python)
  3. Use the text to create:
    a. An image in PIL.
    b. An entry in MySQL.

Now all of this works smoothly with plain ASCII characters. But when I try to use Korean, Polish, or Japanese characters, I get very weird-looking characters in both the image and the database. In the examples below I'll use a three-character string of Polish characters: "ąść".

Here's what I have done after Googling.

Inserted the following in .htaccess:

AddCharset UTF-8 .py .css .js .html

My python file now starts with:

#!/usr/bin/python
# -*- coding: utf-8 -*-

All of my MySQL databases are encoded in "utf8_unicode_ci".

Now, here's an example of what I'm trying to do... Whenever I parse "ąść" (three Polish characters) it gets saved in the database and generated on the image as:

Ä...ść

Now, a few debugging attempts. I go directly to Python and hard-code the variable (value_text1) that normally holds the text submitted from the form, so no form input is involved: the fixed text is simply used to generate the image and is written to the database.

A) If I go with value_text1 = 'ąść' I get ...ść as a result.

B) If I go with value_text1 = u'ąść' I get the following error message:

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

C) If I go with value_text1 = u'ąść'.encode('UTF-8') I get ...ść as a result.

D) If I go with value_text1 = u'\u0105\u015B\u0107'.encode('UTF-8'), where "\u0105\u015B\u0107" is the Unicode escape sequence for "ąść", I get ...ść as a result.
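For reference, the symptoms in A) to D) can be reproduced outside of any web server or database. The sketch below assumes the bytes are correct UTF-8 and are being read back somewhere with a single-byte encoding; cp1252 is picked here only as a plausible guess, not something the setup above confirms.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the symptoms without a web server or database.

text = u'\u0105\u015b\u0107'  # the three Polish characters from the question

# Correct UTF-8 encoding: two bytes per character, six bytes in total.
utf8_bytes = text.encode('utf-8')

# Reading those bytes back with a single-byte codec (cp1252 is an assumption)
# yields one wrong character per byte, i.e. the "weird characters" symptom.
garbled = utf8_bytes.decode('cp1252')

# Using UTF-8 on both sides returns the original text unchanged.
round_trip = utf8_bytes.decode('utf-8')

# Latin-1 has no code points for these characters, so encoding fails
# with a UnicodeEncodeError like the one in case B.
try:
    text.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

If this is what is happening, the data itself is fine and some layer (database connection, font, or output) is using the wrong encoding when reading or writing it.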

Really no clue what I'm doing wrong - server settings, python file settings, wrong command? Will appreciate any thoughts, huge thank you in advance.

Mark Tolonen
asked Mar 30, 2013 at 19:38
  • How are you rendering your text in PIL? Commented Mar 30, 2013 at 20:01
  • Where do you get that error message? What is the code that raises that error message? Commented Mar 30, 2013 at 20:15
  • Maybe your editor isn't saving non-ASCII characters as UTF-8. Commented Mar 30, 2013 at 20:25
  • @BrenBarn: I think you always get that error when trying to do u"somestringwith-ąść" Commented Mar 30, 2013 at 20:44
  • @jazzpi: Not if you have the encodings set properly (unless you try to print it or something). Commented Mar 30, 2013 at 20:48

1 Answer

If I try it in an interactive shell or from a .py file

#!/usr/bin/python
# -*- coding: utf-8 -*-
value_text1 = u'ąść'
print value_text1

it works perfectly well for me, so I guess it's something with your server configuration.

BTW, make sure to pass charset="utf8" (MySQL's name for UTF-8, without the hyphen) when connecting to the MySQL server.
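A minimal sketch of such a connection, assuming the MySQLdb module is in use. The helper names and the credentials passed in are made up for illustration; only MySQLdb.connect() and its charset / use_unicode keyword arguments come from the library. Note that MySQL calls the encoding utf8, without a hyphen.

```python
def utf8_connect_kwargs(host, user, passwd, db):
    """Build keyword arguments for MySQLdb.connect() with UTF-8 enabled.

    Hypothetical helper, split out so the settings can be inspected
    without a running MySQL server.
    """
    return {
        'host': host,
        'user': user,
        'passwd': passwd,
        'db': db,
        'charset': 'utf8',     # MySQL's name for UTF-8 (no hyphen)
        'use_unicode': True,   # hand text columns back as unicode objects
    }


def connect_utf8(host, user, passwd, db):
    """Open a connection that talks UTF-8 to the server."""
    import MySQLdb  # imported here so the helper above works without it
    return MySQLdb.connect(**utf8_connect_kwargs(host, user, passwd, db))
```

Without the charset argument, the connection may fall back to the client library's default encoding even when the tables themselves are utf8_unicode_ci.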

answered Mar 30, 2013 at 19:54

3 Comments

That might be it... do you know where I could try to search for this configuration? Would that be an Apache thing or .htaccess or something else?
Assuming you are using the MySQLdb module, just add charset="utf8" (no hyphen) to your MySQLdb.connect() call :)
Ha nice one! This actually worked, thank you! Now need to figure out the PIL issue and I'm set... I'm definitely a step closer now :)
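On the remaining PIL issue, a hedged sketch: the 'latin-1' error in case B looks like what PIL's built-in bitmap font produces, since that font only covers Latin-1. If so, drawing a unicode string with a TrueType font that contains the needed glyphs may help. The function names and font_path below are placeholders, and the font file has to be supplied by you.

```python
def to_text(value, encoding='utf-8'):
    """Return a unicode string, decoding byte strings with `encoding`."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def render_text(value, font_path, size=32):
    """Draw `value` on a white image using the TrueType font at `font_path`.

    `font_path` is a placeholder: point it at any .ttf file that contains
    glyphs for the characters being drawn.
    """
    # Imported here so to_text() stays usable without PIL installed.
    from PIL import Image, ImageDraw, ImageFont

    font = ImageFont.truetype(font_path, size)
    image = Image.new('RGB', (400, 100), 'white')
    draw = ImageDraw.Draw(image)
    # Pass unicode text, not UTF-8 bytes, so each character maps to one glyph.
    draw.text((10, 10), to_text(value), font=font, fill='black')
    return image
```

The key point is to decode the incoming bytes to unicode once, at the boundary, and let PIL and the font handle the rest.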
