Huh, ok, so I have this massive problem with encodings and I just do not know how to deal with it. After two days of Google searches I think I just run out of options :)
What I want to do is the following.
- Place text in a textbox on a website
- Send the text to the backend (written in Python)
- Use the text to create:
a. An image in PIL.
b. An entry in MySQL.
Now all of this works smoothly when we're talking about regular characters. But when I try to use Korean, Polish, Japanese characters I get very weird looking characters inserted in both the image and the database. In the examples below I'll use a three character string of Polish characters - "ąść".
Here's what I have done after Googling.
Inserted the following in .htaccess:
AddCharset UTF-8 .py .css .js .html
My python file now starts with:
#!/usr/bin/python
# -*- coding: utf-8 -*-
All of my MySQL databases are encoded in "utf8_unicode_ci".
Now, here's an example of what I'm trying to do... Whenever I parse "ąść" (three Polish characters) it gets saved in the database and generated on the image as:
Ä...ść
Now, a few debugging issues. I go directly to Python and assign the following to the variable (value_text1) that usually has its text parsed (so - no text parsing, simply set fixed text to generate the image with and put into the database):
A) If I go with value_text1 = 'ąść' I get ...ść as a result.
B) If I go with value_text1 = u'ąść' I get the following error message:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)
C) If I go with value_text1 = u'ąść'.encode('UTF-8') I get ...ść as a result.
D) If I go with value_text1 = u'\u0105\u015B\u0107'.encode('UTF-8'), where "\u0105\u015B\u0107" is the actual unicode for "ąść" I get ...ść as a result.
Really no clue what I'm doing wrong - server settings, python file settings, wrong command? Will appreciate any thoughts, huge thank you in advance.
1 Answer 1
If I try it in an interactive shell or from a .py file
#!/usr/bin/python
# -*- coding: utf-8 -*-
value_text1 = u'ąść'
print value_text1
it works perfectly well for me, so I guess it's something with your server configuration.
BTW, make sure to use charset="utf-8" when connecting to the server.
3 Comments
Explore related questions
See similar questions with these tags.
u"somestringwith-ąść"