I've just started learning Python, but I have already run into trouble.
I have a simple script with just one command:
#!/usr/bin/env python3
print("Příliš žluťoučký kůň úpěl ďábelské ódy.") # Text in Czech
When I try to run this script:
python3 hello.py
I get this message:
Traceback (most recent call last):
File "hello.py", line 2, in <module>
print("P\u0159\xedli\u0161 \u017elu\u0165ou\u010dk\xfd k\u016fn \xfap\u011bl \u010f\xe1belsk\xe9 \xf3dy.")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
I am using Kubuntu 16.04 and Python 3.5.2.
Running export PYTHONIOENCODING=utf-8 made it work, but only temporarily: the next time I opened bash, I got the same error.
According to https://docs.python.org/3/howto/unicode.html#the-string-type
the default encoding for Python source code is UTF-8.
So I have the source file saved in UTF-8 and Konsole is set to UTF-8, but I still get the error!
Even if I add
# -*- coding: utf-8 -*-
to the beginning it does nothing.
Another weird thing: when I run it with python instead of python3, it works. How can it work in Python 2.7.12 but not in 3.5.2?
Any ideas for solving this permanently? Thank you.
1 Answer
Thanks to Mark Tolen and Alastair McCormack for suggesting where the problem may be. The problem was really in the locale settings.
When I ran locale, the output was:
LANG=C
LANGUAGE=
LC_CTYPE="C"
LC_NUMERIC=cs_CZ.UTF-8
LC_TIME=cs_CZ.UTF-8
LC_COLLATE=cs_CZ.UTF-8
LC_MONETARY=cs_CZ.UTF-8
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT=cs_CZ.UTF-8
LC_IDENTIFICATION="C"
LC_ALL=
The "C" locale is the default setting, and it uses the ASCII charmap. That is where the problem was: running locale charmap gave me ANSI_X3.4-1968, which is plain 7-bit ASCII and cannot represent non-English characters.
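To see why that charmap produces exactly the error from the question, here is a minimal sketch that encodes the same sentence with both charmap names. The helper name try_encode is made up for this illustration; Python itself resolves ANSI_X3.4-1968 to its built-in ascii codec.

```python
TEXT = "Příliš žluťoučký kůň úpěl ďábelské ódy."


def try_encode(text, charmap):
    """Return the encoded bytes, or the error message if the charmap
    cannot represent the text. (Hypothetical helper for illustration.)"""
    try:
        return text.encode(charmap)
    except UnicodeEncodeError as exc:
        return str(exc)


# The charmap reported under the "C" locale: encoding fails.
print(try_encode(TEXT, "ANSI_X3.4-1968"))

# The charmap reported under cs_CZ.UTF-8: encoding succeeds and yields bytes.
print(try_encode(TEXT, "UTF-8"))
```

The first call returns the same kind of 'ascii' codec message as in the traceback, the second returns a bytes object.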
I fixed this by following the Ubuntu documentation on locales (help.ubuntu.com/community/Locale).
I added these lines to /etc/default/locale:
LANGUAGE=cs_CZ.UTF-8
LC_ALL=cs_CZ.UTF-8
Then you have to restart your session (log out and in) to apply these settings.
Running locale now returns this output:
LANG=C
LANGUAGE=cs
LC_CTYPE="cs_CZ.UTF-8"
LC_NUMERIC="cs_CZ.UTF-8"
LC_TIME="cs_CZ.UTF-8"
LC_COLLATE="cs_CZ.UTF-8"
LC_MONETARY="cs_CZ.UTF-8"
LC_MESSAGES="cs_CZ.UTF-8"
LC_PAPER="cs_CZ.UTF-8"
LC_NAME="cs_CZ.UTF-8"
LC_ADDRESS="cs_CZ.UTF-8"
LC_TELEPHONE="cs_CZ.UTF-8"
LC_MEASUREMENT="cs_CZ.UTF-8"
LC_IDENTIFICATION="cs_CZ.UTF-8"
LC_ALL=cs_CZ.UTF-8
and running locale charmap returns:
UTF-8
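The same check can be done from inside Python. This is only a sketch under a few assumptions: the function names current_charmap and is_utf8 are invented here, and locale.nl_langinfo is available on Unix-like systems only, so there is a fallback for other platforms. It reports roughly what locale charmap reports, and returns None when the configured locale is missing or broken.

```python
import codecs
import locale


def current_charmap():
    """Return the charmap of the environment's locale, or None if the
    locale configured in LANG / LC_* cannot be activated."""
    try:
        # An empty string means "use the LANG / LC_* environment variables".
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return None
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    # Fallback for platforms without nl_langinfo.
    return locale.getpreferredencoding(False)


def is_utf8(charmap):
    """True if the charmap name resolves to Python's UTF-8 codec."""
    if charmap is None:
        return False
    try:
        return codecs.lookup(charmap).name == "utf-8"
    except LookupError:
        return False


charmap = current_charmap()
print("charmap:", charmap)
print("UTF-8 output possible:", is_utf8(charmap))
```

With the settings above applied, I would expect this to report UTF-8; under the old "C" locale it should report the ASCII charmap instead.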
[…] ascii when printing Unicode.
[…] LANG=cs_CZ.UTF-8 but you've not built/installed the Czech locale? Python will default to ASCII encoding if your locale is broken or missing. The reason it works in Python 2 is because the string is a byte string and will simply be written directly to your terminal. Python 3 will need to encode strings when writing to the terminal
[…] LANG was set to C which is the default setting that uses ANSI. Only few LC_*** were set to cs_CZ.UTF-8 and the other ones inherited the C from LANG. I added these lines to /etc/default/locale/: LANG=cs_CZ.UTF-8 LANGUAGE=cs_CZ.UTF-8 LC_ALL=cs_CZ.UTF-8 It works! Now why I am writing this as a comment and not as an answer. The output to locale now is cs_CZ.UTF-8 everywhere except for LANG. Why can't I set this variable?
[…] LANG in /etc/default/locale. Only configure the things like LANGUAGE if you want a specific exception, like having English error messages. Once set and you've restarted your session, then each LC_ should be the same. Check that LANG isn't being set in /etc/environment or your personal shell files. See help.ubuntu.com/community/Locale
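The Python 2 versus Python 3 difference mentioned in the comments can be illustrated without touching the real terminal. This is a rough sketch, not how the interpreter is implemented internally: an in-memory io.BytesIO stands in for the terminal, and the helper name print_through is made up for the example.

```python
import io

TEXT = "Příliš žluťoučký kůň úpěl ďábelské ódy."


def print_through(encoding, text):
    """Print text to an in-memory 'terminal' whose text layer uses
    `encoding`, and return the bytes that reached it."""
    terminal = io.BytesIO()  # stands in for the real terminal
    # newline="\n" disables platform-specific newline translation.
    stream = io.TextIOWrapper(terminal, encoding=encoding, newline="\n")
    print(text, file=stream)
    stream.flush()
    return terminal.getvalue()


# Python 3 with a UTF-8 locale: the str is encoded, then written.
utf8_bytes = print_through("utf-8", TEXT)

# Python 3 with the "C" locale: the encoding step itself fails.
try:
    print_through("ascii", TEXT)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Python 2, roughly: the literal is already a byte string, so the bytes
# go to the terminal as they are and no encoding step can fail.
py2_terminal = io.BytesIO()
py2_terminal.write(TEXT.encode("utf-8") + b"\n")

print("ascii text layer failed:", ascii_failed)
print("same bytes either way:", utf8_bytes == py2_terminal.getvalue())
```

So the terminal receives the same UTF-8 bytes in both working cases; the only difference is that Python 3 has to pick an encoding first, and it takes that from the locale.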