For the past few days I've been learning programming with Python, and I'm still only a beginner. Recently I've been using the book 'Code in the Cloud' for that purpose. The thing is, while all these textbooks cover a wide range of topics thoroughly, they merely touch upon the issue of UTF-8 encoding in languages other than English. Hence my question for you: how can I make the following code display UTF-8 characters correctly in my mother tongue (Polish)?
# -*- coding: utf-8 -*-
import datetime
import sys

class ChatError(Exception):
    """Exceptions covering all kinds of chat errors."""
    def __init__(self, msg):
        self.message = msg

# START: ChatMessage
class ChatMessage(object):
    """A single message sent by a chat user."""
    def __init__(self, user, text):
        self.sender = user
        self.msg = text
        self.time = datetime.datetime.now()

    def __str__(self):
        # "From: <user> at <time>: <message>"
        return "Od: %s o godzinie %s: %s" % (self.sender.username,
                                            self.time,
                                            self.msg)
# END: ChatMessage

# START: ChatUser
class ChatUser(object):
    """A user taking part in the chat."""
    def __init__(self, username):
        self.username = username
        self.rooms = {}

    def subscribe(self, roomname):
        if roomname in ChatRoom.rooms:
            room = ChatRoom.rooms[roomname]
            self.rooms[roomname] = room
            room.addSubscriber(self)
        else:
            # "Room %s not found"
            raise ChatError("Nie znaleziono pokoju %s" % roomname)

    def sendMessage(self, roomname, text):
        if roomname in self.rooms:
            room = self.rooms[roomname]
            cm = ChatMessage(self, text)
            room.addMessage(cm)
        else:
            # "User %s is not registered in room %s"
            raise ChatError("Użytkownik %s nie jest zarejestrowany w pokoju %s" %
                            (self.username, roomname))

    def displayChat(self, roomname, out):
        if roomname in self.rooms:
            room = self.rooms[roomname]
            room.printMessages(out)
        else:
            raise ChatError("Użytkownik %s nie jest zarejestrowany w pokoju %s" %
                            (self.username, roomname))
# END: ChatUser

# START: ChatRoom
class ChatRoom(object):
    """A chatroom"""
    rooms = {}

    def __init__(self, name):
        self.name = name
        self.users = []
        self.messages = []
        ChatRoom.rooms[name] = self

    def addSubscriber(self, subscriber):
        self.users.append(subscriber)
        # "User %s has joined the discussion."
        subscriber.sendMessage(self.name, 'Użytkownik %s dołączył do dyskusji.' %
                               subscriber.username)

    def removeSubscriber(self, subscriber):
        if subscriber in self.users:
            # "User %s has left the room."
            subscriber.sendMessage(self.name,
                                   "Użytkownik %s opuścił pokój." %
                                   subscriber.username)
            self.users.remove(subscriber)

    def addMessage(self, msg):
        self.messages.append(msg)

    def printMessages(self, out):
        # "Message list: %s"
        print >>out, "Lista wiadomości: %s" % self.name
        for i in self.messages:
            print >>out, i
# END: ChatRoom

# START: ChatMain
def main():
    room = ChatRoom("Main")
    markcc = ChatUser("MarkCC")
    markcc.subscribe("Main")
    prag = ChatUser("Prag")
    prag.subscribe("Main")
    markcc.sendMessage("Main", "Hej! Jest tu kto?")   # "Hey! Is anyone here?"
    prag.sendMessage("Main", "Tak, ja tu jestem.")    # "Yes, I'm here."
    markcc.displayChat("Main", sys.stdout)

if __name__ == "__main__":
    main()
# END: ChatMain
It was taken from the aforementioned book, but I cannot make it display non-English characters correctly in the Windows command line (even though the command line supports them). As you can see, I added the encoding declaration (# -*- coding: utf-8 -*-) at the beginning, thanks to which the code runs at all. I also tried using the u"string" syntax, but to no avail; it returns the following message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017c' in position 51: ordinal not in range(128)
What can I do to make those characters display correctly? Yes, I will often work with strings encoded in UTF-8. I would be very grateful for your help.
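The error can be reproduced without the chat code: it is raised whenever a Unicode string containing a non-ASCII character such as 'ż' (U+017C) is encoded with the ASCII codec. A minimal Python 3 sketch (in Python 3, string literals are Unicode by default); the helper name `encoding_error` is hypothetical:

```python
# Python 3 sketch: reproduce the "'ascii' codec can't encode" error in isolation.

def encoding_error(text, encoding):
    """Hypothetical helper: return the error message, or None if encoding succeeds."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as err:
        return str(err)
    return None


sample = "U\u017cytkownik MarkCC"  # "Użytkownik MarkCC"; 'ż' is U+017C

# 'ascii' codec can't encode character '\u017c' in position 1: ordinal not in range(128)
print(encoding_error(sample, "ascii"))
print(encoding_error(sample, "utf-8"))  # None: UTF-8 can encode every character
```

The position in the message is the index of the offending character in the string being encoded, which is why a longer string (with a username and timestamp in front) reports a larger number.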
4 Answers
Try invoking the Python interpreter this way:
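#!/usr/bin/python -S
# -S: don't import the site module at startup, so sys.setdefaultencoding still exists
import sys
sys.setdefaultencoding("utf-8")
import site  # now load site explicitly (it deletes sys.setdefaultencoding when it runs)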
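This will set the global default encoding to UTF-8; the usual default encoding is ASCII. Python 2 uses the default encoding whenever it implicitly converts between unicode and byte strings, for example when a __str__ method returns a unicode object, or when print writes a unicode object to a stream whose encoding is unknown.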
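Before changing the default encoding, it may help to check which encodings are actually in play. This sketch should run unchanged on Python 2.6+ and Python 3; the function name `report_encodings` is hypothetical. As far as I know, Python 2's print encodes unicode with sys.stdout.encoding when that is set (typically in an interactive console) and falls back to the default encoding only when it is not.

```python
# Sketch: list the encodings that affect how text reaches the console.
import locale
import sys


def report_encodings():
    """Hypothetical helper: collect the relevant encodings in one dict."""
    return {
        # Used for implicit str <-> unicode conversions ('ascii' on Python 2, 'utf-8' on Python 3)
        "default": sys.getdefaultencoding(),
        # Encoding of the console stream; None on Python 2 when stdout is redirected
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding preferred by the OS locale, e.g. 'cp1250' on Polish Windows
        "locale": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(report_encodings().items()):
        print("%-8s %s" % (name, value))
```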
The site module removes setdefaultencoding from sys after it runs, so you have to call reload(sys) immediately after import sys if you want to use it outside site.
That is what the -S option is for (don't import the site module): you call setdefaultencoding first and then explicitly import site afterwards. The reason is that the site module deletes sys.setdefaultencoding once it has run, so the default encoding can't be changed later.
This works for me currently:
#!/usr/bin/env python
# -*-coding=utf-8 -*-
Okay, I know nothing about Python and little about the Windows command line, but here is what a little Googling turned up:
I think the problem is that the Windows cmd shell doesn't use UTF-8 by default; it uses a legacy code page instead. If I'm not mistaken, this page should help you understand the error:
http://wiki.python.org/moin/PrintFails
(I got that link from this question: 'Unicode characters in Windows command line - how?'.)
It looks like you can force Python to encode its output as UTF-8 by setting the PYTHONIOENCODING environment variable.
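To see what the variable changes, this Python 3 sketch runs a child interpreter with PYTHONIOENCODING set and inspects the raw bytes it writes (the variable itself exists since Python 2.6). The helper name `run_with_io_encoding` is hypothetical. Note that forcing UTF-8 output only helps if the console also decodes UTF-8 (on Windows that usually means switching the code page, e.g. with `chcp 65001`); otherwise the bytes arrive intact but are displayed as the wrong characters.

```python
# Python 3 sketch: force the child's stdio encoding and look at the bytes it emits.
import os
import subprocess
import sys


def run_with_io_encoding(code, encoding):
    """Hypothetical helper: run `code` in a new interpreter with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout  # raw bytes, exactly as a terminal would receive them


if __name__ == "__main__":
    # The child prints 'ż'; the escape keeps the command line itself ASCII-only.
    print(run_with_io_encoding("print('\\u017c')", "utf-8"))  # starts with b'\xc5\xbc'
    print(run_with_io_encoding("print('\\u017c')", "cp852"))  # a single cp852 byte instead
```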
This question is about finding UTF-8-enabled Windows shells:
Is there a Windows command shell that will display Unicode characters?
It may be helpful. I hope you solve your problem.
PuTTY handles UTF-8 just fine. It isn't Python's job to display the characters correctly; that is the job of your terminal program.
The Windows terminal often uses a non-UTF-8 encoding by default (see 'python: unicode in Windows terminal, encoding used?'). You might therefore want to try the following:
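stdout_encoding = sys.stdout.encoding or 'utf-8'  # encoding is None when output is redirected

# Replacement for ChatRoom.printMessages (Python 2)
def printMessages(self, out):
    print >>out, ("Lista wiadomości: %s" % self.name).decode('utf-8').encode(stdout_encoding)
    for i in self.messages:
        # i is a ChatMessage object, so turn it into a UTF-8 byte string first
        print >>out, str(i).decode('utf-8').encode(stdout_encoding)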
This takes your byte strings, turns them into character strings (your file indicates that they are encoded in UTF-8), and then encodes them for your terminal.
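The same decode-then-encode round trip can be tried on its own. This is a Python 3 sketch; the function name `transcode` is hypothetical, and it assumes the console uses cp852, the usual OEM code page on Polish Windows.

```python
# Python 3 sketch of the decode-then-encode round trip.
# Assumption: the console uses cp852 (typical OEM code page on Polish Windows).

def transcode(data, source="utf-8", target="cp852"):
    """Hypothetical helper: convert bytes in `source` encoding to bytes in `target`."""
    text = data.decode(source)   # bytes -> str (Unicode characters)
    return text.encode(target)   # str -> bytes in the console's code page


utf8_bytes = "\u017c".encode("utf-8")   # 'ż' takes two bytes in UTF-8: b'\xc5\xbc'
console_bytes = transcode(utf8_bytes)   # and a single byte in cp852

# Without the conversion, a cp852 console reads the two UTF-8 bytes as two
# unrelated characters.
garbled = utf8_bytes.decode("cp852")
print(len(utf8_bytes), len(console_bytes), len(garbled))  # 2 1 2
```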
You can find useful information about the general issue of encoding and decoding elsewhere on Stack Overflow.
Use print username.decode('utf-8') to tell Python to decode the byte string to unicode; print will then encode it for the terminal automatically, using sys.stdout.encoding.
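Python 3 builds this behaviour into its text streams (sys.stdout is an io.TextIOWrapper): once you hold decoded text rather than bytes, the stream encodes it with its own encoding on write. A Python 3 sketch, using an in-memory buffer in place of the console and assuming cp852 as the console code page:

```python
# Python 3 sketch: a text stream encodes str objects with its own encoding.
# Assumption: cp852 stands in for the console's code page.
import io

raw = io.BytesIO()                                  # plays the role of the console
console = io.TextIOWrapper(raw, encoding="cp852")   # like sys.stdout in Python 3
console.write("Użytkownik %s dołączył do dyskusji.\n" % "MarkCC")
console.flush()
print(raw.getvalue())  # the message as cp852 bytes
```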