Python and unicode

Sun Sep 19 15:43:28 EDT 2010

Hi everybody.
I've played with encodings in Python for a few hours, but they're still
somewhat confusing to me. So I've written a test file (encoded as UTF-8)
and put everything I think is true in a comment at the beginning of the
script. Could you check whether it's correct? (On a side note, the script
does what I intended it to do.)
One more thing: is there some mechanism to avoid writing
'something'.decode('utf-8') all the time? Some sort of declaration that
tells the interpreter I'd like implicit decoding, with a specified
encoding, for all string constants in the script?
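
One mechanism that seems to fit this question, assuming Python 2.6 or
later, is the unicode_literals future import. The sketch below is only an
illustration: the names text_type, s and ss are chosen for the example, and
the import affects only literals in the module that contains it, so any
literal that must stay a byte string would need a b'' prefix.

```python
# -*- coding: utf-8 -*-
# Assumption: Python 2.6+. With this import, plain string literals in this
# module are unicode objects, decoded using the declared source encoding.
# On Python 3 the import is accepted and changes nothing.
from __future__ import unicode_literals

# unicode on Python 2, str on Python 3 (helper name is hypothetical)
text_type = type(u'')

s = 'abcdef ČčĆćĐđŠšŽž'   # no .decode('utf-8') needed
ss = 'ab ČćŠđŽ'           # no unicode(..., 'utf-8') needed

print(isinstance(s, text_type))
print(isinstance(ss, text_type))
```

Without the future import, prefixing each literal with u (for example
u'ab ČćŠđŽ') gives the same result one literal at a time.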
Here's my script:
-------------------
# vim: set encoding=utf-8 :
"""
 ----- encoding and py -----
 - 1st (or 2nd) line tells py interpreter encoding of file
 - if this line is missing, interpreter assumes 'ascii'
 - it's possible to use variations of first line
 - the first or second line must match the regular expression
"coding[:=]\s*([-\w.]+)" (PEP-0263)
 - some variations:
 '''
 # coding=<encoding name>
 '''
 '''
 #!/usr/bin/python
 # -*- coding: <encoding name> -*-
 '''
 '''
 #!/usr/bin/python
 # vim: set fileencoding=<encoding name> :
 '''
 - this version works for my vim:
 '''
 # vim: set encoding=utf-8 :
 '''
 - constants can be given via the str.decode() method or via the unicode
constructor
 - if locale is used, it shouldn't be set via 'LC_ALL', as that also
changes the encoding
"""
import datetime, locale
#locale.setlocale(locale.LC_ALL,'croatian') # changes encoding
# sets correct date format, but encoding is left alone
locale.setlocale(locale.LC_TIME,'croatian')
print 'default locale:', locale.getdefaultlocale()
s='abcdef ČčĆćĐđŠšŽž'.decode('utf-8')
ss=unicode('ab ČćŠđŽ','utf-8')
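# date part of string is decoded as cp1250, because it's the default locale
date_part=datetime.date(2000,1,6).strftime("'%d.%m.%Y.', %x, %A, %B, ")
# named 'result' rather than 'all' to avoid shadowing the built-in all()
result=date_part.decode('cp1250')+'%s, %s' % (s, ss)
print result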
-------------------
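
The claim in the docstring about the coding line can be checked against the
regular expression it quotes. The sketch below is a rough approximation, not
the interpreter's actual logic: it only searches the first two lines for the
pattern and does not verify that the line is a comment. The function name
declared_encoding and the sample strings are made up for the example.

```python
import re

# Pattern quoted in the script's docstring (attributed there to PEP 263).
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

def declared_encoding(source_text):
    """Return the encoding named on line 1 or 2 of source_text, or None."""
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None

samples = [
    "# coding=utf-8\n",
    "#!/usr/bin/python\n# -*- coding: utf-8 -*-\n",
    "#!/usr/bin/python\n# vim: set fileencoding=utf-8 :\n",
    "# vim: set encoding=utf-8 :\n",
    # declaration on the third line: too late, so it is not picked up
    "import os\nimport sys\n# coding=utf-8\n",
]

for sample in samples:
    last_header_line = sample.splitlines()[:2][-1]
    print("%r -> %s" % (last_header_line, declared_encoding(sample)))
```

This also suggests why the "encoding=utf-8" variant is accepted: the text
"coding=" is a substring of "encoding=", so the pattern still matches.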