
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: strftime month name is encoded somehow
Type: Stage:
Components: Unicode Versions: Python 2.4
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: belopolsky Nosy List: BreamoreBoy, belopolsky, kevinwatters, lemburg, loewis, tim_evans
Priority: normal Keywords:

Created on 2003-11-04 20:49 by tim_evans, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (10)
msg18893 - Author: Tim Evans (tim_evans) Date: 2003-11-04 20:49
On Windows XP, with some locales the month name
returned by time.strftime('%B') is encoded somehow. 
For example:
>>> import time, locale
>>> locale.setlocale(locale.LC_ALL, '')
"Chinese_People's Republic of China.936"
>>> time.strftime('%B')
'\xca\xae\xd2\xbb\xd4\xc2'
>>> time.strftime('%d %B %Y')
'05 \xca\xae\xd2\xbb\xd4\xc2 2003'
>>> locale.setlocale(locale.LC_ALL, '')
'French_France.1252'
>>> time.strftime('%B', (2003,12,1,0,0,0,0,0,0))
'd\xe9cembre'
I'm not sure what encoding the Chinese version is
using, but the French is compatible with latin-1. It
would appear that the encoding used is locale-dependent.
Ideally, the win32 version of time.strftime would call
the wide-character version of strftime (called
wcsftime) and return a unicode object.
I haven't looked at what this does under any other
operating system.
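Each locale name in the session above ends in a Windows code page number (".936", ".1252"). The following is a minimal Python 3 sketch that picks a codec from that suffix and decodes the two byte strings from the report. It assumes the suffix is always a numeric code page and that Python's `cpNNN` codec matches the C runtime's conversion. `codec_from_locale_name` is a hypothetical helper and is not part of the `locale` module.

```python
import codecs


def codec_from_locale_name(locale_name):
    """Map a Windows locale name such as 'French_France.1252' to a codec name.

    Hypothetical helper: assumes the text after the last '.' is a numeric
    Windows code page, and returns None if it is not or if Python has no
    codec for it.
    """
    _, sep, suffix = locale_name.rpartition(".")
    if not sep or not suffix.isdigit():
        return None
    name = "cp" + suffix
    try:
        codecs.lookup(name)  # raises LookupError if Python has no such codec
    except LookupError:
        return None
    return name


# Byte strings copied from the interactive session in this report.
chinese = b"\xca\xae\xd2\xbb\xd4\xc2"
french = b"d\xe9cembre"

zh_codec = codec_from_locale_name("Chinese_People's Republic of China.936")
fr_codec = codec_from_locale_name("French_France.1252")
zh_text = chinese.decode(zh_codec)  # '十一月' (November)
fr_text = french.decode(fr_codec)   # 'décembre'
```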
msg18894 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-11-05 20:28
The result is always a byte string in the locale's encoding;
for compatibility, this cannot be changed.
On Windows, you can access the encoding as "mbcs". In
general, you need to use locale.getpreferredencoding() to
find out what encoding this string is in.
Closing as not-a-bug.
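Here is the advice above (decode using the locale's preferred encoding) restated as a minimal Python 3 sketch. `decode_locale_bytes` is a hypothetical helper. `locale.getpreferredencoding(False)` is a real function, but the next message shows that on Windows it can report a code page (cp1252 there) that does not match the locale passed to `setlocale()`. The fallback should therefore be treated as an assumption.

```python
import locale


def decode_locale_bytes(raw, encoding=None):
    """Decode bytes produced by a locale-dependent C function.

    Hypothetical helper. Without an explicit encoding it falls back to
    locale.getpreferredencoding(False), which does not call setlocale().
    Undecodable bytes are replaced rather than raising an error.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="replace")


# Passing the encoding explicitly gives a deterministic result.
month = decode_locale_bytes(b"d\xe9cembre", "cp1252")  # 'décembre'
```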
msg18895 - Author: Tim Evans (tim_evans) Date: 2003-11-05 22:45
I'm reopening the bug, because that doesn't seem to work:
>>> import time, locale
>>> locale.setlocale(locale.LC_ALL, '')
"Chinese_People's Republic of China.936"
>>> x = time.strftime('%B')
>>> x
'\xca\xae\xd2\xbb\xd4\xc2'
>>> x.decode('mbcs')
'\xca\xae\xd2\xbb\xd4\xc2'
>>> locale.getpreferredencoding()
'cp1252'
>>> x.decode('cp1252')
'\xca\xae\xd2\xbb\xd4\xc2'
The preferred encoding is reported as cp1252, which can't be
correct. Neither cp1252 nor mbcs decodes the string into
anything containing the high-numbered characters I would
expect for Chinese (neither of them changes the string at all).
The following problems (may) exist:
1. locale.getpreferredencoding() doesn't work.
2. The string returned by time.strftime() is not mbcs encoded.
3. The documentation for time.strftime() doesn't say how
the string is encoded.
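One likely explanation for the "unchanged" output is shown in the Python 3 sketch below. For bytes 0xA0 to 0xFF, cp1252 agrees with Latin-1, so each byte decodes to the code point with the same number. Decoding these six bytes as cp1252 therefore produces a six-character Unicode string whose repr shows the same `\xNN` escapes as the original byte string. If mbcs mapped to cp1252 on that machine (an assumption, consistent with the `getpreferredencoding()` result above), the mbcs result would look the same. Decoding with cp936 gives three characters instead.

```python
raw = b"\xca\xae\xd2\xbb\xd4\xc2"  # strftime('%B') output from the report

as_cp1252 = raw.decode("cp1252")
as_cp936 = raw.decode("cp936")

# Every byte here is >= 0xA0, where cp1252 matches Latin-1:
# each byte becomes the code point with the same value.
assert all(ord(ch) == b for ch, b in zip(as_cp1252, raw))

print(ascii(as_cp1252))  # '\xca\xae\xd2\xbb\xd4\xc2' (six characters)
print(ascii(as_cp936))   # '\u5341\u4e00\u6708' (three characters)
```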
msg18896 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2003-11-06 08:53
Tim, there's not much we can do about this, since the strftime()
API is a direct interface to the underlying C library API. Python
simply passes the arguments through to this function and
returns whatever the C library has to offer.
Please refer to the C library documentation for your platform
for details about the encoding being used for the strings.
BTW, a simple table with the month names in your application
should nicely solve your problem; additionally, it gives you
full control over the encoding and wording being used.
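Below is a minimal sketch of the lookup-table approach suggested above. The table is hypothetical and covers only two languages. The names are ordinary French and Chinese month names, but a real application would supply and maintain its own list. The names are stored as Unicode text, so the C library's locale never touches them.

```python
# Hypothetical application-owned month names, keyed by language code.
MONTH_NAMES = {
    "fr": ["janvier", "février", "mars", "avril", "mai", "juin",
           "juillet", "août", "septembre", "octobre", "novembre", "décembre"],
    "zh": ["一月", "二月", "三月", "四月", "五月", "六月",
           "七月", "八月", "九月", "十月", "十一月", "十二月"],
}


def month_name(month, lang):
    """Return the name of `month` (1-12) in language `lang`."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return MONTH_NAMES[lang][month - 1]


def format_date(year, month, day, lang):
    """Format a date like strftime('%d %B %Y'), using the table for %B."""
    return "%02d %s %d" % (day, month_name(month, lang), year)
```

For example, `format_date(2003, 12, 1, "fr")` returns `'01 décembre 2003'` no matter which C locale is active.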
msg18897 - Author: Tim Evans (tim_evans) Date: 2003-11-06 21:00
The windows C lib docs say that calling mbstowcs on the
output of strftime (or calling wcsftime instead of strftime)
will return the correct wide-character (utf-16?) string. 
This produces something that looks like it could be correct.
 Decoding with the 'mbcs' encoding in Python is not
equivalent to calling mbstowcs because mbstowcs is
locale-dependent.
Perhaps it would be a good idea to have time.strftime return
a unicode string. As this wouldn't be backward compatible,
it could be done via a new function time.ustrftime, or via
an optional unicode=True argument to the existing function.
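The following Python 3 sketch shows how the proposed interface might behave. `ustrftime` and its `unicode=` keyword are hypothetical: they were only proposed in this message and were never added to the `time` module. In Python 3, `time.strftime` already returns `str`, so the sketch only needs extra work when the legacy byte-string form is requested. Encoding with the preferred encoding is an assumption and only approximates the old behaviour.

```python
import locale
import time


def ustrftime(fmt, t=None, unicode=True):
    """Hypothetical wrapper modelled on the proposal above.

    Returns str by default. With unicode=False the text is encoded back
    to bytes using the locale's preferred encoding, approximating the old
    byte-string result.
    """
    if t is None:
        t = time.localtime()
    text = time.strftime(fmt, t)
    if unicode:
        return text
    return text.encode(locale.getpreferredencoding(False), errors="replace")
```

Usage: `ustrftime("%Y-%m-%d", (2003, 12, 1, 0, 0, 0, 0, 335, 0))` returns `'2003-12-01'`, and adding `unicode=False` returns `b'2003-12-01'`.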
msg18898 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-11-06 21:33
Is there any way to find out the encoding that mbstowcs uses?
msg18899 - Author: Tim Evans (tim_evans) Date: 2003-11-06 22:21
I have looked at the source code for the MS C library (it
comes with VC++ 6) and I believe that something
equivalent to the following is used:
char codepage[16];
GetLocaleInfo(
 GetThreadLocale(),
 LOCALE_IDEFAULTANSICODEPAGE,
 codepage, 16);
This returns "1252" for the "C" locale, and "936" for the Chinese
locale that I was experimenting with.
Python does not have an encoding "cp936", but from C the
conversion with an explicit code page produces the same
results as mbstowcs.
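The C snippet in this message can be reproduced from Python 3 with `ctypes`, as in the sketch below. `GetThreadLocale`, `GetLocaleInfoW` and `LOCALE_IDEFAULTANSICODEPAGE` (0x1004 in the Windows SDK) are real Windows APIs. Two points are assumptions carried over from the message and are not verified here: that a given C runtime derives its conversion code page this way, and that the result matches what `mbstowcs` uses. `thread_ansi_codepage` is a hypothetical name. On other platforms the sketch returns None.

```python
import ctypes
import sys

# LCTYPE value from the Windows SDK header winnls.h.
LOCALE_IDEFAULTANSICODEPAGE = 0x00001004


def thread_ansi_codepage():
    """Return e.g. 'cp936' for the calling thread's locale, or None.

    Mirrors the GetLocaleInfo(GetThreadLocale(), ...) call above.
    Returns None when not running on Windows or when the call fails.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    kernel32.GetThreadLocale.restype = ctypes.c_uint32
    kernel32.GetLocaleInfoW.argtypes = [
        ctypes.c_uint32,   # LCID
        ctypes.c_uint32,   # LCTYPE
        ctypes.c_wchar_p,  # output buffer
        ctypes.c_int,      # buffer size in characters
    ]
    kernel32.GetLocaleInfoW.restype = ctypes.c_int
    buf = ctypes.create_unicode_buffer(16)
    written = kernel32.GetLocaleInfoW(
        kernel32.GetThreadLocale(), LOCALE_IDEFAULTANSICODEPAGE, buf, len(buf))
    if written == 0:
        return None
    return "cp" + buf.value
```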
msg18900 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-11-07 18:56
This tells me that we need a function to return the current
locale's code page; this should return "cp936" in your case.
The fact that Python does not have a codec for cp936 is an
independent issue.
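Current Python 3 does ship a cp936 codec, registered as an alias of `gbk`. The sketch below covers only the mapping step described above and assumes the code page number is already known, for example from the GetLocaleInfo call in the previous message. `codec_for_codepage` is a hypothetical helper.

```python
import codecs


def codec_for_codepage(codepage):
    """Return Python's canonical codec name for a Windows code page number.

    Hypothetical helper: returns None if Python has no 'cpNNN' codec
    for that number.
    """
    try:
        return codecs.lookup("cp%d" % codepage).name
    except LookupError:
        return None
```

For example, `codec_for_codepage(1252)` returns `'cp1252'`, and an unknown number returns None.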
msg107343 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-08 20:29
Is this still an issue in 3.x? With time.strftime() returning unicode, I don't think any encoding issues remain.
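The point about 3.x can be checked with a small example: `time.strftime()` returns `str` in Python 3, so no byte string reaches the caller. The month name in the comment assumes LC_TIME is still the default "C" locale, because Python 3 does not call setlocale() for LC_TIME at startup.

```python
import time

# 5 November 2003 (a Wednesday, day 309 of the year), the date of the report.
when = (2003, 11, 5, 0, 0, 0, 2, 309, 0)
text = time.strftime("%d %B %Y", when)

assert isinstance(text, str)  # str, not bytes, in Python 3
print(text)  # 05 November 2003 under the default C locale
```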
msg114295 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-18 23:07
Closed as no reply to msg107343.
History
Date                 User          Action  Args
2022-04-11 14:56:00  admin         set     github: 39503
2010-08-18 23:07:20  BreamoreBoy   set     status: open -> closed
                                           nosy: + BreamoreBoy
                                           messages: + msg114295
                                           resolution: out of date
2010-06-08 20:29:12  belopolsky    set     assignee: belopolsky
                                           messages: + msg107343
                                           nosy: + belopolsky
2008-08-29 23:56:38  kevinwatters  set     nosy: + kevinwatters
2003-11-04 20:49:39  tim_evans     create
