Created on 2009-06-05 20:17 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| utf_8_16.patch | vstinner, 2010-07-24 03:41 | |
Messages (8)

| msg88972 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-06-05 20:17 |
The behaviour of several incremental encoders is inconsistent between
2.x and py3k.
In 2.x:
>>> import codecs
>>> enc = codecs.getincrementalencoder('utf-16')()
>>> enc.getstate()
0
>>> enc.setstate(0)
>>> enc.encode(u'abc')
'\xff\xfea\x00b\x00c\x00'
In py3k:
>>> import codecs
>>> enc = codecs.getincrementalencoder('utf-16')()
>>> enc.getstate()
2
>>> enc.setstate(0)
>>> enc.encode('abc')
b'a\x00b\x00c\x00'
|
|||
| msg89073 | Author: Walter Dörwald (doerwalter) * (Python committer) | Date: 2009-06-08 11:13 |
This was done because the codec state is part of the return value of tell(). To have a reasonable return value (i.e. one with just the position itself) in as many cases as possible, it makes sense to design the codec state so that the most common state is 0. This is what was done for py3k: the default state (no BOM read/written yet) is 2, not 0. |
|||
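The state values described above can be observed directly. The following is a minimal sketch, assuming a current Python 3 interpreter; the `native` variable is only an illustration helper that picks the byte order the plain `utf-16` codec writes on the running platform.

```python
import codecs
import sys

# Byte order used by the plain 'utf-16' codec on this platform.
native = 'utf-16-le' if sys.byteorder == 'little' else 'utf-16-be'

enc = codecs.getincrementalencoder('utf-16')()
state_before = enc.getstate()   # 2: no BOM written yet
first = enc.encode('abc')       # BOM followed by the encoded text
state_after = enc.getstate()    # 0: BOM already written

# Forcing state 0 on a fresh encoder suppresses the BOM.
enc2 = codecs.getincrementalencoder('utf-16')()
enc2.setstate(0)
no_bom = enc2.encode('abc')

print(state_before, state_after)
print(first == codecs.BOM_UTF16 + 'abc'.encode(native))
print(no_bom == 'abc'.encode(native))
```

Once the BOM has been written, the encoder sits in state 0, which is the "most common state" the message refers to.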
| msg89074 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-06-08 11:19 |
Yes, I agree with py3k's behaviour. But it should be backported to 2.x as well. I don't know where the changes need to be made, so it would be nice if someone else could do it :-) (I'm backporting the py3k IO lib and I had to disable two tests because of this) |
|||
| msg89075 | Author: Walter Dörwald (doerwalter) * (Python committer) | Date: 2009-06-08 11:59 |
AFAICR the difference is: 2.x may return any object in getstate(), but py3k must return a (buffered input, integer) tuple. Simply moving py3k's getstate/setstate implementation over to 2.x might do the trick. |
|||
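For illustration, here is a small sketch of the (buffered input, integer) tuple, assuming a current Python 3 interpreter. As far as the `codecs` documentation describes it, the tuple form applies to incremental decoders, while incremental encoders return a plain integer. The `native` helper and the sample data are made up for this example.

```python
import codecs
import sys

# Byte order used by the plain 'utf-16' codec on this platform.
native = 'utf-16-le' if sys.byteorder == 'little' else 'utf-16-be'

dec = codecs.getincrementaldecoder('utf-16')()
initial = dec.getstate()        # nothing buffered, byte order unknown

# BOM + 'a' + 'b', but hold back the last byte of 'b'.
data = codecs.BOM_UTF16 + 'ab'.encode(native)
text = dec.decode(data[:-1])    # decodes 'a', buffers half of 'b'
partial = dec.getstate()        # (pending byte, flag for native order)

rest = dec.decode(data[-1:])    # completes 'b'

print(initial)
print(partial)
print(text + rest)
```

The first tuple element is the undecoded input still held by the decoder; the second is the integer flag (2 before the byte order is known, 0 once a native-order BOM has been read).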
| msg111423 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-07-24 03:41 |
Codecs are inconsistent: utf-32 has working getstate() / setstate() methods, whereas utf-8-sig and utf-16 don't (getstate() always returns 0, setstate() does nothing).
> Simply moving py3k's getstate/setstate implementation
> over to 2.x might do the trick.
That's what my patch does :-) It's just a copy/paste of the Python 3 code. It fixes the #5006 tests (which are re-enabled by the patch). With the patch, it's possible to save/restore the state of the utf-8-sig and utf-16 codecs. |
|||
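A minimal sketch of what saving and restoring codec state looks like for utf-8-sig, assuming a current Python 3 interpreter (the patch itself targets 2.x and is not reproduced here):

```python
import codecs

enc = codecs.getincrementalencoder('utf-8-sig')()
saved = enc.getstate()        # state before anything has been written
first = enc.encode('abc')     # starts with the UTF-8 BOM
after = enc.getstate()        # state once the BOM is out

enc.setstate(saved)           # rewind to "nothing written yet"
again = enc.encode('abc')     # BOM is emitted a second time
tail = enc.encode('def')      # no BOM: the encoder is past the start

print(first, again, tail)
```

With a getstate() that always returns 0 and a setstate() that does nothing, this kind of round trip has no effect on the encoder, which is the inconsistency the message describes.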
| msg111745 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010-07-27 22:47 |
The patch looks ok to me (I suppose you have tested it). |
|||
| msg111760 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-07-28 01:45 |
> The patch looks ok to me
Ok, committed to 2.7 (r83198).
> (I suppose you have tested it)
I ran test_io, which does test the incremental encoders.
--
I'm not brave enough to commit it to 2.6 (test_io in 2.6 doesn't use incremental encoders). |
|||
| msg111762 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-07-28 01:59 |
> I'm not brave enough to commit it to 2.6
> (test_io in 2.6 doesn't use incremental encoders)
Oh, I just remembered that I chose to fix this issue to be able to backport #5006 to 2.6 :-) So r83199 is the incremental encoder fix for 2.6, and r83200 is the BOM fix for the io library. |
|||
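One place where the io library relies on encoder state is a text file reopened for appending: the encoder has to be told that the BOM is already in the file. The sketch below assumes a current Python 3 interpreter and that this append-mode case is the kind of BOM handling the io fix is about; the temporary file is a scratch path created only for the illustration.

```python
import codecs
import os
import tempfile

# Scratch file used only for this illustration.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'w', encoding='utf-16') as f:
        f.write('abc')            # BOM + 'abc'
    with open(path, 'a', encoding='utf-16') as f:
        f.write('def')            # appended without a second BOM
    with open(path, 'rb') as f:
        raw = f.read()
    with open(path, encoding='utf-16') as f:
        text = f.read()
finally:
    os.remove(path)

bom_count = raw.count(codecs.BOM_UTF16)
print(bom_count, text)
```

This only works if setstate() on the utf-16 encoder actually takes effect, which is why the encoder fix had to land before the io fix could be backported.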
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:56:49 | admin | set | github: 50462 |
| 2010-07-28 01:59:06 | vstinner | set | messages: + msg111762 |
| 2010-07-28 01:45:22 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg111760 |
| 2010-07-27 22:47:30 | pitrou | set | messages: + msg111745; versions: - Python 3.2 |
| 2010-07-24 03:41:56 | vstinner | set | files: + utf_8_16.patch; nosy: + vstinner; messages: + msg111423; keywords: + patch |
| 2009-06-08 11:59:30 | doerwalter | set | messages: + msg89075 |
| 2009-06-08 11:19:07 | pitrou | set | messages: + msg89074 |
| 2009-06-08 11:13:55 | doerwalter | set | nosy: + doerwalter; messages: + msg89073 |
| 2009-06-05 20:17:41 | pitrou | create | |