This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009年01月19日 23:24 by vstinner, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| append_bom-2.patch | vstinner, 2009年03月26日 17:52 | |||
| append_bom-3.patch | pitrou, 2009年03月28日 18:53 | |||
| append_bom-4.patch | pitrou, 2009年05月10日 22:52 | |||
| Messages (19) | |||
|---|---|---|---|
| msg80221 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年01月19日 23:24 | |
Copy/paste of message79330 from the issue #4862: -------------- >>> f = open('utf16.txt', 'w', encoding='utf-16') >>> f.write('abc') 3 >>> f.close() >>> f = open('utf16.txt', 'a', encoding='utf-16') >>> f.write('def') 3 >>> f.close() >>> open('utf16.txt', 'r', encoding='utf-16').read() 'abc\ufeffdef' -------------- |
|||
| msg80223 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年01月19日 23:37 | |
Bug is reproductible with: * Python 2.5 : charset utf-8-sig and utf-16 for codecs.open() * trunk : charset utf-8-sig, utf-16 and utf-32 for codecs.open() * py3k : charset utf-8-sig, utf-16 and utf-32 for open() With utf-7 or utf-8, no BOM is written. Note: with UTF-32, the BOM is 4 bytes long (0xff 0xfe 0x00 0x00 on little endian) but it's still the character (BOM) \ufeff (little endian). |
|||
| msg80303 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年01月21日 00:41 | |
See also issues #5008 (f.tell()) and #5016 (f.seekable()), not directly related to this issue. |
|||
| msg83554 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年03月14日 00:49 | |
One possible solution would be to call tell() in the TextIOWrapper constructor, and then decoder.setstate((b"", 0)) if the current pos is >0. |
|||
| msg83834 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年03月20日 00:37 | |
@pitrou: You're right, but the state have to be changed for the encoder, not the decoder. I added the following code to TextIOWrapper constructor (for the C and the Python version of io library): if self._seekable and self.writable(): position = self.buffer.tell() if position != 0: self._encoder = self._get_encoder() self._encoder.setstate(0) |
|||
| msg84185 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年03月26日 16:55 | |
As for the C version of the patch: - I'm not sure why you make self->ok handling more complicated than it was - encodefunc should not be forced to NULL, otherwise it will yield a big decrease in write() performance |
|||
| msg84188 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年03月26日 17:26 | |
> - I'm not sure why you make self->ok handling more complicated than it was tell() requires self->ok=1. I choosed to reset self->ok to zero on error, but it's maybe useless. > - encodefunc should not be forced to NULL, otherwise it will yield a big > decrease in write() performance self>encodefunc value have to be changed because <encoder>.setstate() changes the encoding function: UTF-16 => UTF-16-LE or UTF-16-BE. I don't know how to get the new value of self>encodefunc. I will try to write another patch. |
|||
| msg84193 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年03月26日 17:41 | |
Le jeudi 26 mars 2009 à 17:26 +0000, STINNER Victor a écrit : > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > > > - I'm not sure why you make self->ok handling more complicated than it was > > tell() requires self->ok=1. I choosed to reset self->ok to zero on error, but > it's maybe useless. You are calling tell() on the underlying binary buffer, not on the TextIOWrapper object. So self->ok should have no impact. Am I missing something? > self>encodefunc value have to be changed because <encoder>.setstate() changes > the encoding function: UTF-16 => UTF-16-LE or UTF-16-BE. I know, but then the logic should probably be changed and use an additional struct member to signal that the start of file has been skipped. |
|||
| msg84194 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年03月26日 17:52 | |
Faster patch! - add fast encoder for UTF-32, UTF-32-LE and UTF-32-BE (copy/paste of utf16 functions) - move utf-8 before utf-16-* because utf8 more popular than utf16 :-p - don't set self->encodefunc=NULL (loose all the encoder optimisation), but only fix self->encodefunc for two special cases: utf16=>utf16le or utf16be and utf32=>utf32le or utf32be - remove self->ok: it was may be usefull for an older version of my patch, but it's not more needed |
|||
| msg84195 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年03月26日 17:57 | |
Hum, it's a detail, but is it a good idea to keep the reference the int(0)? Or would it be better to release it at exit? |
|||
| msg84319 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年03月28日 18:07 | |
It's better now, although I think it's not good to duplicate the encoding switch logic. It would be better to have a separate flag indicate whether it's the start of stream or not. I'm gonna produce a new patch, unless you beat me to it. Also, I'm adding Amaury to the nosy list so that he tells us whether he thinks the approach is sound. |
|||
| msg84321 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年03月28日 18:53 | |
Here is a new patch catering to more cases (seek()) in addition to just opening in append mode. |
|||
| msg84373 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年03月29日 10:58 | |
seek() has also the problem? It's really hard to encode UTF-16/32 correctly... |
|||
| msg87554 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年05月10日 22:52 | |
Updated patch against py3k. |
|||
| msg87748 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年05月14日 17:50 | |
If no-one objects, I'm going to commit this in a couple of days. |
|||
| msg87753 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2009年05月14日 18:43 | |
As I said on IRC a few days ago, I think the patch is ready to go. |
|||
| msg87754 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年05月14日 18:56 | |
Ok, committed in r72635. |
|||
| msg109014 - (view) | Author: Conrad.Irwin (Conrad.Irwin) | Date: 2010年06月30日 20:20 | |
Shouldn't this fix be back-ported to the 2.6 branch too? |
|||
| msg111763 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010年07月28日 02:00 | |
I fixed #6213 in 2.6 and 2.7, and so it's now possible to backport this fix to 2.6 => r83200 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:44 | admin | set | github: 49256 |
| 2010年07月28日 10:18:37 | flox | set | nosy:
+ flox |
| 2010年07月28日 02:00:02 | vstinner | set | messages: + msg111763 |
| 2010年06月30日 20:20:25 | Conrad.Irwin | set | nosy:
+ Conrad.Irwin messages: + msg109014 |
| 2009年05月14日 18:56:23 | pitrou | set | status: open -> closed resolution: fixed messages: + msg87754 |
| 2009年05月14日 18:43:09 | benjamin.peterson | set | nosy:
+ benjamin.peterson messages: + msg87753 |
| 2009年05月14日 17:52:45 | ezio.melotti | set | nosy:
+ ezio.melotti |
| 2009年05月14日 17:50:50 | pitrou | set | messages: + msg87748 |
| 2009年05月10日 22:52:27 | pitrou | set | files:
+ append_bom-4.patch messages: + msg87554 |
| 2009年05月01日 21:50:41 | pitrou | set | nosy:
+ lemburg |
| 2009年03月29日 10:58:08 | vstinner | set | messages: + msg84373 |
| 2009年03月28日 18:53:10 | pitrou | set | files:
+ append_bom-3.patch messages: + msg84321 |
| 2009年03月28日 18:07:14 | pitrou | set | nosy:
+ amaury.forgeotdarc messages: + msg84319 |
| 2009年03月26日 17:57:49 | vstinner | set | messages: + msg84195 |
| 2009年03月26日 17:52:52 | vstinner | set | files: - append_bom.patch |
| 2009年03月26日 17:52:40 | vstinner | set | files:
+ append_bom-2.patch messages: + msg84194 |
| 2009年03月26日 17:41:34 | pitrou | set | messages: + msg84193 |
| 2009年03月26日 17:26:28 | vstinner | set | messages: + msg84188 |
| 2009年03月26日 16:55:37 | pitrou | set | messages: + msg84185 |
| 2009年03月20日 00:37:44 | vstinner | set | files:
+ append_bom.patch keywords: + patch messages: + msg83834 |
| 2009年03月14日 00:49:57 | pitrou | set | messages: + msg83554 |
| 2009年02月28日 17:25:56 | benjamin.peterson | link | issue4565 dependencies |
| 2009年01月21日 00:41:46 | vstinner | set | messages: + msg80303 |
| 2009年01月20日 00:12:20 | pitrou | set | nosy: + pitrou |
| 2009年01月19日 23:37:22 | vstinner | set | messages:
+ msg80223 versions: + Python 2.6, Python 2.7 |
| 2009年01月19日 23:24:35 | vstinner | create | |