Message79299
| Author |
amaury.forgeotdarc |
| Recipients |
amaury.forgeotdarc |
| Date |
2009-01-07 00:21:15 |
| Message-id |
<1231287677.41.0.817743984337.issue4862@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
First, write a UTF-16 file, which starts with its signature (the BOM):
>>> f1 = open('utf16.txt', 'w', encoding='utf-16')
>>> f1.write('0123456789')
>>> f1.close()
Then read it twice:
>>> f2 = open('utf16.txt', 'r', encoding='utf-16')
>>> print('read1', ascii(f2.read()))
read1 '0123456789'
>>> f2.seek(0)
0
>>> print('read2', ascii(f2.read()))
read2 '\ufeff0123456789'
The second read returns the BOM!
This is because the zero in seek(0) is a "cookie" that encodes both the position
and the decoder state. Unfortunately, state=0 means 'endianness has been determined:
native order', so the decoder no longer expects a BOM and returns it as an ordinary character.
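The effect can be reproduced without a file by driving the incremental decoder directly. This is a minimal sketch, assuming CPython's UTF-16 incremental decoder, whose getstate() returns a (buffer, flag) pair; the flag values are an implementation detail and the variable names are only illustrative.

```python
import codecs

# Encoding with 'utf-16' writes a BOM followed by native-order code units.
data = "0123456789".encode("utf-16")

decoder = codecs.getincrementaldecoder("utf-16")()
fresh_flag = decoder.getstate()[1]        # flag before any input has been seen

first = decoder.decode(data, final=True)  # BOM is consumed, byte order is chosen
after_bom_flag = decoder.getstate()[1]    # flag once the byte order is known

# Restoring the "byte order already known" state and decoding from the start
# again makes the decoder treat the BOM as ordinary text.
decoder.setstate((b"", after_bom_flag))
second = decoder.decode(data, final=True)

# reset() returns the decoder to "byte order not yet determined".
decoder.reset()
third = decoder.decode(data, final=True)

print(fresh_flag, after_bom_flag)
print(ascii(first), ascii(second), ascii(third))
```

Here the second decode should include the leading '\ufeff', while the decode after reset() should not.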
A suggestion: handle seek(0) as a special value that calls decoder.reset().
The patch implements this idea. |
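The suggestion can be illustrated at the decoder level. This is only a sketch of the idea, not the patch itself: restore_decoder is a hypothetical helper, and its at_start and saved_state parameters are assumptions made for the example.

```python
import codecs


def restore_decoder(decoder, at_start, saved_state):
    """Hypothetical helper: choose the decoder state to use after a seek."""
    if at_start:
        # Position 0: forget everything so that the BOM is detected again.
        decoder.reset()
    else:
        # Any other position: restore the state that was saved earlier.
        decoder.setstate(saved_state)


data = "0123456789".encode("utf-16")
decoder = codecs.getincrementaldecoder("utf-16")()

first = decoder.decode(data, final=True)
saved_state = decoder.getstate()

# Rewind to the start: the special case resets instead of restoring the state.
restore_decoder(decoder, at_start=True, saved_state=saved_state)
second = decoder.decode(data, final=True)

print(ascii(first), ascii(second))
```

With the special case in place, both decodes should give the same text, with no BOM in either.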
|