Message76176
| Field | Value |
|---|---|
| Author | vstinner |
| Recipients | vstinner |
| Date | 2008-11-21 12:32:16 |
| Message-id | <1227270739.29.0.218620905737.issue4377@psf.upfronthosting.co.za> |

Content:
I'm trying to fix IDLE to support Unicode (#4008 and #4323). Instead
of IDLE's built-in charset detection, I tried to use
tokenize.detect_encoding(), but this function doesn't work with scripts
that use Mac newlines (b"\r").
Code to detect the encoding of a Python script:
----
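from tokenize import detect_encoding

def pythonEncoding(filename):
    # detect_encoding() calls readline() at most twice, looking for a BOM or coding cookie
    with open(filename, 'rb') as fp:
        encoding, lines = detect_encoding(fp.readline)
    return encoding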
----
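One possible workaround, which is not part of the original report and is only a sketch: detect_encoding() gets its input from readline(), and readline() on a binary stream splits only at b"\n". Converting b"\r\n" and lone b"\r" to b"\n" before calling detect_encoding() therefore puts the coding cookie on its own line. The helper names detect_encoding_any_newline and python_encoding_any_newline are hypothetical, and the sketch assumes reading the whole file into memory is acceptable.

```python
from io import BytesIO
from tokenize import detect_encoding


def detect_encoding_any_newline(data):
    """Hypothetical helper: detect the encoding of Python source bytes.

    Assumption: rewriting newlines does not touch the BOM or the coding
    cookie itself, so the detected encoding matches the original bytes.
    """
    # Normalize Windows (\r\n) and classic Mac (\r) newlines to \n so that
    # BytesIO.readline() splits the source into its real lines.
    normalized = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    encoding, _lines = detect_encoding(BytesIO(normalized).readline)
    return encoding


def python_encoding_any_newline(filename):
    """Hypothetical file-based variant of pythonEncoding()."""
    with open(filename, 'rb') as fp:
        return detect_encoding_any_newline(fp.read())


# The Mac-style script from the report.
mac_script = b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r'
print(detect_encoding_any_newline(mac_script))  # iso-8859-1
```

The normalized copy is used only for detection; the original bytes should still be decoded with the returned encoding.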
Example reproducing the problem with a Mac-style script:
----
from io import BytesIO
from tokenize import detect_encoding

fp = BytesIO(b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r')
encoding, lines = detect_encoding(fp.readline)
print(encoding, lines)
----
=> Result: utf-8 [b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r']
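The problem occurs at "line_string = line.decode('ascii')" in
detect_encoding(). A binary readline() only splits on b"\n", so the
whole Mac script is returned as a single "line". Since that "line"
contains a non-ASCII character (b"\xe8"), the conversion fails, the
coding cookie is never read, and detect_encoding() falls back to the
default utf-8.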
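To see the failing step in isolation, the sketch below mirrors the two operations described above using only io.BytesIO and bytes.decode(); it does not call tokenize internals. decodes_as_ascii() is a hypothetical helper standing in for the decode('ascii') call quoted above.

```python
from io import BytesIO


def decodes_as_ascii(line):
    """Hypothetical helper mirroring line.decode('ascii') in detect_encoding()."""
    try:
        line.decode('ascii')
    except UnicodeDecodeError:
        return False
    return True


# The Mac-style script from the report: lines end with b"\r" only.
data = b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r'

# Binary readline() only treats b"\n" as a line terminator,
# so the whole script comes back as a single "line".
first_line = BytesIO(data).readline()
print(first_line == data)  # True

# That "line" includes b"\xe8", so decoding it as ASCII fails.
print(decodes_as_ascii(first_line))  # False

# With Unix newlines, the cookie line alone is pure ASCII.
print(decodes_as_ascii(BytesIO(data.replace(b'\r', b'\n')).readline()))  # True
```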
History
| Date | User | Action | Args |
|---|---|---|---|
| 2008-11-21 12:32:21 | vstinner | set | recipients: + vstinner |
| 2008-11-21 12:32:19 | vstinner | set | messageid: <1227270739.29.0.218620905737.issue4377@psf.upfronthosting.co.za> |
| 2008-11-21 12:32:17 | vstinner | link | issue4377 messages |
| 2008-11-21 12:32:16 | vstinner | create | |