Message 76176 - Python tracker



In-reply-to
Author	vstinner
Recipients	vstinner
Date	2008-11-21 12:32:16
SpamBayes Score	4.6242454e-12
Marked as misclassified	No
Message-id	<1227270739.29.0.218620905737.issue4377@psf.upfronthosting.co.za>

Content

I'm trying to fix IDLE to support Unicode (#4008 and #4323). Instead
of IDLE's builtin charset detection, I tried to use
tokenize.detect_encoding(), but this function doesn't work with scripts
that use Mac newlines (b"\r").
Code to detect the encoding of a Python script:
----
from tokenize import detect_encoding

def pythonEncoding(filename):
    with open(filename, 'rb') as fp:
        encoding, lines = detect_encoding(fp.readline)
    return encoding
----
Example to reproduce the problem with a Mac script:
----
from io import BytesIO
from tokenize import detect_encoding

fp = BytesIO(b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r')
encoding, lines = detect_encoding(fp.readline)
print(encoding, lines)
----
=> Result: utf-8 [b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r']
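The problem occurs at "line_string = line.decode('ascii')". Since
readline() only splits on b"\n", the whole script is returned as a
single "line". That line contains a non-ASCII character (b"\xe8"), so
the conversion fails, the ISO-8859-1 cookie is never seen, and
detect_encoding() falls back to utf-8.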
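One possible workaround, sketched here under the assumption that the whole script can be read into memory: split the bytes with bytes.splitlines(keepends=True), which treats b"\r", b"\n" and b"\r\n" as line terminators, and hand detect_encoding() a readline-like callable over those lines. The helper name python_encoding_universal is hypothetical and not part of the tokenize module.

```python
from tokenize import detect_encoding


def python_encoding_universal(data: bytes) -> str:
    """Hypothetical helper: detect the declared source encoding of `data`,
    accepting CR, LF and CRLF as line terminators."""
    # BytesIO.readline() (like readline() on a binary file) only splits on
    # LF, so a script using Mac CR newlines comes back as one single "line".
    # bytes.splitlines(keepends=True) also splits on CR and CRLF.
    lines = iter(data.splitlines(keepends=True))
    # Return b"" once the lines are exhausted, as readline() does at EOF.
    encoding, _ = detect_encoding(lambda: next(lines, b""))
    return encoding


if __name__ == "__main__":
    from io import BytesIO

    source = b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r'
    # readline() returns the whole script, since it contains no LF byte:
    print(BytesIO(source).readline() == source)  # True
    print(python_encoding_universal(source))      # iso-8859-1
```

For a file on disk, pythonEncoding() could open it in 'rb' mode and pass fp.read() to this helper; the trade-off is reading the entire file instead of only the first two lines.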

History
Date	User	Action	Args
2008-11-21 12:32:21	vstinner	set	recipients: + vstinner
2008-11-21 12:32:19	vstinner	set	messageid: <1227270739.29.0.218620905737.issue4377@psf.upfronthosting.co.za>
2008-11-21 12:32:17	vstinner	link	issue4377 messages
2008-11-21 12:32:16	vstinner	create
