
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: add an optional "default" argument to tokenize.detect_encoding
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, berker.peksag, eric.araujo, flox, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2010-09-04 11:32 by flox, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description
detect_encoding_default.diff flox, 2010-09-04 11:32 Patch, apply to 3.x
Messages (3)
msg115567 - Author: Florent Xicluna (flox) * (Python committer) Date: 2010-09-04 11:32
The function tokenize.detect_encoding() detects the encoding either in the coding cookie or in the BOM. If no encoding is found, it returns 'utf-8'.
When the result is 'utf-8', there's no (easy) way to know whether the encoding was really detected in the file or whether the function fell back to the default value.
Cases (with utf-8):
 - UTF-8 BOM found, returns ('utf-8-sig', [])
 - cookie on 1st line, returns ('utf-8', [line1])
 - cookie on 2nd line, returns ('utf-8', [line1, line2])
 - no cookie found, returns ('utf-8', [line1, line2])
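The cases above can be reproduced with a short sketch that runs the real tokenize.detect_encoding() on small in-memory sources, one per case. The sample contents are made up for illustration. Only the encoding is checked for the BOM case, since the list of lines returned there depends on what follows the BOM. The no-cookie sample starts with a comment line because recent CPython versions appear to stop reading after a first line that is actual code.

```python
import io
import tokenize

# Made-up sample sources, one per case in the list above.
SAMPLES = {
    "UTF-8 BOM": b"\xef\xbb\xbf# comment\nprint(1)\n",
    "cookie on 1st line": b"# -*- coding: utf-8 -*-\nprint(1)\n",
    "cookie on 2nd line": b"#!/usr/bin/env python\n# coding: utf-8\nprint(1)\n",
    # First line is a comment so that both lines are read.
    "no cookie": b"#!/usr/bin/env python\nprint(1)\n",
}


def detect(source):
    """Run tokenize.detect_encoding() on an in-memory bytes source."""
    return tokenize.detect_encoding(io.BytesIO(source).readline)


if __name__ == "__main__":
    for case, source in SAMPLES.items():
        encoding, lines = detect(source)
        print(f"{case:20} {encoding!r:13} {len(lines)} line(s) consumed")
```

The last three cases all report 'utf-8', so the encoding alone does not say whether a cookie was present.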
The proposal is to allow calling the function with a different default value (None or ''), so that the caller can tell whether the encoding was really detected.
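The attached patch is not reproduced on this page. As an illustration only, the proposed behaviour can be approximated with a wrapper around the existing function. The name detect_encoding_with_default and its default parameter are hypothetical (not part of the tokenize API), and the cookie regex is a PEP 263-style pattern written for this sketch rather than imported from tokenize.

```python
import io
import re
import tokenize

# PEP 263-style coding cookie pattern, written for this sketch (an assumption,
# not an import of tokenize's internal regex).
_COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)", re.ASCII)


def detect_encoding_with_default(readline, default="utf-8"):
    """Hypothetical wrapper around tokenize.detect_encoding().

    Returns `default` instead of 'utf-8' when neither a BOM nor a coding
    cookie was found; otherwise returns what detect_encoding() returned.
    """
    encoding, lines = tokenize.detect_encoding(readline)
    if encoding != "utf-8":
        # 'utf-8-sig' means a BOM was found; anything else came from a cookie.
        return encoding, lines
    if any(_COOKIE_RE.match(line) for line in lines):
        # An explicit cookie that normalizes to 'utf-8' was present.
        return encoding, lines
    # Nothing was detected: detect_encoding() fell back to 'utf-8'.
    return default, lines


if __name__ == "__main__":
    for source in (b"#!/usr/bin/env python\nprint(1)\n", b"# coding: utf-8\n"):
        print(detect_encoding_with_default(io.BytesIO(source).readline, default=None))
```

With default=None, a file with no BOM and no cookie now reports None, while a file with an explicit "coding: utf-8" cookie still reports 'utf-8'.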
For example, this function could be used by the Tools/scripts/findnocoding.py script.
Patch attached.
msg122106 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-11-22 10:52
> no cookie found, returns ('utf-8', [line1, line2])
I never understood the usage of the second item. IMO it should be None if no cookie found.
msg173002 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-15 20:55
> I never understood the usage of the second item. IMO it should be None if no cookie found.
UTF-8 is the default source encoding for Python 3.
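To illustrate why 'utf-8' is the natural fallback, a small sketch compiles byte strings with the built-in compile(). The helper name compiles is made up for this example. Source bytes without a cookie are decoded as UTF-8, so the UTF-8 encoding of the text is accepted, while the same text encoded as Latin-1 is only accepted once a cookie declares it.

```python
def compiles(source: bytes) -> bool:
    """Return True if compile() accepts the given source bytes."""
    try:
        compile(source, "<example>", "exec")
    except (SyntaxError, UnicodeDecodeError):
        return False
    return True


TEXT = "s = '\u00e9'\n"  # assigns the one-character string 'é'

utf8_source = TEXT.encode("utf-8")      # b"s = '\xc3\xa9'\n", no cookie
latin1_source = TEXT.encode("latin-1")  # b"s = '\xe9'\n", no cookie
latin1_with_cookie = b"# coding: latin-1\n" + latin1_source

if __name__ == "__main__":
    for name in ("utf8_source", "latin1_source", "latin1_with_cookie"):
        print(name, compiles(globals()[name]))
```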
History
Date User Action Args
2022-04-11 14:57:06 admin set github: 53980
2014-11-02 12:11:24 berker.peksag set nosy: + berker.peksag
    versions: + Python 3.5, - Python 3.4
2012-10-15 20:55:02 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg173002
2012-07-21 13:19:57 flox set versions: + Python 3.4, - Python 3.3
2010-12-31 01:43:04 eric.araujo set nosy: + eric.araujo
    versions: + Python 3.3, - Python 3.2
2010-12-30 22:14:16 georg.brandl unlink issue7962 dependencies
2010-11-22 10:52:50 vstinner set messages: + msg122106
2010-11-22 05:14:16 eric.araujo set nosy: + vstinner
2010-09-04 18:54:27 pitrou set nosy: + benjamin.peterson
2010-09-04 13:23:06 flox link issue7962 dependencies
2010-09-04 11:32:08 flox create