
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Tools/scripts/reindent.py fails on non-UTF-8 encodings
Type: behavior Stage: resolved
Components: Demos and Tools Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, christian.heimes, eric.araujo, flox, georg.brandl, iritkatriel, serhiy.storchaka, tim.peters, vstinner
Priority: normal Keywords: needs review, patch

Created on 2010-10-15 16:54 by belopolsky, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name           Uploaded
reindent.diff       belopolsky, 2010-10-15 16:54
reindent_coding.py  vstinner, 2011-07-07 23:25
Messages (13)
msg118804 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-15 16:54
Tools/scripts/reindent.py -d Lib/test/encoded_modules/module_koi8_r.py
Traceback (most recent call last):
  File "Tools/scripts/reindent.py", line 310, in <module>
    main()
  File "Tools/scripts/reindent.py", line 93, in main
    check(arg)
  File "Tools/scripts/reindent.py", line 114, in check
    r = Reindenter(f)
  File "Tools/scripts/reindent.py", line 162, in __init__
    self.raw = f.readlines()
  File "Lib/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf0 in position 59: invalid continuation byte
Attached patch fixes this issue.
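The traceback arises because the file is decoded as UTF-8 although the module declares a KOI8-R coding cookie. A minimal sketch of a cookie-aware read, assuming the standard library's `tokenize.open()` (which applies PEP 263 detection); this is not necessarily how the attached patch does it, and the temporary file and its contents are made up for illustration:

```python
import os
import tempfile
import tokenize

# Hypothetical KOI8-R module with a PEP 263 coding cookie, similar in
# spirit to Lib/test/encoded_modules/module_koi8_r.py.
source = '# -*- coding: koi8-r -*-\ntext = "Привет"\n'
raw = source.encode("koi8-r")

with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Decoding the bytes as UTF-8 fails on 0xF0 ("П" in KOI8-R),
    # just like the traceback above.
    try:
        raw.decode("utf-8")
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # tokenize.open() reads the coding cookie and decodes accordingly.
    with tokenize.open(path) as f:
        detected = f.encoding
        lines = f.readlines()
finally:
    os.remove(path)
```

Here `detected` is the encoding named in the cookie, and the Cyrillic string comes back intact.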
msg118810 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-10-15 17:45
+1.
msg118812 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-10-15 17:53
LGTM.
msg119026 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-18 14:48
Committed in r85695. Leaving open to discuss whether anything can/should be done for the case when reindent acts as an stdin to stdout filter. Also, what is the policy on backporting Tools' bug fixes?
msg119276 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-10-21 11:44
When working as a filter, reindent should use sys.{stdin,stdout}.encoding (defaulting to sys.getdefaultencoding()) for reading and writing, respectively. Detecting encoding on streams is not worth it IMO. People can set PYTHONIOENCODING for baroque needs.
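A sketch of the filter approach proposed above, run in a child interpreter so that `PYTHONIOENCODING` can be set for it. The one-line filter is a hypothetical stand-in for reindent.py in pipe mode, and KOI8-R is just an example encoding:

```python
import os
import subprocess
import sys

# Hypothetical stand-in for reindent.py acting as a filter: it reads text
# from sys.stdin and writes (upper-cased) text to sys.stdout.
FILTER = "import sys; sys.stdout.write(sys.stdin.read().upper())"

def run_filter(data: bytes, io_encoding: str) -> bytes:
    """Run FILTER in a child Python with PYTHONIOENCODING set, so the
    child's text stdin/stdout use io_encoding instead of the locale's."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", FILTER],
        input=data,
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout

# Feed KOI8-R bytes through the filter and get KOI8-R bytes back.
output = run_filter("привет".encode("koi8-r"), "koi8-r")
```

The filter itself never mentions an encoding; the choice is left entirely to the environment, which is the point of the proposal.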
msg139967 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-07 10:50
> Leaving open to discuss whether anything can/should be done
> for the case when reindent acts as an stdin
sys.stdin.buffer and sys.stdout.buffer should be used together with tokenize.detect_encoding(). Since a pipe cannot be rewound, we could first read all of stdin into a BytesIO object, so that we can seek back to the start after detect_encoding(). Something like:
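import io
import sys
from tokenize import detect_encoding

content = sys.stdin.buffer.read()
buffer = io.BytesIO(content)  # in-memory copy; unlike a pipe, it can seek
encoding, _ = detect_encoding(buffer.readline)
buffer.seek(0)  # rewind past the lines detect_encoding() consumed
text = io.TextIOWrapper(buffer, encoding)
# use text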
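Wrapped into reusable helpers and extended with the write side, the approach above might look like the following sketch. The function names `read_source` and `write_source` are hypothetical, and this is not the code of reindent_coding.py:

```python
import io
from tokenize import detect_encoding

def read_source(stream):
    """Return (text, encoding) for a binary source stream such as
    sys.stdin.buffer, honouring any PEP 263 coding cookie or BOM."""
    # Pipes are not seekable, so buffer everything in memory first.
    buffer = io.BytesIO(stream.read())
    encoding, _ = detect_encoding(buffer.readline)
    buffer.seek(0)  # rewind past the lines detect_encoding() consumed
    with io.TextIOWrapper(buffer, encoding=encoding) as text:
        return text.read(), encoding

def write_source(stream, text, encoding):
    """Encode text with the detected encoding and write it to a binary
    stream such as sys.stdout.buffer."""
    stream.write(text.encode(encoding))
    stream.flush()
```

In a filter, the calls would be `text, enc = read_source(sys.stdin.buffer)`, then the re-indentation, then `write_source(sys.stdout.buffer, text, enc)`. Writing to the binary buffer sidesteps whatever encoding sys.stdout happens to be configured with.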
msg140001 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-07 23:25
reindent_coding.py: patch fixing reindent.py when using pipes (stdin and stdout).
msg140003 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-07 23:43
This is a lot more code than I would have expected.
What is your opinion on my previous message?
msg140005 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-07 23:47
> When working as a filter, reindent should use sys.{stdin,stdout}.encoding
> (defaulting to sys.getdefaultencoding()) for reading and writing,
> respectively.
It just doesn't work: you cannot read an ISO-8859-1 file through a UTF-8 stdin (if your locale encoding is UTF-8).
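The objection can be checked directly. A small sketch of what happens when ISO-8859-1 bytes reach a UTF-8 decoder, as they would if a Latin-1 file were piped into a UTF-8 sys.stdin:

```python
# ISO-8859-1 encodes "é" as the single byte 0xE9, which is not a complete
# UTF-8 sequence on its own, so a UTF-8 decoder rejects it.
data = "café".encode("iso-8859-1")

try:
    data.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # error.start is the offset of the offending byte

# Decoding with the file's real encoding works.
decoded = data.decode("iso-8859-1")
```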
msg140021 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-08 11:19
Even with PYTHONIOENCODING?
msg315607 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-04-22 11:15
I concur with Éric. Standard input and output are text streams in Python 3. The user can control their encoding by setting the locale or PYTHONIOENCODING.
I think this issue can be closed now unless somebody wants to backport the fix to 2.7.
msg377111 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-09-18 12:04
Since there won't be a Python 2.7 backport, should this issue be closed?
msg377114 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-18 12:58
> Committed in r85695. Leaving open to discuss whether anything can/should be done for the case when reindent acts as an stdin to stdout filter. Also, what is the policy on backporting Tools' bug fixes?
This is the commit:
commit 4a98e3b6d06e5477e5d62f18e85056cbb7253f98
Author: Alexander Belopolsky <alexander.belopolsky@gmail.com>
Date: Mon Oct 18 14:43:38 2010 +0000
 Issue #10117: Tools/scripts/reindent.py now accepts source files that
 use encoding other than ASCII or UTF-8. Source encoding is preserved
 when reindented code is written to a file.
> Since there won't be a python 2.7 backport, should this issue be closed?
Right, the 2.7 branch is closed. I'm closing the issue.
History
Date                 User               Action  Args
2022-04-11 14:57:07  admin              set     github: 54326
2020-09-18 12:58:23  vstinner           set     status: open -> closed
                                                resolution: fixed
                                                messages: + msg377114
                                                stage: resolved
2020-09-18 12:04:38  iritkatriel        set     status: pending -> open
                                                nosy: + iritkatriel
                                                messages: + msg377111
2018-04-22 11:15:28  serhiy.storchaka   set     status: open -> pending
                                                messages: + msg315607
2012-10-13 23:02:01  serhiy.storchaka   set     nosy: + serhiy.storchaka
2011-07-08 11:19:44  eric.araujo        set     messages: + msg140021
2011-07-07 23:47:27  vstinner           set     messages: + msg140005
2011-07-07 23:43:16  eric.araujo        set     messages: + msg140003
2011-07-07 23:25:02  vstinner           set     files: + reindent_coding.py
                                                messages: + msg140001
                                                versions: + Python 3.3
2011-07-07 10:50:01  vstinner           set     nosy: + vstinner
                                                messages: + msg139967
2010-10-21 11:44:18  eric.araujo        set     messages: + msg119276
2010-10-18 14:48:10  belopolsky         set     messages: + msg119026
2010-10-15 17:53:44  georg.brandl       set     nosy: + georg.brandl
                                                messages: + msg118812
2010-10-15 17:45:26  eric.araujo        set     nosy: + eric.araujo
                                                messages: + msg118810
2010-10-15 16:56:41  belopolsky         set     nosy: + tim.peters, christian.heimes, flox
2010-10-15 16:54:39  belopolsky         create
