Created on 2010-06-30 15:22 by jvanpraag, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| junk.txt | jvanpraag, 2010-06-30 15:22 | Text file with 'bad' characters in 3rd line. |
| Messages (4) | |||
|---|---|---|---|
| msg108987 | Author: John Van Praag (jvanpraag) | Date: 2010-06-30 15:22 | |

The declaration errors='replace' works from within IDLE but not at the Windows command line. I am attaching a program and text file that demonstrate the problem.

The error shows up at the Windows command line as follows:

    C:\Users\John\Documents\Python\bug_reports\001>python -m read_my_file
    aaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaa
    bbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb
    Traceback (most recent call last):
      File "C:\Python31\lib\runpy.py", line 128, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "C:\Python31\lib\runpy.py", line 34, in _run_code
        exec(code, run_globals)
      File "C:\Users\John\Documents\Python\bug_reports\001\read_my_file.py", line 20, in <module>
        readf()
      File "C:\Users\John\Documents\Python\bug_reports\001\read_my_file.py", line 17, in readf
        print(line)
      File "C:\Python31\lib\encodings\cp437.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 10-11: character maps to <undefined>

NOTE: It appears I can only attach 1 file to this report. So I am copying the program here. The text file to read is attached.

    '''
    read_my_file.py: Reads lines from faulty file. Hangs at line 3 when run from Windows command line.

    Platforms:
    Windows Vista Ultimate 64-bit
    Python 3.1.2
    '''

    #The file to read.
    my_file = 'junk.txt'

    def readf():
        #The declaration "errors='replace'" is supposed to replace characters the reader does not recognize with a dummy character such as a question mark.
        #This fix works in the interpreter, but not from the Windows command line.
        fh_read = open(my_file, errors='replace')
        for line in fh_read:
            print(line)

    #Run.
    readf()
|||
| msg109024 | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2010-06-30 22:29 | |
The problem is not in the reading part, but in the print(). Since the default encoding of your terminal is cp437 and cp437 is not able to encode the "bad character" (U+2019 RIGHT SINGLE QUOTATION MARK), an error is raised. |
|||
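The point about print() can be checked without a file or a terminal. The following is a minimal sketch, assuming only that the console encoding is cp437 as shown in the traceback; the sample string is hypothetical and is not taken from junk.txt.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK is a perfectly valid str character,
# but the cp437 code page has no byte that represents it.
text = "it\u2019s"  # hypothetical sample, not the contents of junk.txt

try:
    text.encode("cp437")  # strict encoding, which is what print() ends up doing
    strict_result = "encoded"
except UnicodeEncodeError as exc:
    strict_result = "UnicodeEncodeError: " + exc.reason

# An error handler on the *encoding* side substitutes '?' instead of raising.
replaced = text.encode("cp437", errors="replace")

print(strict_result)
print(replaced)
```

The failure happens in the encode step, so an error handler passed to open() for reading never gets a chance to influence it.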
| msg109048 | Author: John Van Praag (jvanpraag) | Date: 2010-07-01 13:54 | |

According to the documentation of the open function:

errors is an optional string that specifies how encoding and decoding errors are to be handled–this cannot be used in binary mode. Pass 'strict' to raise a ValueError exception if there is an encoding error (the default of None has the same effect), or pass 'ignore' to ignore errors. (Note that ignoring encoding errors can lead to data loss.) 'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data.

If a replacement marker such as '?' were replacing the bad characters, the print function would not have a problem. The open function is not working as described in the documentation.

On 2010-06-30 22:29 +0000, "Ezio Melotti" <report@bugs.python.org> wrote:
>
> Ezio Melotti <ezio.melotti@gmail.com> added the comment:
>
> The problem is not in the reading part, but in the print().
> Since the default encoding of your terminal is cp437 and cp437 is not
> able to encode the "bad character" (U+2019 RIGHT SINGLE QUOTATION MARK),
> an error is raised.
>
> ----------
> nosy: +ezio.melotti
> resolution: -> invalid
> stage: -> committed/rejected
> status: open -> closed
> type: -> behavior
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue9126>
> _______________________________________
|||
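A short sketch may help show what errors='replace' on open() actually covers. It only applies while bytes are being decoded, and on decoding the replacement marker is U+FFFD rather than '?'. The sample bytes and the choice of cp1252 are assumptions made for illustration; the thread does not state how junk.txt is encoded.

```python
import os
import tempfile

# Hypothetical sample data: byte 0x92 is U+2019 in cp1252.
raw = b"it\x92s\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # 0x92 is valid cp1252, so nothing is malformed and nothing is replaced.
    with open(path, encoding="cp1252", errors="replace") as fh:
        decoded_ok = fh.read()

    # 0x92 is not valid ASCII, so the decoder substitutes U+FFFD.
    with open(path, encoding="ascii", errors="replace") as fh:
        decoded_replaced = fh.read()
finally:
    os.remove(path)

print(ascii(decoded_ok))
print(ascii(decoded_replaced))
```

In the first case the file decodes cleanly, so the handler is never triggered. In the second case the marker that is inserted is itself a character that cp437 cannot encode, so printing it to a cp437 console would still fail.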
| msg109050 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010-07-01 14:04 | |
The characters are fine when you read them (that is, they decode correctly to Unicode). They are only invalid when you write them to the Windows terminal, which can't handle all the valid characters that are in the file. The IDLE output window uses a more capable character set, and can display those characters. |
|||
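Since the failure is on the output side, one possible workaround is to put the error handler on the output stream instead of on open(). The sketch below simulates a cp437 terminal with an in-memory buffer so that it behaves the same on any platform; the names fake_terminal and out are hypothetical, and this is an illustration rather than a fix proposed in the thread.

```python
import io

# Stand-in for a cp437 console: bytes written here are what the
# terminal would receive.
fake_terminal = io.BytesIO()

# A text stream that encodes to cp437 and replaces unencodable
# characters with '?' instead of raising UnicodeEncodeError.
out = io.TextIOWrapper(
    fake_terminal, encoding="cp437", errors="replace", newline="\n"
)

print("it\u2019s", file=out)  # U+2019 has no cp437 equivalent
out.flush()

written = fake_terminal.getvalue()
print(written)
```

On Python 3.7 and later, `sys.stdout.reconfigure(errors="replace")` applies the same handler to the real standard output. That method does not exist in the Python 3.1.2 used in the report, where setting the `PYTHONIOENCODING` environment variable (for example to `cp437:replace`) before starting the interpreter is the closer equivalent.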
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:03 | admin | set | github: 53372 |
| 2010-07-01 14:04:08 | r.david.murray | set | nosy: + r.david.murray; messages: + msg109050 |
| 2010-07-01 13:54:21 | jvanpraag | set | messages: + msg109048 |
| 2010-06-30 22:29:12 | ezio.melotti | set | status: open -> closed; type: behavior; nosy: + ezio.melotti; messages: + msg109024; resolution: not a bug; stage: resolved |
| 2010-06-30 16:51:42 | benjamin.peterson | link | issue9029 superseder |
| 2010-06-30 15:22:45 | jvanpraag | create | |