Created on 2008-08-19 01:19 by brett.cannon, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| fix_findencoding.diff | brett.cannon, 2008-08-19 05:25 | |
| Messages (8) | |||
|---|---|---|---|
| msg71397 | Author: Brett Cannon (brett.cannon) * (Python committer) | Date: 2008-08-19 01:19 | |
It turns out that PyTokenizer_FindEncoding() never properly succeeds because the tok_state it uses does not have tok->filename set, which is an error condition in the tokenizer. The error has been masked because the one place the function is used, imp.find_module(), never checks a NULL return for an error and instead just assumes the default source encoding suffices. |
|||
| msg71398 | Author: Brett Cannon (brett.cannon) * (Python committer) | Date: 2008-08-19 01:20 | |
I have not bothered to check if this exists in 2.6, but I don't see why it would be any different. |
|||
| msg71399 | Author: Brett Cannon (brett.cannon) * (Python committer) | Date: 2008-08-19 01:44 | |
It turns out that the NULL return value can signal an error that manifests
itself as SyntaxError("encoding problem: with BOM"). The cause is that
tok->filename is not set in Parser/tokenizer.c:fp_setreadl(), which is
called by check_coding_spec(); the latter assumes that since tok->encoding
was never set (because fp_setreadl() returned an error value), the failure
had something to do with the BOM.
The only reason this was found is because my bootstrapping of importlib
into Py3K, at some point, triggers a PyErr_Occurred() which finally
notices the error.
|
|||
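For readers who want to see the kind of "encoding problem" error discussed above from the Python side, here is a minimal sketch using the standard library's `tokenize.detect_encoding()`. This is only a Python-level analogue and not the C function `PyTokenizer_FindEncoding()` that this issue is about; the helper name `sniff` and the sample byte strings are made up for illustration.

```python
import io
import tokenize


def sniff(source_bytes):
    """Return the encoding reported by tokenize.detect_encoding(),
    or the SyntaxError message if detection fails.

    `sniff` is a hypothetical helper for this illustration only.
    """
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
        return encoding
    except SyntaxError as exc:
        return "SyntaxError: " + str(exc)


# No BOM and no coding cookie: the default source encoding is assumed.
plain = sniff(b"x = 1\n")

# A coding cookie on the first line selects the encoding.
cookie = sniff(b"# -*- coding: latin-1 -*-\nx = 1\n")

# A UTF-8 BOM on its own is accepted.
bom = sniff(b"\xef\xbb\xbfx = 1\n")

# A UTF-8 BOM combined with a non-UTF-8 cookie is a genuine BOM conflict.
conflict = sniff(b"\xef\xbb\xbf# -*- coding: latin-1 -*-\nx = 1\n")

print(plain, cookie, bom, conflict, sep="\n")
```

The last case is a legitimate BOM conflict. The bug described in this issue is different in that the C tokenizer reported a BOM-flavoured error even though the real cause was the missing tok->filename.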
| msg71407 | Author: Brett Cannon (brett.cannon) * (Python committer) | Date: 2008-08-19 05:25 | |
Attached is a patch that fixes where the error occurs. By opening the file by either file name or file descriptor, the problem goes away. Once this patch is accepted, a PyErr_Occurred() check should be added to all uses of PyTokenizer_FindEncoding(). |
|||
| msg72392 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2008-09-03 16:41 | |
I don't understand the whole decoding machinery in the tokenizer, but the patch looks ok to me. (tested in debug mode under Linux and Windows) |
|||
| msg72420 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-09-03 21:26 | |
The patch also looks pretty harmless to me. :) |
|||
| msg72477 | Author: Hyeshik Chang (hyeshik.chang) * (Python committer) | Date: 2008-09-04 03:35 | |
pitrou, that's because Python source code can't be correctly tokenized
when it's encoded in a few odd encodings like iso-2022 or shift-jis, which
use \, (, ) and " as the second byte of a two-byte character sequence.
For example, '\x81\\' is HORIZONTAL BAR in shift-jis, so
exec('print "\x81\\"')
fails, because the closing " is treated as escaped by the backslash that is the second byte of '\x81\\'.
|
|||
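The shift-jis point above can be reproduced with a small sketch. It assumes Python 3 and its built-in `shift_jis` codec; the byte-wise scanner `naive_string_end` is a hypothetical stand-in for a tokenizer that looks at raw bytes, not the actual CPython tokenizer.

```python
import unicodedata

# The two bytes 0x81 0x5C form one shift-jis character; 0x5C is also ASCII backslash.
raw = b"\x81\\"
ch = raw.decode("shift_jis")
print(len(ch), unicodedata.name(ch))

# Source bytes for: print("<that character>")
source = b'print("' + raw + b'")'


def naive_string_end(data, start):
    """Scan bytes from `start` for an unescaped double quote.

    Hypothetical byte-oriented scanner: a backslash byte (0x5C) escapes
    the byte that follows it. Returns the index of the closing quote,
    or -1 if none is found.
    """
    i = start
    while i < len(data):
        if data[i] == 0x5C:   # backslash: skip the escaped byte
            i += 2
            continue
        if data[i] == 0x22:   # double quote: end of string
            return i
        i += 1
    return -1


# Scanning raw bytes: the trail byte 0x5C "escapes" the closing quote.
bytewise = naive_string_end(source, 7)

# Scanning decoded text: the closing quote is found right after the character.
text = source.decode("shift_jis")
decoded = text.find('"', 7)

print(bytewise, decoded)
```

Decoding before scanning avoids the problem, which is why the tokenizer needs to know the source encoding in the first place.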
| msg72480 | Author: Brett Cannon (brett.cannon) * (Python committer) | Date: 2008-09-04 05:04 | |
Committed in r66209. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:37 | admin | set | github: 47844 |
| 2008-09-04 05:04:57 | brett.cannon | set | status: open -> closed; resolution: accepted; messages: + msg72480 |
| 2008-09-04 03:35:43 | hyeshik.chang | set | nosy: + hyeshik.chang; messages: + msg72477 |
| 2008-09-03 21:26:19 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg72420 |
| 2008-09-03 16:41:40 | pitrou | set | nosy: + pitrou; messages: + msg72392 |
| 2008-08-21 20:33:50 | brett.cannon | set | keywords: + needs review |
| 2008-08-21 18:35:14 | brett.cannon | set | priority: critical -> release blocker |
| 2008-08-19 05:25:15 | brett.cannon | set | files: + fix_findencoding.diff; keywords: + patch; messages: + msg71407 |
| 2008-08-19 02:37:05 | brett.cannon | link | issue3574 dependencies |
| 2008-08-19 01:44:30 | brett.cannon | set | messages: + msg71399 |
| 2008-08-19 01:20:05 | brett.cannon | set | type: behavior; messages: + msg71398 |
| 2008-08-19 01:19:38 | brett.cannon | create | |