This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in Python's Developer Guide.
Created on 2008-03-16 13:37 by ocean-city, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Messages (9) | |||
|---|---|---|---|
| msg63576 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2008-03-16 13:37 | |
The following code
# coding: utf-8
print "年"
outputs
C:\Documents and Settings\WhiteRabbit>py3k b.py
File "b.py", line 3
print "年"
as expected, but the following code
# coding: cp932
print "年"
outputs
C:\Documents and Settings\WhiteRabbit>py3k a.py
File "a.py", line 4
[22605 refs]
This probably happens because PyUnicode_DecodeUTF8 at Python/pythonrun.c(1757) assumes err->text is UTF-8, which is not true when the source file is not encoded in UTF-8.
# Sorry, there is no patch.
|
|||
| msg63578 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2008-03-16 14:47 | |
The same problem probably exists in PyErr_ProgramText(). |
|||
| msg63581 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008-03-16 15:47 | |
Fixing this will involve quite some work. When fetching the code, the source encoding must be recognized. Contributions are welcome. (I personally consider this issue minor, as I would encourage users to use UTF-8 as the source encoding anyway.) |
|||
| msg63628 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2008-03-17 08:56 | |
Hello. I tracked down the source code and found where err->text is set.
Index: Parser/parsetok.c
===================================================================
--- Parser/parsetok.c (revision 61411)
+++ Parser/parsetok.c (working copy)
@@ -218,7 +218,7 @@
 assert(tok->cur - tok->buf < INT_MAX);
 err_ret->offset = (int)(tok->cur - tok->buf);
 len = tok->inp - tok->buf;
- text = PyTokenizer_RestoreEncoding(tok, len, &err_ret->offset);
+/* text = PyTokenizer_RestoreEncoding(tok, len, &err_ret->offset); */
 if (text == NULL) {
 text = (char *) PyObject_MALLOC(len + 1);
 if (text != NULL) {
It seems tok->buf is encoded in UTF-8, and PyTokenizer_RestoreEncoding() restores it to the original encoding of the source file. With the above patch, the output was as expected for cp932/euc_jp source files. Maybe this function is not needed in py3k? I cannot find any other place where it is used.
# PyErr_ProgramText() will probably need more effort to fix.
|
|||
| msg63633 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008-03-17 11:18 | |
You are probably right about the source of the problem; I was confusing
it with a regular exception, e.g.
print("年",a)
However, I also fail to reproduce the problem on OSX. I get
File "a.py", line 3
print "�N"
^
SyntaxError: invalid syntax
I'm not quite sure what the N is doing in there, but the first character
is the replacement character (hopefully, the tracker will reproduce it
correctly); I get that because pythonrun decodes with the "replace" error handler.
I guess you are not seeing it because then the replacement character
cannot actually be output to your terminal. Please try
print("\ufffd")
to see what that does.
|
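The stray N in the output above is consistent with cp932 bytes being decoded as UTF-8. A minimal Python 3 sketch, assuming the error text reaches the decoder still encoded as cp932 and is decoded with the "replace" error handler (the variable names are illustrative, not taken from CPython):

```python
# "年" (U+5E74) occupies two bytes in cp932: 0x94 followed by 0x4E.
raw = "年".encode("cp932")

# 0x94 cannot start a UTF-8 sequence, so "replace" substitutes U+FFFD;
# 0x4E is plain ASCII "N" and passes through unchanged.
shown = raw.decode("utf-8", errors="replace")

print(raw)           # the cp932 bytes
print(ascii(shown))  # escaped form, safe on terminals that cannot show U+FFFD
```

Printing through ascii() avoids depending on whether the terminal encoding can represent U+FFFD.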
|||
| msg63636 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2008-03-17 13:30 | |
> I was confusing it with a regular exception, e.g.
> print("年",a)
I'm now investigating this problem. This has a different cause.
Please look at fp_setreadl in Parser/tokenizer.c.
This function opens the file using the codec and doesn't seek to the current
position. (fp_setreadl is used when the codec is neither utf-8 nor
iso-8859-1 .... tok->decoding_state == STATE_NORMAL)
So
# coding: ascii
# 1
# 2
# 3
raise RuntimeError("a")
# 4
# 5
# 6
outputs
C:\Documents and Settings\WhiteRabbit>py3k ascii.py
Traceback (most recent call last):
File "ascii.py", line 6, in <module>
# 4
RuntimeError: a
[22821 refs]
# One line shifted.
And
# dummy
# coding: ascii
# 1
# 2
# 3
raise RuntimeError("a")
# 4
# 5
# 6
outputs
C:\Documents and Settings\WhiteRabbit>py3k ascii.py
Traceback (most recent call last):
File "ascii.py", line 8, in <module>
# 5
RuntimeError: a
[22821 refs]
# Two lines shifted.
|
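The two shifts reported above (one line, then two) match a reader that counts lines up to the coding cookie, then reopens the file from the start without resetting its counter. The following is only a toy model of that behaviour under this assumption, not CPython's tokenizer; `reported_lineno` and `COOKIE` are hypothetical names, and unlike the real cookie check it does not limit the search to the first two lines:

```python
import io
import re

# Simplified cookie pattern; an assumption for this sketch only.
COOKIE = re.compile(r"coding[:=]\s*([-\w.]+)")

def reported_lineno(source: str, target: str) -> int:
    """Return the line number the toy reader assigns to the line `target`."""
    lineno = 0
    # Phase 1: read raw lines until the coding cookie is seen.
    for line in io.StringIO(source):
        lineno += 1
        if COOKIE.search(line):
            break
    else:
        raise ValueError("no coding cookie found")
    # Phase 2: reopen from the beginning (no seek to the current position)
    # while keeping the old counter, so earlier lines are counted twice.
    for line in io.StringIO(source):
        lineno += 1
        if line.strip() == target:
            return lineno
    raise ValueError("target line not found")

body = '# 1\n# 2\n# 3\nraise RuntimeError("a")\n# 4\n# 5\n# 6\n'
first = reported_lineno("# coding: ascii\n" + body, 'raise RuntimeError("a")')
second = reported_lineno("# dummy\n# coding: ascii\n" + body, 'raise RuntimeError("a")')
print(first, second)
```

In this model the shift equals the number of lines consumed before the reopen, which is why moving the cookie from line 1 to line 2 changes the offset from one to two.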
|||
| msg63639 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2008-03-17 13:36 | |
>However, I also fail to reproduce the problem on OSX. I get
>
> File "a.py", line 3
> print "�N"
> ^
>SyntaxError: invalid syntax
Umm, strange... I get the correct output even when
using euc_jp (my terminal, the command prompt, cannot
output euc_jp strings directly, AFAIK)
> print("\ufffd")
>>> print("\ufffd")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "e:\python-dev\py3k\lib\io.py", line 1247, in write
b = encoder.encode(s)
UnicodeEncodeError: 'cp932' codec can't encode character '\ufffd' in position 0: illegal multibyte sequence
|
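The traceback above can be reproduced without a cp932 terminal by encoding explicitly. A small sketch, assuming Python 3 with the standard cp932 codec available; `encodable` and `fallback` are illustrative names:

```python
# U+FFFD has no mapping in cp932, so strict encoding fails,
# which is what io.py hit when writing to a cp932 console.
try:
    "\ufffd".encode("cp932")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# An explicit error handler turns the failure into an ASCII escape instead.
fallback = "\ufffd".encode("cp932", errors="backslashreplace")

print(encodable, fallback)
```

This suggests why the replacement character never appeared on the Windows console: it could not be written there at all.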
|||
| msg63641 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2008-03-17 13:42 | |
>I'm now investigating this problem. This has a different cause.
Of course, even if this line number problem is fixed, the encoding problem still remains. I'll probably look at it next. |
|||
| msg63767 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008-03-17 20:45 | |
The original issue is now fixed in r61462. Please open another issue for the case of regular exceptions. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:31 | admin | set | github: 46554 |
| 2008-03-17 20:45:08 | loewis | set | status: open -> closed resolution: fixed messages: + msg63767 |
| 2008-03-17 13:42:20 | ocean-city | set | messages: + msg63641 |
| 2008-03-17 13:36:26 | ocean-city | set | messages: + msg63639 |
| 2008-03-17 13:30:16 | ocean-city | set | messages: + msg63636 |
| 2008-03-17 11:18:17 | loewis | set | messages: + msg63633 |
| 2008-03-17 08:56:14 | ocean-city | set | messages: + msg63628 |
| 2008-03-16 15:47:28 | loewis | set | nosy: + loewis messages: + msg63581 |
| 2008-03-16 14:47:13 | ocean-city | set | messages: + msg63578 |
| 2008-03-16 13:38:22 | ocean-city | set | title: [Py3k] -> [Py3k] No text shown when SyntaxError (when not UTF8) |
| 2008-03-16 13:37:32 | ocean-city | create | |