This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012年12月20日 23:37 by gangesmaster, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue16741.patch | mrabarnett, 2012年12月30日 02:52 | review | ||
| issue16741#2.patch | mrabarnett, 2012年12月30日 17:12 | |||
| int_from_str.patch | serhiy.storchaka, 2013年05月05日 19:40 | review | ||
| int_from_str-3.3_2.patch | serhiy.storchaka, 2013年06月09日 17:23 | Patch for 3.3 | review | |
| int_from_str-2.7.patch | serhiy.storchaka, 2013年07月14日 12:37 | Patch for 2.7 | review | |
| Messages (18) | |||
|---|---|---|---|
| msg177863 - (view) | Author: ganges master (gangesmaster) | Date: 2012年12月20日 23:37 | |
I'm not sure if it's a bug or just an inconvenience, but when a string containing \x00 is passed to int/float/etc, they return a misleading exception:
>>> int("abc")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'abc'
>>> int("\x00abc")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
>>> float("\x00abc")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float:
I noticed the code does actually try to handle it:
http://hg.python.org/cpython/file/39803c20c9bf/Objects/intobject.c#l1066
but still, the reported error is very misleading.
|
|||
| msg177915 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2012年12月22日 01:30 | |
Python takes a long way round when converting strings to int. It does the following (I'll be talking about Python 3.3 here):
1. In function 'fix_decimal_and_space_to_ascii', the different kinds of spaces are converted to " " and the different kinds of digits are converted to their equivalents in the ASCII range;
2. The resulting string is converted to UTF-8;
3. The resulting string is passed to 'PyLong_FromString', which expects a null-terminated string.
4. If 'PyLong_FromString' is unable to parse the string as an int, it builds an error message using the string that was passed into it, which it does by converting that string _back_ into Unicode.
As a result of step 4, the string that's reported as the value in the error message is _not_ necessarily correct.
For example:
>>> int("\N{ARABIC-INDIC DIGIT ONE}")
1
>>> int("#\N{ARABIC-INDIC DIGIT ONE}")
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
int("#\N{ARABIC-INDIC DIGIT ONE}")
ValueError: invalid literal for int() with base 10: '#1'
And it also means a "\x00" and anything after it will be omitted:
>>> int("foo\x00bar")
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
int("foo\x00bar")
ValueError: invalid literal for int() with base 10: 'foo'
And in a final point, 'PyLong_FromString' limits the length of the value it reports in the error message, and the code that does it includes this line:
slen = strlen(orig_str) < 200 ? strlen(orig_str) : 200;
|
|||
| msg178003 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2012年12月23日 18:12 | |
It occurred to me that the truncation of the string when building the error message could cause a UnicodeDecodeError:
>>> int("1".ljust(199) + "\u0100")
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
int("1".ljust(199) + "\u0100")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 199: unexpected end of data
This is because it's truncating a UTF-8 string, and the truncation is in the middle of a multi-byte sequence.
|
|||
| msg178548 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2012年12月30日 02:52 | |
I've attached a patch.
It now reports an invalid literal as-is:
>>> int("#\N{ARABIC-INDIC DIGIT ONE}")
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
int("#\N{ARABIC-INDIC DIGIT ONE}")
ValueError: invalid literal for int() with base 10: '#١'
>>> int("foo\x00bar")
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
int("foo\x00bar")
ValueError: invalid literal for int() with base 10: 'foo\x00bar'
There's a slight difference in that it truncates to 200 codepoints, not 200 UTF-8 bytes.
|
|||
| msg178587 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2012年12月30日 17:12 | |
I've attached a small additional patch for truncating the UTF-8. I don't know whether it's strictly necessary, but I don't know that it's unnecessary either! (Better safe than sorry.) |
|||
| msg188458 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年05月05日 19:40 | |
Here is a patch based on Matthew's patch. It is smaller (+35 lines vs +59) but fixes error messages for more cases:
int(b'1230円') -- bytes string with null without base.
int(b'123\xbd') -- non-utf-8 bytes string.
int('123\ud800') -- lone surrogate in unicode string.
Unfortunately it is not easy to backport it to 2.7. PyErr_Format() in 2.7 works only with null-terminated strings. I propose to fix this issue on 3.3+ and declare it as "won't fix" for 2.7.
|
|||
| msg188463 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2013年05月05日 20:17 | |
int_from_str.patch:
+ strobj = PySequence_GetSlice(u, 0, 200);
+ if (strobj != NULL) {
+ PyErr_Format(PyExc_ValueError,
+ "invalid literal for int() with base %d: %R",
+ base, strobj);
+ Py_DECREF(strobj);
+ }
Oh, it remembers me that #7330 is still open.
|
|||
| msg188777 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年05月09日 13:31 | |
Thanks, for 3.4 I will use new formatting feature. |
|||
| msg190212 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年05月28日 13:50 | |
Are there any other comments? |
|||
| msg190866 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年06月09日 17:23 | |
Patch updated. It now reuses code for bytes->int in longobject.c and abstract.c, doesn't raise UnicodeDecodeError for non-utf-8 bytes, and always reports an invalid bytes literal as a bytes object. |
|||
| msg193047 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年07月14日 12:37 | |
Here is a patch for 2.7. |
|||
| msg193556 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年07月22日 18:13 | |
If there are no objections I'm going to commit patches soon. |
|||
| msg193558 - (view) | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2013年07月22日 18:45 | |
I don't like the idea to change the behavior of 2.7 so late in its release cycle. Benjamin, what's your opinion? |
|||
| msg193565 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2013年07月22日 20:42 | |
Yeah, let's just fix Python 3. |
|||
| msg194282 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年08月03日 18:20 | |
New changeset ecc8512b427d by Serhiy Storchaka in branch '3.3': Issue #16741: Fix an error reporting in int(). http://hg.python.org/cpython/rev/ecc8512b427d New changeset 4fd48a807812 by Serhiy Storchaka in branch 'default': Issue #16741: Fix an error reporting in int(). http://hg.python.org/cpython/rev/4fd48a807812 |
|||
| msg194302 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年08月03日 21:16 | |
There is a test in test_unicode which expects an UnicodeError for int('\ud800'). Now it fails. Should we fix a test or int()?
|
|||
| msg194304 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2013年08月03日 21:45 | |
I'd say fix the test. Raising ValueError is correct in this case. UnicodeError was an implementation artifact. |
|||
| msg194310 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年08月03日 22:06 | |
New changeset 7b023134ad83 by Serhiy Storchaka in branch '3.3': Issue #16741: Remove testing of implementation artifact. http://hg.python.org/cpython/rev/7b023134ad83 New changeset 1b4772ab420f by Serhiy Storchaka in branch 'default': Issue #16741: Remove testing of implementation artifact. http://hg.python.org/cpython/rev/1b4772ab420f |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:39 | admin | set | github: 60945 |
| 2014年05月23日 20:39:13 | terry.reedy | link | issue21546 superseder |
| 2013年08月06日 14:48:58 | serhiy.storchaka | set | status: open -> closed resolution: fixed |
| 2013年08月03日 22:06:07 | python-dev | set | messages: + msg194310 |
| 2013年08月03日 21:45:20 | belopolsky | set | nosy:
+ belopolsky messages: + msg194304 |
| 2013年08月03日 21:16:13 | serhiy.storchaka | set | status: closed -> open resolution: fixed -> (no value) messages: + msg194302 |
| 2013年08月03日 18:33:00 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2013年08月03日 18:20:47 | python-dev | set | nosy:
+ python-dev messages: + msg194282 |
| 2013年07月22日 20:42:37 | benjamin.peterson | set | messages: + msg193565 |
| 2013年07月22日 18:45:13 | christian.heimes | set | nosy:
+ christian.heimes, benjamin.peterson messages: + msg193558 |
| 2013年07月22日 18:13:18 | serhiy.storchaka | set | messages: + msg193556 |
| 2013年07月14日 12:37:58 | serhiy.storchaka | set | files:
+ int_from_str-2.7.patch messages: + msg193047 |
| 2013年06月09日 17:23:10 | serhiy.storchaka | set | files:
+ int_from_str-3.3_2.patch messages: + msg190866 |
| 2013年05月28日 13:50:15 | serhiy.storchaka | set | messages: + msg190212 |
| 2013年05月09日 13:31:11 | serhiy.storchaka | set | messages: + msg188777 |
| 2013年05月05日 20:17:02 | vstinner | set | messages: + msg188463 |
| 2013年05月05日 19:41:35 | serhiy.storchaka | set | assignee: serhiy.storchaka |
| 2013年05月05日 19:40:45 | serhiy.storchaka | set | files:
+ int_from_str.patch versions: - Python 3.2 nosy: + chris.jerdonek messages: + msg188458 |
| 2013年04月19日 21:36:16 | isoschiz | set | nosy:
+ isoschiz |
| 2012年12月30日 17:18:22 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka versions: + Python 3.4 |
| 2012年12月30日 17:12:39 | mrabarnett | set | files:
+ issue16741#2.patch messages: + msg178587 |
| 2012年12月30日 05:32:02 | ezio.melotti | set | nosy:
+ vstinner, ezio.melotti stage: patch review |
| 2012年12月30日 02:52:19 | mrabarnett | set | files:
+ issue16741.patch keywords: + patch messages: + msg178548 |
| 2012年12月23日 18:12:33 | mrabarnett | set | messages: + msg178003 |
| 2012年12月22日 01:30:16 | mrabarnett | set | nosy:
+ mrabarnett messages: + msg177915 |
| 2012年12月22日 01:17:17 | terry.reedy | set | versions: - Python 2.6, Python 3.1 |
| 2012年12月21日 02:03:46 | benjamin.peterson | set | nosy:
+ mark.dickinson |
| 2012年12月20日 23:37:21 | gangesmaster | create | |