homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: bisect module: Overflow at index computation
Type: behavior Stage: patch review
Components: Extension Modules Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Voo, akira, amaury.forgeotdarc, benjamin.peterson, jcea, mark.dickinson, pitrou, python-dev, rhettinger, tim.peters
Priority: normal Keywords: patch

Created on 2011年11月28日 22:34 by Voo, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue13496.patch mark.dickinson, 2012年01月28日 10:03 review
issue13496_v2.patch mark.dickinson, 2012年03月10日 15:29 review
Messages (18)
msg148517 - (view) Author: Daniel Sturm (Voo) Date: 2011年11月28日 22:34
The mid index computation in _bisectmodule.c in both internal_bisect_right and internal_bisect_left is done with:
mid = (lo + hi) / 2; // all three variables Py_ssize_t
which is susceptible to overflows for large arrays, which would lead to undefined behavior (and in practice almost certainly a crash with a negative index)
The fix is trivial - mid = lo + (hi - lo) / 2; - but since I'm just starting to look into the code base I may be missing some undocumented assertions that guarantee this can't happen.
msg148520 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011年11月28日 23:04
This looks like a reasonable suggestion.
msg148524 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2011年11月28日 23:45
FWIW, I doubt there's a real issue here. Objects in Python consume a lot more than a byte or two of memory, so the index range of a Python list is generally a lot less than ssize_t allows for. In other words, quantify "large" in "large arrays". How large can a Python list actually be, relative to ssize_t? Similar reasoning accounts for why we never worry about overflow when mucking with refcounts: the size of a refcount member exceeds the maximum number of references that could exist.
msg148531 - (view) Author: Daniel Sturm (Voo) Date: 2011年11月29日 00:25
TBH I saw this more as an opportunity to get used to the whole system, how to create a patch, etc. :) Should've made it clearer at the start that this is unlikely to ever be a problem, sorry (didn't see a way to set priority to low myself).
If my math isn't completely off, the highest possible value here is 
hi + (hi - 1) so this only occurs if hi > (MAX_SSIZE_T + 1) / 2. So this would only work out if we had such an array containing only a handful different objects. And then that's me assuming that a list is actually a thin wrapper around an array of PyObject*.
msg148567 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011年11月29日 14:03
Given that we typically need at least 4 bytes just for the PyObject * pointer for each item in a list, I guess real lists are safe.
But how about list-like objects, implementing __len__ and __getitem__? The following appears to run forever on my machine:
class SquaresList(object):
 def __init__(self, length):
 self._length = length
 def __len__(self):
 return self._length
 def __getitem__(self, index):
 if not 0 <= index <= self._length:
 raise IndexError
 return index**2
import bisect, sys
squareslist = SquaresList(sys.maxsize)
print bisect.bisect(squareslist, (sys.maxsize - 3)**2)
msg148569 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年11月29日 14:05
Very good point, Mark.
msg148574 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011年11月29日 14:56
[x]range is enough to trigger the bug::
 bisect.bisect(range(sys.maxsize), sys.maxsize-3)
msg148607 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011年11月29日 20:55
The fix is trivial. I'll do it soonish.
msg149239 - (view) Author: Akira Li (akira) * Date: 2011年12月11日 19:10
Related bug in Java: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
Do not consider any change as trivial: http://www.solipsys.co.uk/new/BinarySearchReconsidered.html (the author ran "binary search" coding challenge about 10 years ago http://www.solipsys.co.uk/b_search/ )
msg152152 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012年01月28日 10:03
Here's a patch.
msg152392 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012年01月31日 16:01
I think the patch is *NOT* correct. Does it work with http://bugs.python.org/msg148567, in a 32 bits python?
msg152400 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012年01月31日 17:50
> I think the patch is *NOT* correct.
Can you elaborate? It works fine for that example on a 64-bit build; not sure why 32-bit would be any different.
The idea is just that the single cast forces the addition to be done as an addition of integers of type size_t. Since the integers being added are (a) nonnegative and (b) both fit into a Py_ssize_t, the sum fits into a size_t without overflow, and the result of dividing the sum by 2 fits into a size_t.
(This is all assuming that 2*Py_SSIZE_T_MAX fits into a size_t, but that's a fairly safe assumption in practice.)
msg152401 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012年01月31日 17:59
> Does it work with http://bugs.python.org/msg148567, in a 32 bits python?
Yes.
msg152402 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年01月31日 18:11
Perhaps adding a comment before those two lines would make the reason for the cast clearer.
msg155318 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012年03月10日 15:29
Updated patch, with comment.
msg155328 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012年03月10日 17:53
Thanks for the updated patch.
msg155436 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012年03月12日 03:44
LGTM
msg158340 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年04月15日 15:43
New changeset 35a3a7e0d66d by Mark Dickinson in branch '3.2':
Issue 13496: Fix bisect.bisect overflow bug for large collections.
http://hg.python.org/cpython/rev/35a3a7e0d66d
New changeset 1a9252280f07 by Mark Dickinson in branch 'default':
Issue #13496: Merge from 3.2
http://hg.python.org/cpython/rev/1a9252280f07
New changeset 709af2d0e862 by Mark Dickinson in branch '2.7':
Issue 13496: Fix bisect.bisect overflow bug for large collections.
http://hg.python.org/cpython/rev/709af2d0e862 
History
Date User Action Args
2022年04月11日 14:57:24adminsetgithub: 57705
2012年04月15日 15:44:15mark.dickinsonsetstatus: open -> closed
resolution: fixed
2012年04月15日 15:43:46python-devsetnosy: + python-dev
messages: + msg158340
2012年03月12日 03:44:06benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg155436
2012年03月10日 17:53:52rhettingersetmessages: + msg155328
2012年03月10日 15:29:29mark.dickinsonsetfiles: + issue13496_v2.patch

messages: + msg155318
2012年01月31日 18:11:49pitrousetmessages: + msg152402
2012年01月31日 17:59:14mark.dickinsonsetmessages: + msg152401
2012年01月31日 17:50:57mark.dickinsonsetmessages: + msg152400
2012年01月31日 16:01:07jceasetmessages: + msg152392
2012年01月28日 20:26:50mark.dickinsonsetstage: patch review
2012年01月28日 10:03:49mark.dickinsonsetfiles: + issue13496.patch
keywords: + patch
messages: + msg152152
2011年12月11日 19:10:15akirasetnosy: + akira
messages: + msg149239
2011年12月02日 18:51:07jceasetnosy: + jcea
2011年11月29日 20:55:07rhettingersetpriority: low -> normal

messages: + msg148607
2011年11月29日 14:56:21amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg148574
2011年11月29日 14:05:44pitrousetnosy: + pitrou
messages: + msg148569
2011年11月29日 14:03:40mark.dickinsonsetmessages: + msg148567
2011年11月29日 00:25:36Voosetmessages: + msg148531
2011年11月28日 23:45:41tim.peterssetnosy: + tim.peters
messages: + msg148524
2011年11月28日 23:04:30rhettingersetpriority: normal -> low
assignee: rhettinger
messages: + msg148520
2011年11月28日 22:37:01pitrousetnosy: + rhettinger, mark.dickinson

versions: + Python 2.7, Python 3.2, Python 3.3, - Python 3.4
2011年11月28日 22:34:31Voocreate

AltStyle によって変換されたページ (->オリジナル) /