This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: index() and count() methods of bytes and bytearray should accept byte ints
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.araujo, ezio.melotti, flox, jcea, max-alleged, petri.lehtinen, pitrou, python-dev, rhettinger, terry.reedy, vstinner, xuanji
Priority: normal
Keywords: needs review, patch

Created on 2011-05-24 19:49 by max-alleged, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
issue12170.patch petri.lehtinen, 2011-07-23 20:25
issue12170_v2.patch petri.lehtinen, 2011-10-19 18:18
Messages (22)
msg136786 - Author: Max (max-alleged) Date: 2011-05-24 19:49
Bytes objects when indexed provide integers, but do not accept them to many functions, making them inconsistent with other sequences.
Basic example:
>>> test = b'012'
>>> n = test[1]
>>> n
49
>>> n in test
True
>>> test.index(n)
TypeError: expected an object with the buffer interface.
It is certainly unusual for n to be in the sequence, but not to be able to find it. I would expect the result to be 1. This set of commands with list, strings, tuples, but not bytes objects.
I suspect, from issue #10616, that all the following functions would be affected:
"bytes methods: partition, rpartition, find, index, rfind, rindex, count, translate, replace, startswith, endswith"
It would make more sense to me that instead of only supporting buffer interface objects, they also accept a single integer, and treat it as if it were provided a length-1 bytes object.
The use case in which I came across this problem was something like this:
Given seq1 and seq2, sequences of the same type:
[seq1.index(x) for x in seq2]
This works for strings, lists, tuples, but not bytes.
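A minimal sketch of this use case, assuming an interpreter that includes the fix from this issue (Python 3.3 or later). The helper name `positions` and the sample data are illustrative and not taken from the report:

```python
def positions(haystack, needles):
    """Return the index in haystack of each item of needles."""
    return [haystack.index(x) for x in needles]


# The same generic code now works for str, list and bytes alike.
str_result = positions("012", "20")
list_result = positions([0, 1, 2], [2, 0])
# Iterating over bytes yields ints (50, then 48), which index() accepts.
bytes_result = positions(b"012", b"20")

print(str_result, list_result, bytes_result)
```

On interpreters without the fix, the bytes call is expected to fail with the TypeError shown in the report.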
msg136787 - Author: Max (max-alleged) Date: 2011-05-24 19:50
"This set of commands with list, strings, tuples, but not bytes objects."
should read
"This set of commands works with list, strings, tuples, but not bytes objects."
msg136878 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-05-25 17:41
> It is certainly unusual for n to be in the sequence, but not to be able to find it.
Agreed. Doc Lib: 4.6. Sequence Types — str, bytes, bytearray, list, tuple, range says '''
s.index(i) index of the first occurrence of i in s
s.count(i) total number of occurrences of i in s '''
so everything *in* a bytes should be valid for .index and .count.
>>> test = b'0120'
>>> z = b'0'
>>> zo = ord(z)
>>> z in test
True
>>> zo in test
True
>>> test.index(z)
0
>>> test.index(zo)
...
TypeError: expected an object with the buffer interface
>>> test.count(z)
2
>>> test.count(zo)
...
TypeError: expected an object with the buffer interface
# longer subsequences like b'01' also work
So I think the code for 3.2+ bytes.count() and bytes.index() should do the same branching as the code for bytes.__contains__.
The other functions you list, including .rindex are not general sequence functions but are string functions defined as taking subsequences as inputs. So they would never be used in generic code like .count and .index can be.
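The invariant described here (anything that is `in` a bytes object should be usable with .index and .count) can be checked with a short sketch. It assumes Python 3.3 or later, and the variable names simply mirror the session above:

```python
test = b"0120"
z = b"0"
zo = ord(z)  # the integer 48

# A length-1 bytes object, the equivalent integer, and a longer subsequence.
items = (z, zo, b"01")

membership = [item in test for item in items]
index_results = [test.index(item) for item in items]
count_results = [test.count(item) for item in items]

print(membership, index_results, count_results)
```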
msg136883 - Author: Max (max-alleged) Date: 2011-05-25 18:34
Fair enough.
I think it would make sense for the string methods to also accept single ints where possible as well:
For haystack and needles both strings:
[haystack.find(n) for n in needles]
For both bytes, it's a bit contortionist:
[haystack.find(needles[i:i+1]) for i in range(len(needles))]
One ends up doing a lot of the [i:i+1] bending when using bytes functions.
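For comparison, a small sketch of the slicing workaround next to the direct form. The direct form assumes Python 3.3 or later, where bytes.find() accepts an integer; the sample data is made up for illustration:

```python
haystack = b"hello world"
needles = b"low"

# Workaround: slice out length-1 bytes objects instead of indexing.
sliced = [haystack.find(needles[i:i + 1]) for i in range(len(needles))]

# Direct form: iterate over the integers that bytes yields.
direct = [haystack.find(n) for n in needles]

print(sliced, direct)
```

Both lists should be identical, since a single integer is treated like the corresponding length-1 bytes object.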
msg140903 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-07-22 20:02
This affects bytearray as well as bytes.
As for supporting an integer argument to str methods, I'm -1 on that. str's "contained items" are strings of length 1.
msg141016 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-07-23 20:25
Attached a patch with the following changes:
Allow an integer argument in range(0, 256) for the following bytes and
bytearray methods: count, find, index, rfind, rindex. Initially, only
count and index were targeted, but as index is implemented in a helper
function that is also used to implement find, rfind and rindex, these
functions were affected too.
The bytes methods were changed to use the new buffer protocol instead
of the deprecated PyObject_AsCharBuffer, for consistency with the
bytearray code.
Tests for all the modified functions were expanded to cover the new
functionality. While at it, the tests for count, index and rindex were
also further expanded (to test for slices, for example), as they were
initially quite minimal.
A paragraph describing the additional semantics of the five methods
was added to the documentation.
The error messages of index and rindex were left untouched
("substring not found" and "subsection not found"). In a case where
the first argument is an integer, the error messages could talk about
a byte instead of substring/subsection. This would have been a bit
non-straightforward to implement, so I didn't.
The docstrings were also left unchanged, as I couldn't find a good
wording for them. The problem is not that the first argument may now
be an integer, but as it can now be more than a substring or
subsection, we might have to specify what a substring or subsection
really means. And that explanation would be lengthy (because of the buffer protocol, which is not a concept that a regular Python programmer is, or even needs to be, familiar with)...
And finally, there's one thing that I'm unsure of:
When an integer out of range(0, 256) is passed as the first argument,
should we raise a ValueError or a TypeError? Currently, a ValueError
is raised, but this may be bad for index and rindex, as they raise a
ValueError when the substring or byte is not found. I made the
decision to raise a ValueError because __contains__ of both
bytes and bytearray raises a ValueError when passed an integer not in
range(0, 256).
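As a rough illustration of the semantics described in this message, the sketch below calls the five methods with integer arguments on both bytes and bytearray. It assumes Python 3.3 or later; the sample data and the names `calls` and `results` are illustrative only:

```python
# (method name, integer byte value) pairs; 97, 98, 99 are ord("a"), ord("b"), ord("c")
calls = [("count", 97), ("find", 98), ("index", 99), ("rfind", 97), ("rindex", 98)]

results = {}
for kind in (bytes, bytearray):
    obj = kind(b"abcabc")
    results[kind.__name__] = [getattr(obj, name)(byte) for name, byte in calls]

# The optional start/end arguments keep working with an integer needle.
second_a = b"abcabc".index(97, 1)

print(results, second_a)
```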
msg141033 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-07-24 02:05
> When an integer out of range(0, 256) is passed as the first argument,
> should we raise a ValueError or a TypeError?
ValueError = Inappropriate argument value (of correct type).
TypeError = Inappropriate argument type.
> Currently, a ValueError is raised, but this may be bad for index and
> rindex, as they raise a ValueError when the substring or byte is not found.
Then the users should check if the value is in range(256) before passing it to (r)index.
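One way a caller might follow this advice is sketched below. The helper `byte_index` is hypothetical, not part of the patch, and it assumes Python 3.3 or later and an integer `value`. It separates the two ValueError cases by checking the range first:

```python
def byte_index(data, value):
    """Return the index of the byte `value` in `data`, or None if absent.

    Raises ValueError only when `value` is not a valid byte value.
    """
    if value not in range(256):
        raise ValueError("byte must be in range(0, 256)")
    try:
        return data.index(value)
    except ValueError:
        # The value is a valid byte, so this can only mean "not found".
        return None


found = byte_index(b"012", 49)    # ord("1") == 49
missing = byte_index(b"012", 65)  # valid byte, but not present
print(found, missing)
```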
msg141216 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-07-27 10:18
Ok, so the current raising semantics should be good.
msg141490 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-08-01 07:51
See also #12631 regarding the remove() method for bytearray.
msg141491 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-08-01 09:04
> See also #12631 regarding the remove() method for bytearray.
AFAICS, it's about bytearray.remove() working but bytearray.index() not working as documented, and that's why I marked it as a duplicate of this issue.
msg145797 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-18 11:32
> I made the
> decision to raise a ValueError because __contains__ of both
> bytes and bytearray raises a ValueError when passed an integer not in
> range(0, 256).
That sounds reasonable. OverflowError would have been another choice, but I agree that consistency with __contains__ is sensible.
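The consistency argument can be checked with a short sketch, assuming Python 3.3 or later. The helper `error_type` is hypothetical and only serves to capture which exception class a call raises:

```python
def error_type(func):
    """Return the class of the exception raised by func(), or None."""
    try:
        func()
    except Exception as exc:
        return type(exc)
    return None


data = b"abc"

# 256 is outside range(0, 256) for both the membership test and index().
contains_error = error_type(lambda: 256 in data)
index_error = error_type(lambda: data.index(256))

print(contains_error, index_error)
```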
msg145799 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-18 11:35
Doc/library/stdtypes.rst needs a "versionadded" tag for the additional semantics.
Also, the patch doesn't compile cleanly on current default:
In file included from Objects/unicodeobject.c:487:0:
Objects/stringlib/find.h: In function ‘stringlib_parse_args_finds_byte’:
Objects/stringlib/find.h:158:5: warning: implicit declaration of function ‘stringlib_parse_args_finds’
In file included from Objects/unicodeobject.c:497:0:
Objects/stringlib/find.h: At top level:
Objects/stringlib/find.h:151:1: error: redefinition of ‘stringlib_parse_args_finds_byte’
Objects/stringlib/find.h:151:1: note: previous definition of ‘stringlib_parse_args_finds_byte’ was here
I'd say you need to either define your function as STRINGLIB(parse_args_finds_byte) (to avoid name collisions), or avoid defining it if STRINGLIB_IS_UNICODE.
msg145929 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-10-19 18:15
Thanks for the review, Antoine. Attached an updated patch:
- The function definition now uses STRINGLIB(...) and the function is only defined when !STRINGLIB_IS_UNICODE (so I took both approaches)
- Added a "versionadded:: 3.3" to the documentation.
msg145931 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-10-19 18:18
Fixed a minor inconsistency.
msg146055 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-20 21:58
New changeset c1effa2cdd20 by Antoine Pitrou in branch 'default':
Issue #12170: The count(), find(), rfind(), index() and rindex() methods
http://hg.python.org/cpython/rev/c1effa2cdd20 
msg146056 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-20 21:58
Patch committed, thank you!
msg148232 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-11-24 07:48
Just a thought: Would this change be worthy for the "What's new in 3.3" list?
msg148237 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-24 10:27
> Just a thought: Would this change be worthy for the "What's new in 3.3" 
> list?
I think so.
msg148287 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-24 20:04
New changeset 736b0aec412b by Petri Lehtinen in branch 'default':
Add a "What's New" entry for #12170
http://hg.python.org/cpython/rev/736b0aec412b 
msg149722 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-18 00:17
New changeset 75648db1b3f3 by Victor Stinner in branch 'default':
Issue #13623: Fix a performance regression introduced by issue #12170 in
http://hg.python.org/cpython/rev/75648db1b3f3 
msg149723 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-18 00:18
New changeset 75648db1b3f3 by Victor Stinner in branch 'default':
http://hg.python.org/cpython/rev/75648db1b3f3
Issue #13623: Fix a performance regression introduced by issue #12170 in bytes.find() and handle correctly OverflowError (raise the same ValueError than the error for -1).
msg164054 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-06-26 07:28
New changeset 018fe1dee9b3 by Petri Lehtinen in branch 'default':
What's new: Add myself as the contributor of issue 12170
http://hg.python.org/cpython/rev/018fe1dee9b3 
History
Date User Action Args
2022-04-11 14:57:17 admin set github: 56379
2012-06-26 07:28:04 python-dev set messages: + msg164054
2011-12-18 00:18:53 vstinner set messages: + msg149723
2011-12-18 00:17:54 python-dev set messages: + msg149722
2011-11-24 20:04:29 python-dev set messages: + msg148287
2011-11-24 10:27:17 pitrou set messages: + msg148237
2011-11-24 07:48:32 petri.lehtinen set messages: + msg148232
2011-10-20 21:58:46 pitrou set status: open -> closed
    messages: + msg146056
    assignee: rhettinger ->
    resolution: fixed
    stage: patch review -> resolved
2011-10-20 21:58:20 python-dev set nosy: + python-dev
    messages: + msg146055
2011-10-19 18:18:25 petri.lehtinen set files: + issue12170_v2.patch
    messages: + msg145931
2011-10-19 18:17:33 petri.lehtinen set files: - issue12170_v2.patch
2011-10-19 18:15:49 petri.lehtinen set files: + issue12170_v2.patch
    messages: + msg145929
2011-10-18 12:20:14 flox set nosy: + flox
2011-10-18 11:35:28 pitrou set messages: + msg145799
2011-10-18 11:32:07 pitrou set nosy: + pitrou
    messages: + msg145797
2011-08-01 09:04:12 petri.lehtinen set messages: + msg141491
2011-08-01 07:51:55 rhettinger set messages: + msg141490
2011-07-27 19:23:02 vstinner set nosy: + vstinner
2011-07-27 10:45:14 petri.lehtinen link issue12631 superseder
2011-07-27 10:18:09 petri.lehtinen set messages: + msg141216
2011-07-24 02:05:04 ezio.melotti set messages: + msg141033
2011-07-23 20:25:08 petri.lehtinen set keywords: + patch, needs review
    files: + issue12170.patch
    messages: + msg141016
    stage: test needed -> patch review
2011-07-22 20:02:04 petri.lehtinen set title: Bytes.index() and bytes.count() should accept byte ints -> index() and count() methods of bytes and bytearray should accept byte ints
    messages: + msg140903
    versions: - Python 3.2
2011-06-06 11:23:39 xuanji set nosy: + xuanji
2011-06-01 18:04:23 eric.araujo set nosy: + eric.araujo
2011-05-31 16:00:03 jcea set nosy: + jcea
2011-05-28 10:00:21 ezio.melotti set nosy: + ezio.melotti
2011-05-25 18:36:49 max-alleged set type: behavior
    versions: + Python 3.3
2011-05-25 18:34:53 max-alleged set type: behavior -> (no value)
    messages: + msg136883
    versions: - Python 3.3
2011-05-25 17:42:31 rhettinger set assignee: rhettinger
    nosy: + rhettinger
2011-05-25 17:41:53 terry.reedy set title: Bytes.index() and bytes.count() do not accept byte ints -> Bytes.index() and bytes.count() should accept byte ints
2011-05-25 17:41:15 terry.reedy set versions: + Python 3.3
    type: behavior
    nosy: + terry.reedy
    title: Bytes objects do not accept integers to many functions -> Bytes.index() and bytes.count() do not accept byte ints
    messages: + msg136878
    stage: test needed
2011-05-25 16:33:09 petri.lehtinen set nosy: + petri.lehtinen
2011-05-24 19:50:58 max-alleged set messages: + msg136787
2011-05-24 19:49:09 max-alleged create
