This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009年02月28日 11:02 by ocean-city, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| py3k_mmap_bytes_cleanup_with_getarg_c.patch | ocean-city, 2009年04月02日 09:12 | with getarg('c') | ||
| py3k_mmap_bytes_cleanup_with_getarg_b.patch | ocean-city, 2009年04月02日 11:59 | with getarg('b') | ||
| Messages (22) | |||
|---|---|---|---|
| msg82903 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009年02月28日 11:02 | |
On Python3000, mmap.read_byte returns str not bytes, and mmap.write_byte
accepts str. Is this intended behavior?
>>> import mmap
>>> m = mmap.mmap(-1, 10)
>>> type(m.read_byte())
<class 'str'>
>>> m.write_byte("a")
>>> m.write_byte(b"a")
Maybe another possibility. read_byte() returns int which represents
byte, write_byte accepts int which represents byte. (Like b"abc"[0]
returns int not 1-length bytes)
|
|||
| msg82904 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009年02月28日 11:05 | |
Indeed, I think it should use the "b" code, instead of the "c" code. Please discuss this on python-dev, though. It might not be ok to backport this to 3.0, since it may break existing code. |
|||
| msg82905 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009年02月28日 11:06 | |
Furthermore, all other uses of the "c" code might need to be reconsidered. |
|||
| msg82910 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年02月28日 14:39 | |
loewis> Furthermore, all other uses of the "c" code might loewis> need to be reconsidered. $ grep 'BuildValue.*"c"' */*c Modules/_cursesmodule.c: return Py_BuildValue("c", rtn); Modules/mmapmodule.c: return Py_BuildValue("c", value); $ grep '_Parse[^"]\+"[^":;]*c' */*c Modules/mmapmodule.c: if (!PyArg_ParseTuple(args, "c:write_byte", &value)) PC/msvcrtmodule.c: if (!PyArg_ParseTuple(args, "c:putch", &ch)) PC/msvcrtmodule.c: if (!PyArg_ParseTuple(args, "c:ungetch", &ch)) So we have: * mmap.read_byte()->char, mmap.write_byte(char): should be fixed to use bytes * <curses window>.getkey()->char: it looks correct because the function uses also "return PyUnicode_FromString(...);" * msvcrt.putch(char), msvcrt.ungetch(char): msvcrt has also: - msvcrt.getch()->byte string of 1 byte - msvcrt.getwch()->unicode string of 1 character - msvcrt.putwch(unicode string of 1 character) - msvcrt_ungetwch(unicode string of 1 character) Hum, putch(), ungetch(), getch() use inconsistent types (unicode/bytes) and should be fixed. Another issue should be open for that. Notes: msvcrt.putwch() accepts string of length > 1 and msvcrt.ungetwch() doesn't check string length (and so may crash with length=0 or length > 1?). |
|||
| msg82912 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009年02月28日 14:48 | |
I think more *bytes* cleanup is needed for mmap module documentation & implementation. (and other modules?) Especially mmap.find() and its friends. >>> import mmap >>> m = mmap.mmap(-1, 10) >>> m[:] = b"0123456789" >>> m.find(b'2') 2 >>> m.find('2') # XXX: accepts unicode 2 |
|||
| msg82947 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009年02月28日 21:18 | |
Patch attached. read_byte and write_byte use integer as byte, and other bytes related cleanup. |
|||
| msg83677 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年03月17日 13:05 | |
In the python-dev thread, most people agree to use bytes in mmap. Did anyone reviewed the patch? Can you commit it to py3k than then to the 3.0 branch? |
|||
| msg83678 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009年03月17日 13:39 | |
>In the python-dev thread, most people agree to use bytes in mmap. Did >anyone reviewed the patch? Well, there is no conclusion which of your choices (a or b) is preferred. http://www.nabble.com/What-type-of-object-mmap.read_byte-should-return-on-py3k--tt22261245.html#a22261245 I opened similar issue for msvcrt in issue5410. >Can you commit it to py3k than then to the 3.0 branch? If the patch is acceptable, yes. This patch will change behavior of functions, I don't think I can commit this without review. |
|||
| msg83684 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年03月17日 17:08 | |
Le Tuesday 17 March 2009 14:39:59 Hirokazu Yamamoto, vous avez écrit : > Well, there is no conclusion which of your choices (a or b) is preferred. Guido just wrote in other mail thread (py3k: accept unicode for 'c' and byte for 'C' in getarg?): "Yeah, mmap should be defined exclusively in terms of bytes." > I opened similar issue for msvcrt in issue5410. Cool, thanks. > >Can you commit it to py3k than then to the 3.0 branch? > > If the patch is acceptable, yes. This patch will change behavior of > functions, I don't think I can commit this without review. Sure, we need a review of the patch. Should the patch be ported to 3.0? |
|||
| msg83688 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009年03月17日 17:39 | |
Ah, no, this should not be backported to 3.0. Martin saids so in msg82904, and I agreed. |
|||
| msg83693 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009年03月17日 19:41 | |
STINNER Victor wrote: > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > > Le Tuesday 17 March 2009 14:39:59 Hirokazu Yamamoto, vous avez écrit : >> Well, there is no conclusion which of your choices (a or b) is preferred. > > Guido just wrote in other mail thread (py3k: accept unicode for 'c' and byte > for 'C' in getarg?): > > "Yeah, mmap should be defined exclusively in terms of bytes." How does that answer the question? We know what data type to use for multiple bytes - but what data type should be used for a single byte? |
|||
| msg83695 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年03月17日 20:55 | |
> How does that answer the question? We know what data type to
> use for multiple bytes - but what data type should be used
> for a single byte?
Hum, what was the question? :-) Quote of my email:
« About m.read_byte(), we have two choices:
(a) Py_BuildValue("b", value) => 0
(b) Py_BuildValue("y#", &value, 1) => b"\x00"
About m.write_byte(x), we have also two choices:
(a) PyArg_ParseTuple(args, "b:write_byte", &value): write_byte(0)
(b) PyArg_ParseTuple(args, "y#:write_byte", &value, &length) and
check for length=1: write_byte(b"\x00")
(b) choices are close to Python 2.x API. But we can already use
m.read(1)->b"\x00" and m.write(b"\x00") to use byte string of 1 byte.
So it would be better to break the API and use integers, (a) choices
(...) »
Oh, I though that the question was about bytes vs str :-/ Ocean-city
and I prefer the solution (a). And you Martin?
|
|||
| msg83696 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009年03月17日 21:10 | |
> Oh, I though that the question was about bytes vs str :-/ Ocean-city > and I prefer the solution (a). And you Martin? I also think that ints should be used to represent individual bytes. However, your list of alternatives is incomplete: we *could* also change the "c" code to accept and produce int - then mmapmodule would not need to change at all. |
|||
| msg84229 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年03月26日 22:35 | |
martin> However, your list of alternatives is incomplete: we martin> *could* also change the "c" code to accept and produce martin> int - then mmapmodule would not need to change at all. That's extactly the idea that I proposed in issue #5499 ;-) I prefer to have strict getarg('c') and getarg('C') than fixing each module. |
|||
| msg85183 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年04月02日 08:35 | |
@ocean-city: Can you update your patch to leave
Py_BuildValue("c", ...) and
PyArg_ParseTuple(args, "c:write_byte", ...) unchanged, since #5499 is
fixed?
|
|||
| msg85185 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009年04月02日 09:07 | |
Yes, here is the patch. But I noticed Py_BuildValue('c') still returns
unicode. To pass the test, #5666 is needed.
|
|||
| msg85197 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年04月02日 11:41 | |
So <mmap object>.read_byte() gives a byte string of 1 byte, ok. Port from Python2 will be easier. The patch looks correct, thanks for updating it. We know have to wait for #5666 :-) |
|||
| msg85198 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009年04月02日 11:59 | |
Yes, you can do
m.write_byte(b"a")
but on the other hand you cannot do
a = b"abc"
m.write_byte(a[1])
instead you should do
a = b"abc"
m.write_byte(a[1:2])
This is trade off, though.
I'll update "with getarg('b') version" to compare.
|
|||
| msg85354 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009年04月04日 00:31 | |
#5666 is closed. I finally prefers getarg('c') (byte string of lenght 1). |
|||
| msg85412 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2009年04月04日 17:10 | |
Fixed in r71174. |
|||
| msg118902 - (view) | Author: (benrg) | Date: 2010年10月16日 22:10 | |
With this patch, read_byte returns an integer in the range -128 to 127 instead of 0 to 255 if char is signed. Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32 is affected by this. I think it is a bug. The test code would fail if the test string contained any bytes outside the ASCII range. (Did this really go unnoticed for a year and a half? I noticed it the moment I first tried to use read_byte (which was just now). I see that read_byte was broken in a different way in 3.0. Does anybody actually use it?) |
|||
| msg120398 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2010年11月04日 12:36 | |
Thank for pointing this out. I've looked at bytearray and bytes implementations, they actually return unsigned value. I fixed this in r86159(py3k) and r86160(release31-maint). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:46 | admin | set | github: 49641 |
| 2010年11月04日 12:36:16 | ocean-city | set | messages: + msg120398 |
| 2010年10月16日 22:10:24 | benrg | set | nosy:
+ benrg messages: + msg118902 |
| 2009年04月04日 17:10:28 | benjamin.peterson | set | status: open -> closed nosy: + benjamin.peterson messages: + msg85412 resolution: fixed |
| 2009年04月04日 00:31:45 | vstinner | set | messages: + msg85354 |
| 2009年04月02日 11:59:18 | ocean-city | set | files: - py3k_mmap_and_bytes.patch |
| 2009年04月02日 11:59:05 | ocean-city | set | files:
+ py3k_mmap_bytes_cleanup_with_getarg_b.patch messages: + msg85198 |
| 2009年04月02日 11:41:48 | vstinner | set | messages: + msg85197 |
| 2009年04月02日 09:12:05 | ocean-city | set | files: + py3k_mmap_bytes_cleanup_with_getarg_c.patch |
| 2009年04月02日 09:11:43 | ocean-city | set | files: - py3k_mmap_bytes_cleanup_with_getarg_c.patch |
| 2009年04月02日 09:07:06 | ocean-city | set | files:
+ py3k_mmap_bytes_cleanup_with_getarg_c.patch dependencies: + Py_BuildValue("c") should return bytes? messages: + msg85185 |
| 2009年04月02日 08:35:03 | vstinner | set | messages: + msg85183 |
| 2009年04月02日 00:41:42 | ocean-city | set | priority: release blocker |
| 2009年03月31日 19:23:27 | ocean-city | set | dependencies: + only accept byte for getarg('c') and unicode for getarg('C') |
| 2009年03月26日 22:35:19 | vstinner | set | messages: + msg84229 |
| 2009年03月17日 21:10:07 | loewis | set | messages: + msg83696 |
| 2009年03月17日 20:55:57 | vstinner | set | messages: + msg83695 |
| 2009年03月17日 19:41:47 | loewis | set | messages: + msg83693 |
| 2009年03月17日 17:39:11 | ocean-city | set | messages: + msg83688 |
| 2009年03月17日 17:08:27 | vstinner | set | messages: + msg83684 |
| 2009年03月17日 13:39:58 | ocean-city | set | messages: + msg83678 |
| 2009年03月17日 13:05:25 | vstinner | set | messages: + msg83677 |
| 2009年02月28日 21:18:10 | ocean-city | set | files:
+ py3k_mmap_and_bytes.patch keywords: + patch messages: + msg82947 |
| 2009年02月28日 14:48:58 | ocean-city | set | messages: + msg82912 |
| 2009年02月28日 14:39:42 | vstinner | set | nosy:
+ vstinner messages: + msg82910 |
| 2009年02月28日 14:07:07 | ocean-city | set | versions: - Python 3.0 |
| 2009年02月28日 11:06:51 | loewis | set | messages: + msg82905 |
| 2009年02月28日 11:05:53 | loewis | set | nosy:
+ loewis messages: + msg82904 |
| 2009年02月28日 11:02:27 | ocean-city | create | |