Created on 2004-09-06 20:42 by josiahcarlson, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| structmodule_diff.txt | josiahcarlson, 2004-10-02 22:34 | Diff to structmodule.c to add support. |
| long_and_bytes_conversion.diff | alexandre.vassalotti, 2009-08-11 21:38 | Patch for Python 3.x |
| long_and_bytes_conversion-2.diff | alexandre.vassalotti, 2009-08-17 20:13 | |
| long_and_bytes_conversion-3.diff | alexandre.vassalotti, 2009-11-14 20:38 | |
| Messages (50) | |||
|---|---|---|---|
| msg54238 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-09-06 20:42 | |
I believe there should be a mechanism to load and
unload arbitrarily large integers via the struct
module. Currently, one would likely start with the 'Q'
format character, creating the integer in a block-wise
fashion with multiplies and shifts.
This is OK, though it tends to lend itself to certain
kinds of bugs.
There is currently another method for getting large
integers from strings and going back without the struct
module:
long(stri.encode('hex'), 16)
hex(inte)[2:].decode('hex')
Arguably, such things shouldn't be done for the packing
and unpacking of binary data in general (the string
slicing especially).
I propose a new format character for the struct module,
specifically because the struct module is to "Interpret
strings as packed binary data". Perhaps 'g' and 'G'
(eg. biGint) is sufficient, though any reasonable
character should suffice. Endianness should be
handled, and the number of bytes representing the
object would be the same as with the 's' formatting
code. That is, '>60G' would be an unsigned big-endian
integer represented by 60 bytes (null filled if the
magnitude of the passed integer is not large enough).
The only reason why one wouldn't want this
functionality in the struct module is "This module
performs conversions between Python values and C
structs represented as Python strings." and arbitrarily
large integers are not traditionally part of a C struct
(though I am sure many of us have implemented arbitrary
precision integers with structs). The "not a C type"
argument has been used to quash the 'bit' and 'nibble'
format characters, because "masks and shifts" are able
to emulate them. Though "masks and shifts" could
also be used here, I and others have said
that there should be an easy method for converting
between large longs and strings.
A side-effect for allowing arbitrarily large integers
to be represented in this fashion is that its
functionality could, if desired, subsume the other
integer type characters, as well as fill in the gaps
for nonstandard size integers (3, 5, 6, 7 etc. byte
integers), that I (and I am sure others) have used in
various applications.
Currently no implementation exists, and I don't have
time to do one now. Having taken a look at
longobject.c and structmodule.c, I would likely be able
to make a patch to the documentation, structmodule.c,
and test_struct.py around mid October, if this
functionality is desirable to others and accepted.
While I doubt that a PEP for this is required, if
necessary I would write one up with a sample
implementation around mid October.
|
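The two workarounds described in the request above can be sketched in Python 3 as follows. This is only an illustration: the helper names are hypothetical, the integers are assumed to be unsigned and big-endian, and `long(...)` and `str.encode('hex')` from the original message are replaced by `int(...)` and `binascii.hexlify`.

```python
import binascii
import struct


def unpack_bigint_blockwise(data):
    """Assemble a big-endian unsigned integer from 8-byte 'Q' blocks."""
    if len(data) % 8:
        # Left-pad with zero bytes so the length is a multiple of 8.
        data = b"\x00" * (8 - len(data) % 8) + data
    value = 0
    for (block,) in struct.iter_unpack(">Q", data):
        # Shift the accumulated value up by one block and add the next one.
        value = (value << 64) | block
    return value


def unpack_bigint_hex(data):
    """Same conversion via the hex round-trip."""
    return int(binascii.hexlify(data), 16) if data else 0


def pack_bigint_hex(value, width):
    """Inverse of unpack_bigint_hex for a fixed width in bytes.

    Assumes 0 <= value < 256**width; larger values are not truncated.
    """
    return binascii.unhexlify("%0*x" % (2 * width, value))
```

Both unpacking helpers should agree on any input; the block-wise one shows the shifting that the request calls bug-prone.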
|||
| msg54239 | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2004-09-06 22:34 | |
FWIW, I'm working on str/long conversion functions for the binascii module. Will that suit your needs? The tolong function is equivalent to: long(hexlify(b), 16) |
|||
| msg54240 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-09-06 23:44 | |
As I provide in the feature request, there is already a
method for translating string <-> long.
The problem with current methods for converting between
large integers and strings is that they do not lend
themselves to generally being understandable or to being
documented.
The struct module already provides two appropriate functions
for handling packed binary data and a place for documenting
functions involving packing and unpacking binary data, and
the implementation seems to be simple enough (one more
format character, much of it borrowed from the 's' character,
plus a call to _PyLong_FromByteArray seems to be sufficient).
As for the binascii module, many of the functions listed
seem like they should be wrapped into the encode/decode
string methods, hexlify already being so in str.encode('hex').
To me, just being able to translate doesn't seem sufficient
(we already can translate), but being able to do it well,
have it documented well, and placed in a location that is
obvious, fast and optimized for these kinds of things seems
to be the right thing.
From what I can tell, the only reason why struct doesn't
already have an equivalent format character to the proposed
'g' and 'G', is because the module was created to handle
packed C structs and seemingly "nothing else". Considering
there doesn't seem to be any other reasonable or easily
documentable location for placing equivalent functionality
(both packing and unpacking), I am of the opinion that
restricting the packing and unpacking to C types in the
struct module (when there are other useful types) is overkill.
As I said, I will provide an implementation if desired.
|
|||
| msg54241 | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2004-09-07 00:02 | |
Okay, submit a patch with docs and unittests. |
|||
| msg54242 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2004-09-08 19:15 | |
Apparently, the point of this request is that the method for converting long ints to binary should be "easily found in documentation". And also apparently, the submitter thinks that the struct module would be the place where people look. Now, that allows for a simple solution: document the approach of going through hex inside the documentation of the struct module. There is one other reason (beyond being primarily for C APIs) why such a feature should *not* be in the struct module: The struct module, most naturally, is about structures. However, I understand that the intended usage of this feature would not be structures, but single long values. Therefore, I consider it counter-intuitive to extend struct for that. |
|||
| msg54243 | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2004-09-08 19:30 | |
The idea is to expose the _PyLong_FromByteArray() and _PyLong_AsByteArray() functions. While long(hexlify(b),16) is doable for bin2long, going the other way is not so simple. I agree that these are not struct related. Originally, I proposed the binascii module because one of the operations is so similar to hexlify(). Are there other suggestions? |
|||
| msg54244 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2004-09-08 21:45 | |
I would think that
def long_as_bytes(lvalue, width):
    fmt = '%%.%dx' % (2*width)
    return unhexlify(fmt % (lvalue & ((1L<<8*width)-1)))
is short enough for a recipe to not really make a C function necessary for that feature. However, if they are going to be provided somewhere, I would suggest that static methods on the long type might be the right place. |
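For readers on Python 3, the recipe above can be sketched as follows. The only changes assumed here are the explicit import and dropping the `L` suffix from the integer literal, which Python 3 no longer accepts; the result is big-endian, and negative values come out in two's complement because of the mask.

```python
from binascii import unhexlify


def long_as_bytes(lvalue, width):
    """Return lvalue as big-endian bytes, truncated to `width` bytes."""
    # Build a format such as '%.6x': hex digits, zero-padded to 2*width.
    fmt = '%%.%dx' % (2 * width)
    # Masking keeps only the low 8*width bits, so negatives wrap around.
    return unhexlify(fmt % (lvalue & ((1 << 8 * width) - 1)))
```

The mask silently discards high bits, so a caller that needs an overflow error would have to check the range separately.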
|||
| msg54245 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-09-08 22:03 | |
Structures (aka C Structs) can contain arbitrarily large or small numbers of basic types inside them. As such, 'single long values' are still a valid use. I use struct for packing and unpacking of single items (8,4,2 byte integers, 1 byte integers are faster served via chr and ord) when necessary (because it is the most convenient), as well as a current contract where it is not uncommon to be packing and unpacking 256 byte structs. Those large structs contain various 1,2,4 and 8 byte integers, as well as a handful of 16 and 20 byte integers (which I must manually shift and mask during packing and unpacking). I'm a big boy, and can do it, but that doesn't mean that such functionality should be left out of Python. As for 'document the approach of going through hex inside the documentation of the struct module', I am curious about whether other modules do the same thing, that is to tell users "this functionality conceptually fits here X%, which is why it is documented here, but because it does not fit 100%, here is how you can do the same thing, which will likely look like a strange hack, require slicing potentially large strings, and be significantly slower than if we had just added the functionality, but here you go anyways." Now, I don't /need/ the feature, but I believe myself and others would find it useful. I also don't /require/ it be in struct, but no other modules offer equivalent functionality; Pickle and Marshal are Python-only, binascii (and bin2hex) are for converting between binary and ascii representations for transferring over non-8-bit channels (email, web, etc.), and no other module even comes close to offering a similar bit of "packs various types into a binary format, the same way C would" as struct. If anyone has a better place for it, I'm all ears (or eyes). |
|||
| msg54246 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-09-08 22:38 | |
Martin, I was typing as you submitted your most recent comment. I am honestly shocked that you would suggest that longs should gain a method for encoding themselves as binary strings. Such a thing would then suggest that standard ints and floats also gain such methods. It would also imply that since one can go to strings, one should equivalently be able to come from strings via equivalent methods. Goodness, int.tostring(width) and int.fromstring(str)? But what about endianness? Looks like a big can of worms to me. |
|||
| msg54247 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2004-09-08 22:53 | |
Since you were asking: it is quite common that modules refer to related functionality. For example, BaseHTTPServer refers to SimpleHTTPServer and CGIHTTPServer. One might expect that a HTTP server also supports files and does CGI - but not this one; go elsewhere. Likewise, module binascii refers to modules uu and binhex. The math documentation points out that it does not support complex numbers, and that cmath is needed. The audioop documentation gives the function echocancel in the documentation, instead of implementing it. |
|||
| msg54248 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-09-10 23:18 | |
(sorry it took me a few days to get back to you, I am on a contract deadline crunch...just taking a break now) The *HTTPServer hierarchy is interesting in its own right, but really, each piece in the hierarchy adds functionality. A similar thing can be said of asyncore and all the modules that derive from it (asynchat, *HTTPServer, *XMLRPCServer, smtpd, etc.). In this case, since the struct module is already in C and the functions are not subclassable, creating another module that parses strings and sends pieces off to struct for actual decoding seems like a waste of a module, especially when the change is so minor. Now, binascii is being used in such a fashion by uu and binhex, but that is because binascii is the data processing component, where uu and binhex make a 'pretty' interface. Struct doesn't need a pretty interface, it is already pretty. Though as I have said before, I think it could use this small addition. |
|||
| msg54249 | Author: Tim Peters (tim.peters) * (Python committer) | Date: 2004-09-12 03:40 | |
binascii makes sense because that's where the hexlify and unhexlify functions live, which are small conceptual steps away from what's needed here. Methods on numbers make sense too, and only seem strange because so few are clearly visible now (although, e.g., there are lots of them already, like number.__abs__ and number.__add__). The struct module makes sense too, although it would be darned ugly to document a refusal to accept the new codes in "native" mode; and struct has a high learning curve; and struct obviously never intended to support types that aren't supplied directly by C compilers (the "Pascal string" code seems weird now, but not at the time). |
|||
| msg54250 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-09-12 18:55 | |
Hexlify and unhexlify make sense for translating strings. Nowhere in the binascii module is there any mention of translating anything to/from integers, longs or floats. The only reason (un)hexlify make any sense at all is because we can get integers from hexlified strings, and get strings from hexlified integers with relative ease. I guess the trick with struct, at least with me, is not that I use it because it translates to/from C types, it is because it translates to/from types that I find useful. Its intersection with C types, as well as Python's intersection with C types, and my use intersection with the types is a convenient (but very engineered and understandable) coincidence. It would be another very convenient (but also engineered *wink*) coincidence if I didn't have to first extract a section of data, then translate it, in order to get one large integer. In the cases that I would truly find useful, big integers are a part of what would be called structs in the C world, and wouldn't require additional processing over what I'm already doing for other integers and floats. I was looking around, and it turns out that in 2001, Paul Rubin requested that one be able to translate to/from arbitrary bases via the C-level format function. In that discussion, Paul made the case that there should be a method to get arbitrarily long integers to and from strings: "The struct module doesn't give any way of converting arbitrary ints (meaning longs) to binary. Really, it's needed. Some people do it with gmpy, but if Python is going to support longs as a built-in type, one shouldn't have to resort to 3rd-party modules to read and write them in binary." Guido followed up with: "OK, I believe you. Can you submit a patch?"
It seems like this was in reference to being able to use functions in binascii for converting to/from arbitrary packed binary integer types in base 256 (http://sourceforge.net/tracker/?func=detail&atid=105470&aid=465045&group_id=5470 if you are interested). That request seems to have died because Paul dropped the ball. Me, I would prefer struct to binascii, if only because the code for doing this is already waiting to be used in struct, and because you can pull multiple objects from a single packed binary string, rather than one object per call. This would seemingly also satisfy complaints of being able to translate to/from base 256 for arbitrarily large integers. |
|||
| msg54251 | Author: Michael Hudson (mwh) (Python committer) | Date: 2004-09-15 17:02 | |
Josiah, what do you suggest struct.calcsize does with the format code you're proposing? I think this question encapsulates why I find this feature request a bit misdirected. |
|||
| msg54252 | Author: Michael Hudson (mwh) (Python committer) | Date: 2004-09-15 17:03 | |
Oops, I see you actually address that (ish). But I still feel packing what is an essentially variable length type using the struct module is a bit strange. |
|||
| msg54253 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-09-15 18:53 | |
As you state, it already supports packing and unpacking a
variable-lengthed type: strings.
In the use cases I've had and seen for (un)packing strings
with struct, it is the most common to define a static format
code, and use that all the time. That is, you see things
like ">HLLHB25s", that become string constants in a module.
On the _very rare_ occasion where people want more
flexibility in their types, I have seen both the use of
fixed and variable pascal strings...
def packit(arg1, arg2, arg3, strng):
    return struct.pack(">LHH%ip"%len(strng), arg1, arg2,
                       arg3, strng)
I would not expect any pascal-string-like packing of a large
integer, though it is possible. I do expect that most
people have similar use cases as I, and would pre-define
their struct formatting code. In the case of other similar
requests (long to string, string to long via a base256
representation, etc.) for use in cryptography, I expect that
the regularity of structures used in cryptography would
almost certainly result in formatting codes being module
constants.
To sum up, both in the case for the 's' and 'p' format
codes, and the proposed 'g'/'G' formatting codes, the vast
majority of use cases pre-define the length of the string
and large integer on a per-structure basis via "25s", "25p",
or "25g". Rarely are the lengths truly variable in the
case of "%ip"%len(strng).
|
|||
| msg54254 | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2004-09-15 22:08 | |
My vote is for binascii functions to parallel hexlify and unhexlify. Ideally, it would give the same result as long(hexlify(s), 16). |
|||
| msg54255 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-09-16 00:17 | |
Raymond, from your first post on this topic, it seems as though you were previously implementing this functionality in binascii for some particular reason, and it seems as though it is to be included with binascii in the future, regardless of the outcome of this particular feature request. The only reason the binascii solution is better than status quo, is because a user doesn't need to implement arbitrarily large integer packing and unpacking themselves. On the other hand, it still requires the user make manual binascii.str_to_long(str_obj) calls in the case of it being part of a struct, so doesn't gain significantly. Now, one of the reasons why I requested a format code addition was because one can (un)pack multiple data types simultaneously with a single function call via struct. In nearly all of the use cases I have for packing and unpacking large integers, they are a part of other structures. In the cases where I have been packing and unpacking single integers, floats, etc., I still use struct because it has nearly all of the functionality I need (signed, unsigned, big endian, little endian, char, short, long, long long, etc., lacking only arbitrarily large integer packing and unpacking). |
|||
| msg54256 | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2004-09-16 00:26 | |
I agree with Michael and Martin that variable length types do not belong in struct. The module is about working with fixed record layouts. |
|||
| msg54257 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-09-16 02:31 | |
And as I stated, the 's' format character is also a
"variable lengthed type".
It just so happens that in most use cases I've had and
observed for both the 's' format AND proposed 'g' format,
the type size is, in fact, fixed at 'compile' time. It also
happens that for the 'g' format, this fixed size is not
restricted to the set {1,2,4,8}, a restriction the
pre-existing 's' format does not have either.
Please note that the only fundamental difference between the
pre-existing 's' format and the proposed 'g' format, is that
of a quick call to appropriate PyLong_* functions, and a
range check as required by other integer types.
Python is a tool. Struct is a tool. By changing the tool
only slightly, we can add flexibility. The code is already
there, minor glue would make it work, and would make it
convenient for, I believe, more people than binascii.
|
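The relationship between the 's' code and the proposed 'g' code described above can be sketched with today's tools: carry the integer as a fixed-width 's' field and convert it on either side. The record layout and function names below are hypothetical, and the sketch assumes the `int.to_bytes` / `int.from_bytes` methods available in Python 3.2 and later rather than any new struct format character.

```python
import struct

# Hypothetical fixed record: 2-byte tag, 4-byte counter, 20-byte unsigned int.
RECORD = struct.Struct(">HL20s")


def pack_record(tag, counter, big):
    # The 20-byte field travels as an 's' string; the integer is converted
    # to big-endian bytes first (raises OverflowError if it does not fit).
    return RECORD.pack(tag, counter, big.to_bytes(20, "big"))


def unpack_record(data):
    tag, counter, raw = RECORD.unpack(data)
    # Convert the raw 20 bytes back into an integer after unpacking.
    return tag, counter, int.from_bytes(raw, "big")
```

The format string stays a module-level constant, which matches the fixed-layout use case argued for in this message; the extra conversion step is what a dedicated format code would have folded into struct itself.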
|||
| msg54258 | Author: Tim Peters (tim.peters) * (Python committer) | Date: 2004-09-16 02:52 | |
Use cases are important. Oddly(?) enough, I've never had a need for a bigint conversion in a C struct, and have a hard time imagining I will someday. All the cases I've had (and I've had more than a few) were one-shot str->long or long->str conversions. An obvious example in the core is the tedious encode_long() and decode_long() functions in pickle.py. Note that a pickle.encode_long() workalike doesn't know in advance how many bytes it needs, which would make using struct a PITA for that particular use case. If a proposal isn't convenient for taking over existing conversions of this nature, that counts against it. |
|||
| msg54259 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-09-16 18:11 | |
Curse you Tim, for your core Python experience *wink*. Pickle is one example where a pascal-like encoding of longs was an encoding decision made to be flexible and space efficient. Certainly we have disparate use cases. Mine is for fixed struct-like records with multiple types. With pickle, any /thought/ of fixed records is tossed out the window with variable-lengthed types like strings, longs, lists, tuples and dicts, and I believe aren't really comparable. Now, variable-lengthed longs packed in little-endian format already have a mechanism for encoding and decoding via pickle.en/decode_long (though it is wholly undocumented), and seemingly is going to get another in binascii. Fixed-lengthed, optional signed/unsigned, optional little-endian/big-endian longs do not have a mechanism for encoding and decoding, which is what I am asking for. I will point out that 128 bit integers are gaining support on newer 32 and 64 bit processors and C compilers for them (SSE on x86, Itanium, etc.). In the future, a new code for these 128 bit integers may be asked for inclusion. With a variable-width integer type, all future "hey, we now have x-byte types in C, where is struct support in Python?", can be answered with the proposed, "choose your integer size" format code. That is to say, this format code is future proof, unless integer types start wandering from integer byte widths. |
|||
| msg54260 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2004-10-02 22:34 | |
I have just attached a unified diff against structmodule.c
2.62 in CVS.
It implements the semantics I have been describing, compiles
cleanly, and produces proper results.
>>> pickle.encode_long(83726)
'\x0eG\x01'
>>> struct.pack('<3g', 83726)
'\x0eG\x01'
>>> struct.unpack('<3g', struct.pack('<3g', 83726))
(83726L,)
If the functionality is accepted, I will submit diffs for
test_struct.py and libstruct.tex .
|
|||
| msg54261 | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2004-10-02 23:00 | |
If no one other than the OP supports this, I would like to reject putting this in the struct module. Initially, it seemed like a fit because the endian options and whatnot are already in place; however, in one way or another each of the posters except the OP has stated good reasons for it not being in the struct module. Variable length C structure members are not what the module is about. Having to know the length in advance of the call is a killer. The learning curve issues with struct are also a problem. And, the use cases just don't point to struct. Put in a simple function in binascii or let's drop it. |
|||
| msg54262 | Author: Bob Ippolito (bob.ippolito) * (Python committer) | Date: 2004-10-06 01:59 | |
I would definitely have an immediate use for 3 byte integers. The
Mach-O executable format has a couple fields that are 3 byte unsigned
integers (bit flags). py2app's supporting library macholib reads and
writes this format directly. Currently I have several places that look
like this:
class dylib_reference(Structure):
    _fields_ = (
        # XXX - ick, fix
        ('isym_flags', p_ulong),
        #('isym', p_ubyte * 3),
        #('flags', p_ubyte),
    )
|
|||
| msg60065 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2008-01-17 21:40 | |
FWIW, a use case of this I have encountered is to generate a string of random bytes from the long object returned by random.getrandbits(). |
|||
| msg60135 | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2008-01-19 03:54 | |
See also issue #923643. |
|||
| msg67955 | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2008-06-11 12:29 | |
Don't think there is sufficient agreement on this one to move forward. It looks like OP has a completely different conception of the struct module than the other respondents. For the time being, pickle.dumps with protocol 2 can serve as a way to save arrays of long integers in binary. |
|||
| msg67983 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2008-06-11 14:47 | |
This isn't about packing arrays of long integers. I know the discussion is old, and I know the discussion is long, and honestly, I don't really need this particular functionality anymore (in the struct module in particular), but I still believe that being able to pack and unpack arbitrary-length integers is useful. What is interesting is that this functionality was supposed to be in binascii years ago (which I resolved to myself as being sufficient), yet currently is not. |
|||
| msg69285 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2008-07-05 18:34 | |
Pickle is a solution only if you accept the target format to be opaque,
which is not what you are looking for usually.
Once again I just had to write the cumbersome:
junk_len = 1024
junk = (("%%0%dX" % (junk_len * 2)) % random.getrandbits(junk_len *
    8)).decode("hex")
... because there is no obvious way to convert longs to bytes.
|
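For comparison, the same conversion can be sketched in Python 3 using `int.to_bytes`, which is available from Python 3.2 onward. The byte order is an arbitrary choice here, since the bits are random anyway.

```python
import random

junk_len = 1024
# getrandbits returns an int below 2**(junk_len * 8), so it always fits
# in exactly junk_len bytes; to_bytes zero-pads on the left as needed.
junk = random.getrandbits(junk_len * 8).to_bytes(junk_len, "big")
```

The fixed `length` argument removes the need to pad the hex string by hand.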
|||
| msg91487 | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2009-08-11 21:38 | |
I went ahead and coded a new API for converting long integers to byte arrays and vice-versa. My patch adds two new methods to the long type: .as_bytes() and .frombytes(). The patch itself is well-documented; but nevertheless, here are some examples:
>>> (1024).as_bytes()
b'\x04\x00'
>>> (1024).as_bytes(fixed_length=10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00'
>>> (-1024).as_bytes(fixed_length=10)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xfc\x00'
>>> (-1024).as_bytes(little_endian=True)
b'\x00\xfc'
>>> ((2**16)-1).as_bytes(fixed_length=2, signed=False)
b'\xff\xff'
>>> int.frombytes(b'\x00\x10')
16
>>> int.frombytes(b'\x00\x10', little_endian=True)
4096
>>> int.frombytes(b'\xfc\x00')
-1024
>>> int.frombytes(b'\xfc\x00', signed=False)
64512
This patch depends on another patch posted in issue #6687. So, apply the other patch before testing this one. |
|||
| msg91602 | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2009-08-15 11:18 | |
Thanks for this patch, Alexandre! I'm +1 on applying a version of this patch. I'm not convinced that the variable-length part (i.e., fixed_length=None) of int.as_bytes is all that useful; the choices that need to be made about how to represent integers seem too arbitrary to standardize in this function. In effect, the non-fixed-length version provides yet another serialization mechanism for integers, and there's no shortage of existing mechanisms. As I see it, the purpose of the as_bytes and frombytes methods is lower-level: providing a basic operation that will be used by various serialization methods. So I'd suggest making fixed_length a required argument; code requiring non-fixed-length conversions can use int.bit_length to help calculate the length they want. <bikeshedding> I'm also not convinced by the defaults for the other two arguments: personally, I'd expect to need unsigned more often than signed, and little-endian more often than big-endian. Perhaps the byteorder should default to the native byteorder when not explicitly given? That would bring the conversions more closely in line with the struct module. Another possibility: instead of 'little_endian', have a parameter 'byteorder' taking the value 'big' or 'little'; this would enable use of byteorder=sys.byteorder to explicitly specify native byteorder, and avoids bias towards one particular byte order. Can we use 'length' instead of 'fixed_length'? </bikeshedding> There's a typo in the test_long part of the patch: aserrtRaises -> assertRaises; apart from that, all tests pass on OS X 10.5/Intel with this patch applied. I'm in the process of looking at the code more thoroughly. See related Python-ideas thread at: http://mail.python.org/pipermail/python-ideas/2009-August/005489.html |
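The suggestion above, that callers wanting a variable-length encoding compute the length themselves with `int.bit_length`, might look like the following sketch. The helper name is hypothetical, only non-negative integers are handled, and `int.to_bytes` from Python 3.2 and later is assumed.

```python
def int_to_min_bytes(n, byteorder="big"):
    """Encode a non-negative int in the fewest whole bytes (at least one)."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Round the bit length up to whole bytes; zero still needs one byte.
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, byteorder)
```

Signed values would need one more decision (where the sign bit goes), which is the kind of arbitrary choice the message argues should stay out of the core method.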
|||
| msg91603 | Author: Eric Eisner (ede) | Date: 2009-08-15 11:34 | |
Is there some pre-existing naming convention of as_X and fromX? It seems strange that two related functions would have a different use of underscores. |
|||
| msg91604 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-08-15 11:49 | |
I agree with the comments which were made on the following points:
- please use consistent naming (`as_bytes` / `from_bytes`, or `asbytes` / `frombytes`; my preference goes to the former, especially now that we have `bit_length`)
- default byteorder should be native, certainly not big endian which is a small minority amongst today's computers
- you should synchronize with the python-ideas discussion, so that the final API gets validated more publicly; it would not be very pleasant for a patch to be committed if discussion were still in flux, and perhaps with different conclusions as to the API which should be adopted
Besides, your patch has indentation problems (mixed spaces and tabs). |
|||
| msg91616 | Author: Josiah Carlson (josiahcarlson) * (Python triager) | Date: 2009-08-15 17:54 | |
I'm not a big fan of the names, but as long as the functionality exists, people can easily alias them as necessary. I've not actually looked at the patch, but as long as it does what it says it does, it looks good. My only question: does it make sense to backport this to trunk so we get this in 2.7? I personally would like to see it there, and would even be ok with a limitation that it only exists as part of longs. If someone has the time to backport it; cool. If not, I understand, and could live with it only being in 3.x. Thank you for taking the time and making the effort in getting this into a recent Python :) |
|||
| msg91669 | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2009-08-17 20:13 | |
Here's a new patch incorporating the suggestions I received on python-ideas. Notable changes are:
- The name of the methods have been changed to int.tobytes() and int.frombytes().
- The tri-state `little_endian' argument has been removed in favor of the `byteorder' argument which takes either the string 'little' or 'big'.
- The `byteorder' argument has to be specified explicitly.
- The variable-length version of int.tobytes() has been removed.
- The `fixed_length' argument has been renamed to `length'.
- The `signed' argument is now keyword-only and now defaults to False. |
|||
| msg91672 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009-08-17 20:46 | |
Before this gets applied, a (preferably final) decision should be made whether it should be provided for 2.7 as well. Personally, it would be fine with me either way. |
|||
| msg91673 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年08月17日 21:02 | |
Alexandre:
> Notable changes are:
>
> - The names of the methods have been changed to int.tobytes() and
> int.frombytes().
Without wanting to bikeshed, I think these methods should take underscores as other int methods already do. This kind of inconsistency is really annoying (have you ever used PHP? :-)).
Martin:
> Before this gets applied, a (preferably final) decision should be made
> whether it should be provided for 2.7 as well. Personally, it would be
> fine with me either way.
I'm also fine with adding it to 2.7 as well. But someone has to provide a patch (2.7 still has both `int` and `long`, which will make the task a bit more involved than a straight backport). |
|||
| msg92069 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2009年08月29日 20:46 | |
The patch looks great! Some comments:
- I think the type check for length_obj in long_tobytes should be more lenient: I'd suggest a PyIndex_Check and PyNumber_AsSsize_t conversion instead of the PyLong_Check. Or just use 'n' instead of 'O' in the PyArg_Parse* format; this uses PyNumber_Index + PyLong_AsSsize_t, which amounts to the same thing (or at least I *think* it does).
- I like the pickle changes, but I think they should be committed separately. (Unless they're somehow required for the rest of the patch to function correctly?)
- Stylistic nit: There's some inconsistency in the NULL checks in the patch: "if (args != NULL)" versus "if (is_signed_obj)". PEP 7 doesn't say anything about this, but the prevailing style in this file is for an explicit '== NULL' or '!= NULL'.
- I'm getting one failing test:
test.support.TestFailed: Traceback (most recent call last):
  File "Lib/test/test_long.py", line 1285, in test_frombytes
    self.assertRaises(TypeError, int.frombytes, "", 'big')
AssertionError: TypeError not raised by frombytes
This obviously has to do with issue 6687; as mentioned in that issue, I'm not sure that this should be an error. How about just removing this test for now, pending a decision on that issue?
- Nice docs (and docstrings)!
On the subject of backporting to 2.7, I haven't seen any objections, so I'd say we should go for it. (One argument for not backporting new features is to provide incentive for people to upgrade, but I can't realistically see this addition as a significant 'carrot'.) I'm happy to do the backport, and equally happy to leave it to Alexandre if he's interested.
Leaving the bikeshedding until last:
Method names: I'm +0 on adding the extra underscore. Python's already a bit inconsistent (e.g., float.fromhex and float.hex). Also, at the time the float.fromhex and float.hex names were being discussed, 'hex' seemed to be preferred over 'tohex'; I wonder whether we should have int.bytes instead of int.to_bytes? |
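The first review point, accepting any index-like object for the length, can be pictured from the Python side, where `operator.index()` is the Python-level counterpart of `PyNumber_Index`. The sketch below is only an illustration: the class `FourBytes` is hypothetical, and the `to_bytes` call reflects the behaviour of current CPython 3 releases rather than the exact patch under review.

```python
import operator


class FourBytes:
    """Hypothetical index-like object: not an int, but it defines __index__."""

    def __index__(self):
        return 4


length = FourBytes()

# operator.index() accepts anything that implements __index__ ...
print(operator.index(length))  # 4

# ... so a lenient length check lets such objects be used directly.
encoded = (1).to_bytes(length, byteorder="big")
print(encoded)  # b'\x00\x00\x00\x01'

# Floats do not implement __index__, so they are still rejected.
try:
    operator.index(4.0)
    float_rejected = False
except TypeError:
    float_rejected = True
print(float_rejected)  # True
```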
|||
| msg95258 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2009年11月14日 20:38 | |
Here's an updated patch.
- Renamed tobytes() to to_bytes() and frombytes() to from_bytes().
- Moved the changes to pickle to a different patch.
- Made the NULL-checks more consistent with the rest of long's code.
- Fixed the type check of the `length' parameter of to_bytes() to use PyIndex_Check() instead of PyLong_Check(). |
|||
| msg95730 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年11月25日 23:58 | |
The following example is strange:
+ >>> int.from_bytes([255, 0, 0], byteorder='big')
+ -65536
Isn't `signed` supposed to be False by default? The rest looks ok. |
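To make the discrepancy concrete, here is a small check of what the same three bytes decode to with and without `signed=True`. It assumes the behaviour of `int.from_bytes` as shipped in Python 3.2 and later, where `signed` defaults to False.

```python
data = [255, 0, 0]  # the bytes 0xFF 0x00 0x00

# Default (signed=False): the bytes are read as the unsigned value 0xFF0000.
unsigned = int.from_bytes(data, byteorder="big")
print(unsigned)  # 16711680

# signed=True: the same bytes are read as a 24-bit two's complement number,
# i.e. 0xFF0000 - 2**24.
signed = int.from_bytes(data, byteorder="big", signed=True)
print(signed)  # -65536
```

So the value -65536 shown in the quoted docs corresponds to `signed=True`, not to the default.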
|||
| msg95734 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2009年11月26日 09:38 | |
Looks good to me, too. |
|||
| msg95735 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2009年11月26日 09:41 | |
All tests pass on OS X 10.5/Intel, except that I'm still getting the issue 6687 test failure. This needs to be resolved somehow before committing. |
|||
| msg97469 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2010年01月09日 20:36 | |
Committed in r77394. Thank you for the good reviews! |
|||
| msg97588 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2010年01月11日 14:48 | |
I'd still like to see this backported to 2.7; if no-one else is interested in doing the backport, I'll try to find time before the betas. (Alexandre, if you want to do the backport, please do steal the issue from me.) |
|||
| msg99314 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2010年02月13日 12:17 | |
A couple of questions for the backport:
(1) should the 'signed' parameter remain keyword-only in 2.7? I'd say yes, to avoid issues when forward-porting code from 2.7 to 3.2. On the other hand, 2.7 doesn't support keyword-only arguments at the Python level, so a keyword-only argument might be a bit of a surprise to users.
(2) When specifying the byteorder, is there a need to allow u'big' and u'little' as well as 'big' and 'little'? |
|||
| msg99321 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2010年02月13日 17:27 | |
Mark Dickinson added the comment:
> (1) should the 'signed' parameter remain keyword-only in 2.7?
We should keep it as a keyword-only argument. Also, issue #1745 might bring keyword-only arguments to 2.7.
> (2) When specifying the byteorder, is there a need to allow u'big' and
> u'little' as well as 'big' and 'little'?
Allow both. Since 'big' == u'big' is True, it would be weird to treat them differently in this case. Plus, it will make life easier for people who use: from __future__ import unicode_literals in their code. |
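For readers unfamiliar with keyword-only arguments, the following sketch shows what "keyword-only" means for `signed` in practice. It is run against Python 3, where the methods were committed; how a 2.7 backport would behave is exactly what is being discussed here, so nothing below should be read as describing 2.7.

```python
data = b"\xff"

# Passing `signed` by keyword works.
value = int.from_bytes(data, "big", signed=True)
print(value)  # -1

# Passing it positionally is rejected with a TypeError.
try:
    int.from_bytes(data, "big", True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
print(positional_rejected)  # True
```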
|||
| msg99329 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2010年02月13日 18:44 | |
Thanks, Alexandre! Agreed on both points. I don't really want to allow u'big' and u'little', but I think that's just my laziness talking. (Apart from that, I have a working patch.)
There's some precedent for not allowing the unicode versions:
>>> float.__getformat__("double")
'IEEE, little-endian'
>>> float.__getformat__(u"double")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __getformat__() argument must be string, not unicode
But I admit it isn't particularly compelling.
|
|||
| msg102157 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2010年04月02日 11:20 | |
The backport wasn't as straightforward as I'd hoped, and we've pretty much run out of time for 2.7. One issue is that long.from_bytes(b, ...) converts b to bytes type using the equivalent of "bytes(b)". This doesn't work well in 2.7 (consider "bytes([255, 0, 0])" for example). So different code is needed in 2.7 when interpreting an arbitrary Python object as a sequence of bytes. Perhaps the 2.7 version could just iterate over the given object, and raise an exception if any of the iterates are not integers in the range [0, 256). |
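The fallback floated at the end of this message (iterate over the object and reject anything that is not an integer in the range [0, 256)) could look roughly like the pure-Python sketch below. This is not code from any patch on this issue: the function name `int_from_byte_iterable` is hypothetical, only unsigned values are handled, and it is written for Python 3 purely to illustrate the validation idea.

```python
import operator


def int_from_byte_iterable(items, byteorder="big"):
    """Hypothetical helper: build an unsigned int from an iterable of byte values."""
    if byteorder not in ("big", "little"):
        raise ValueError("byteorder must be either 'little' or 'big'")

    values = []
    for item in items:
        # operator.index() raises TypeError for non-integers (str, float, ...).
        byte = operator.index(item)
        if not 0 <= byte < 256:
            raise ValueError("byte values must be in range(0, 256)")
        values.append(byte)

    # Normalize to most-significant-byte-first order before accumulating.
    if byteorder == "little":
        values.reverse()

    result = 0
    for byte in values:
        result = (result << 8) | byte
    return result


print(int_from_byte_iterable([255, 0, 0]))                      # 16711680
print(int_from_byte_iterable([255, 0, 0], byteorder="little"))  # 255
```

Validating item by item avoids relying on `bytes(obj)`, which is the step the message identifies as not working well in 2.7.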
|||
| msg102241 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2010年04月03日 11:44 | |
Closing this; it's too late for Python 2.7. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:06 | admin | set | github: 40877 |
| 2010年04月03日 11:44:17 | mark.dickinson | set | status: open -> closed messages: + msg102241 versions: - Python 2.7 |
| 2010年04月02日 11:20:30 | mark.dickinson | set | messages: + msg102157 |
| 2010年02月13日 18:44:34 | mark.dickinson | set | messages: + msg99329 |
| 2010年02月13日 17:27:29 | alexandre.vassalotti | set | messages: + msg99321 |
| 2010年02月13日 12:17:39 | mark.dickinson | set | priority: low -> high messages: + msg99314 |
| 2010年01月11日 21:17:05 | brian.curtin | set | nosy: + brian.curtin |
| 2010年01月11日 14:48:04 | mark.dickinson | set | status: closed -> open assignee: alexandre.vassalotti -> mark.dickinson messages: + msg97588 versions: + Python 2.7 |
| 2010年01月09日 20:36:36 | alexandre.vassalotti | set | status: open -> closed resolution: accepted messages: + msg97469 stage: patch review -> resolved |
| 2009年11月26日 09:41:13 | mark.dickinson | set | messages: + msg95735 |
| 2009年11月26日 09:38:22 | mark.dickinson | set | messages: + msg95734 |
| 2009年11月25日 23:58:22 | pitrou | set | messages: + msg95730 |
| 2009年11月14日 20:38:10 | alexandre.vassalotti | set | files: + long_and_bytes_conversion-3.diff dependencies: + Move the special-case for integer objects out of PyBytes_FromObject. messages: + msg95258 |
| 2009年08月29日 20:46:42 | mark.dickinson | set | messages: + msg92069 |
| 2009年08月17日 21:02:04 | pitrou | set | messages: + msg91673 |
| 2009年08月17日 20:46:51 | loewis | set | messages: + msg91672 |
| 2009年08月17日 20:13:10 | alexandre.vassalotti | set | files: + long_and_bytes_conversion-2.diff assignee: alexandre.vassalotti components: + Interpreter Core title: proposed struct module format code addition -> Conversion of longs to bytes and vice-versa. nosy: mwh, tim.peters, loewis, rhettinger, josiahcarlson, bob.ippolito, mark.dickinson, pitrou, alexandre.vassalotti, ede versions: + Python 3.2 messages: + msg91669 stage: patch review |
| 2009年08月15日 17:54:58 | josiahcarlson | set | messages: + msg91616 |
| 2009年08月15日 11:49:13 | pitrou | set | messages: + msg91604 |
| 2009年08月15日 11:34:11 | ede | set | nosy: + ede messages: + msg91603 |
| 2009年08月15日 11:18:36 | mark.dickinson | set | messages: + msg91602 |
| 2009年08月11日 21:38:09 | alexandre.vassalotti | set | files: + long_and_bytes_conversion.diff nosy: + alexandre.vassalotti messages: + msg91487 |
| 2008年07月05日 18:34:55 | pitrou | set | messages: + msg69285 |
| 2008年06月11日 14:47:14 | josiahcarlson | set | messages: + msg67983 |
| 2008年06月11日 12:29:34 | rhettinger | set | priority: normal -> low assignee: rhettinger -> (no value) messages: + msg67955 |
| 2008年01月29日 01:19:07 | mark.dickinson | link | issue923643 superseder |
| 2008年01月19日 03:54:56 | mark.dickinson | set | nosy: + mark.dickinson messages: + msg60135 |
| 2008年01月17日 21:40:34 | pitrou | set | nosy: + pitrou messages: + msg60065 |
| 2007年09月20日 05:01:12 | brett.cannon | set | keywords: + patch |
| 2004年09月06日 20:42:10 | josiahcarlson | create | |