I'm writing some serialization code, and I'm wondering how to deal with binary data. As I'm doing it in Python, my goal is to make it very simple, not require a lot of programmer overhead, etc.
Three options I am considering:
1. The fields that will be binary data are represented as hex-encoded strings. Thus you'd have something like:

```python
obj = {
    'foo': 100,
    'bar': [1, 2, 3, 4],
    'baz': "ab0123ffbbaa55",
}
ObjectSpec.loads(ObjectSpec(obj).dumps())
```
`ObjectSpec` is the class which determines how to serialize the object.

- The pros: it's easy to look at, easy to make object literals, easy to print out.
- The cons: you have to remember to hex-encode the fields. If you already have bytes, you have to hex-encode them before serializing, and the serialization code then hex-decodes them again. If you want to store the objects, there's more overhead unless you hex-decode the strings first.
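To make the overhead of option 1 concrete, here is a minimal sketch of the round trip it implies, assuming Python 3.5+ (for `bytes.hex()` and `bytes.fromhex()`). `raw`, `hex_field` and `decoded` are illustrative names only; the real work would happen inside the hypothetical `ObjectSpec`.

```python
# Option 1 sketch: binary fields travel as hex strings, so the caller
# hex-encodes bytes and the serializer hex-decodes them again.
raw = b'\xab\x01#\xff\xbb\xaaU'

hex_field = raw.hex()               # what the caller must put in obj['baz']
decoded = bytes.fromhex(hex_field)  # what the serializer would do internally

print(hex_field)                 # ab0123ffbbaa55
print(decoded == raw)            # True
print(len(hex_field), len(raw))  # 14 7
```

The last line shows the space cost: the hex form uses two characters per byte.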
2. The fields are byte strings instead, e.g.:

```python
obj = {
    'foo': 100,
    'bar': [1, 2, 3, 4],
    'baz': b'\xab\x01#\xff\xbb\xaaU',
}
```
- The pros: less overhead, both in space, and in not having to hex-encode if you already have bytes.
- The cons: harder to make literals, harder to print out. If you accidentally leave in a hex-encoded string then it will serialize the wrong thing (the hex representation instead of the thing itself).
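One mitigation for the "hex string left in by mistake" problem, assuming Python 3 where `str` and `bytes` are distinct types: the serializer can reject text in binary fields instead of silently encoding it. `require_bytes` below is a hypothetical helper, not part of any library.

```python
# Hypothetical validation helper for option 2 (not from any library).
# In Python 3 a hex *string* is a str, not bytes, so it can be caught.
def require_bytes(name, value):
    """Return value as bytes if it is bytes-like, else raise TypeError."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value)
    raise TypeError(
        f"field {name!r} must be bytes, got {type(value).__name__}"
    )

ok = require_bytes('baz', b'\xab\x01#\xff\xbb\xaaU')

try:
    require_bytes('baz', "ab0123ffbbaa55")  # hex string left in by mistake
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # field 'baz' must be bytes, got str
```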
3. The binary data fields use some custom type, e.g. `bson.Binary`:

```python
from bson import Binary

obj = {
    'foo': 100,
    'bar': [1, 2, 3, 4],
    'baz': Binary(b'\xab\x01#\xff\xbb\xaaU'),
}
```
- The pros: Same as #2, but also clearly delineates binary types.
- The cons: Same as #2, except that it is harder to accidentally encode the wrong thing. It also requires wrapping the data in a type just to get the serialization code to accept it, instead of leaving the bytes in as-is.
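A dependency-free variant of option 3 is sketched below. It is an assumption on my part, not `bson.Binary`: `HexBytes` is a hypothetical subclass of `bytes`, so code that accepts bytes still works, while `isinstance(value, HexBytes)` marks the field as binary and the `repr` shows hex, which also helps with writing literals and printing.

```python
class HexBytes(bytes):
    """Hypothetical marker type for binary fields (not from any library).

    Still a bytes object, so existing bytes-handling code keeps working,
    but its repr is written in the readable hex form.
    """

    @classmethod
    def from_hex(cls, text):
        # Lets object literals be written as hex while storing raw bytes.
        return cls(bytes.fromhex(text))

    def __repr__(self):
        return f"HexBytes.from_hex({self.hex()!r})"


obj = {
    'foo': 100,
    'bar': [1, 2, 3, 4],
    'baz': HexBytes.from_hex("ab0123ffbbaa55"),
}

# dict repr uses each value's __repr__, so the binary field prints as hex.
print(obj)  # {'foo': 100, 'bar': [1, 2, 3, 4], 'baz': HexBytes.from_hex('ab0123ffbbaa55')}
print(obj['baz'] == b'\xab\x01#\xff\xbb\xaaU')  # True
```

The design trade-off is the same as with `bson.Binary` (you still wrap the data), but without adding a dependency just to mark a field as binary.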
What would the most sensible approach be? Is there another variant that is better?
Answer:
You can write binary and hexadecimal values directly in Python: just use `0xff` or `0b1001010101001`. These literals evaluate to ordinary `int` objects, and the `chr` and `ord` functions convert between such integers and characters, which keeps the code very readable.
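A quick check of what these literals actually are in Python 3: alternative spellings of plain `int` values, with `chr`/`ord` mapping between integers and one-character strings.

```python
# Hex and binary literals are just other ways of writing int values.
values = {
    'hex': 0xff,
    'binary': 0b1001010101001,
}
print(values)         # {'hex': 255, 'binary': 4777}
print(type(0xff))     # <class 'int'>

# chr: int -> one-character str; ord: the reverse.
print(chr(0x73))      # s
print(hex(ord('s')))  # 0x73
```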
```python
obj = {
    "foo": 100,
    "bar": [1, 2, 3, 4],
    # I don't really get what foo, bar and baz are, so I don't really
    # know what your string should represent.
    "baz": 0xffaff34441faabc,
}
```
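One caveat when storing binary data as an int literal like `0xffaff34441faabc`: the int carries no byte length, so leading zero bytes are lost and a width has to be chosen when converting back to bytes. A minimal sketch using Python 3's `int.to_bytes`/`int.from_bytes`; the 8-byte width is an assumption for illustration.

```python
baz = 0xffaff34441faabc  # 15 hex digits, so it needs 8 bytes

# The width (8) and byte order ('big') must be supplied by the caller.
as_bytes = baz.to_bytes(8, 'big')
print(as_bytes.hex())                          # 0ffaff34441faabc
print(int.from_bytes(as_bytes, 'big') == baz)  # True

# Leading zeros are not part of the value itself:
print(0x00ff == 0xff)  # True
```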
You can then use bitwise operators (AND, OR, shift left, shift right) to get the data into the right order. For example, to send 13 bytes of data to some port using an imaginary `PORT:STOP:MESSAGE` syntax, the value `0xFF00737461636B65786368616E6765` sends "stackexchange" to port 255: `0xff` is the port, `0x00` is the STOP byte, and the rest is the message. So your function would go something like this: the first byte (two hex digits) is the port; if the next byte is `0x00`, split there and translate the rest into a string.
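The parsing steps described above can be sketched with shifts and masks as follows. This assumes the imaginary framing exactly as described (1-byte port, 1-byte `0x00` stop, then the message) and that the total frame length is known; `parse_frame` is a hypothetical name.

```python
# Sketch of the imaginary PORT:STOP:MESSAGE framing:
# 1 byte port, 1 byte 0x00 stop marker, then the message bytes.
def parse_frame(frame, length):
    """Split an int-encoded frame of `length` bytes into (port, message)."""
    port = (frame >> (8 * (length - 1))) & 0xFF  # top byte
    stop = (frame >> (8 * (length - 2))) & 0xFF  # second byte
    if stop != 0x00:
        raise ValueError("missing 0x00 stop byte after port")
    msg_len = length - 2
    msg_int = frame & ((1 << (8 * msg_len)) - 1)  # mask off port and stop
    message = msg_int.to_bytes(msg_len, 'big').decode('ascii')
    return port, message


frame = 0xFF00737461636B65786368616E6765  # 15 bytes: FF, 00, 13-byte message
port, message = parse_frame(frame, 15)
print(port, message)  # 255 stackexchange
```

Passing the length explicitly matters because, as an int, the frame cannot tell you how many leading zero bytes it was meant to have.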
If you want to test this in a safe environment, you can use `''.join(map(chr, [0x73, 0x74, 0x61, 0x63, 0x6B, 0x65, 0x78, 0x63, 0x68, 0x61, 0x6E, 0x67, 0x65]))`, which gives `'stackexchange'`.
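For comparison, a small sketch of the same check in Python 3, where `map` returns a lazy iterator (hence the `join`), alongside building a `bytes` object from the same codes and decoding it as ASCII.

```python
codes = [0x73, 0x74, 0x61, 0x63, 0x6B, 0x65, 0x78,
         0x63, 0x68, 0x61, 0x6E, 0x67, 0x65]

via_chr = ''.join(map(chr, codes))        # one character per code point
via_bytes = bytes(codes).decode('ascii')  # build bytes, then decode as ASCII

print(via_chr)                # stackexchange
print(via_chr == via_bytes)   # True
print(bytes.fromhex('737461636B65786368616E6765'))  # b'stackexchange'
```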
You could also use `bytes`. You can create a byte array and wrap it up inside a protobuf message if you need to preserve the existing format. However, if you can modify the C++ code, you could create new messages using the bindings (the bindings are auto-generated classes with getters/setters for int/bool/std::string/etc.).