Re: string.pack with bit resolution


It was thus said that the Great bil til once stated:
> One further offer to make:
 That you will make the necessary changes to string.pack()/string.unpack()
(or an entirely new module) for people to use and comment on? If the demand
is that high and you know what is wanted, I would think you would be the
best person to implement this.
 Or is that just wishful thinking on my part?
> I would skip the signed bit numbers. Signed integers always have the
> slightly awkward property, that there is one negative number more than the
> positives,
 Lua is based upon C. And C allows the representation of negative integers
to be of sign magnitude, 1s complement, or 2s complement. The range of an
8-bit value for each of these are:
	sign magnitude:	-127 .. 127
	1s complement: -127 .. 127
	2s complement: -128 .. 127
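
  The same ranges can be worked out for any width. A minimal sketch in C,
assuming widths of 2 to 31 bits so everything fits in a long (signed_range
and the enum names are made up for this illustration):

```c
#include <assert.h>

/* The three representations of negative integers that C89 permits. */
enum repr { SIGN_MAGNITUDE, ONES_COMPLEMENT, TWOS_COMPLEMENT };

struct range { long min, max; };

/* Smallest and largest value of a signed integer that is `bits` wide. */
struct range signed_range(enum repr r, int bits)
{
  struct range out;
  long max;

  assert(bits >= 2 && bits <= 31);
  max = (1L << (bits - 1)) - 1;
  out.max = max;
  /* only 2s complement has the one extra negative value */
  out.min = (r == TWOS_COMPLEMENT) ? -max - 1 : -max;
  return out;
}
```

  For 8 bits this gives the table above; for a 2-bit 2s complement field it
gives -2 .. 1.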
 The C standard (and I'm using the C89 standard here, which Lua adheres to
most) only gives symmetrical ranges for each integer. Section 5.2.4.2.1
states:
	Their implementation-defined values shall be equal or greater in
	magnitude (absolute value) to those shown, with the same sign.
	... 
	-- minimum value for an object of type int
	 INT_MIN	-32767
	-- maximum value for an object of type int
	 INT_MAX	+32767
	...
 They can, of course, be bigger.
 Granted, most systems today are 2s complement, but I have recently come
across a C compiler, *still commercially available*, for a sign magnitude
system. 
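
  A program can find out which representation it was compiled for. A small
sketch, using the well-known trick that the low two bits of -1 differ
between the three representations (the function and enum names are
hypothetical):

```c
#include <assert.h>
#include <limits.h>

enum repr { SIGN_MAGNITUDE = 1, ONES_COMPLEMENT = 2, TWOS_COMPLEMENT = 3 };

/* Low two bits of -1: sign magnitude ...01, 1s complement ...10,
   2s complement ...11. */
enum repr int_representation(void)
{
  int low = -1 & 3;

  assert(low >= 1 && low <= 3);
  return (enum repr)low;
}

/* True only when int has one more negative value than positive ones. */
int has_extra_negative(void)
{
  return INT_MIN < -INT_MAX;
}
```

  On the usual 2s complement machines this reports TWOS_COMPLEMENT and an
extra negative value; I have not run it on anything else.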
> and this "negative surplus" is this "wicked 0x80", which can even
> lead to crazy nightmares for experienced programmers. 
 I'm not aware of any crazy nightmares for experienced programmers. Novice
ones, yes. But perhaps I was fortunate enough to learn assembly first, and the
documentation for every 2s complement system I've learned assembly for (and
that's pretty much all the systems I've ever come across) states:
	NEG	set overflow flag if input is $80 (or $8000 or $80000000
		depending upon size of operand).
 But in practice I've never had a real issue with this.
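
  In C the same case can be caught before negating, since negating the
most negative value is the only way negation can overflow. A minimal
sketch (checked_neg is a made-up name, not a standard function):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Store -x in *out and return 1, or return 0 if -x is not representable.
   That can only happen on 2s complement, when x == INT_MIN. */
int checked_neg(int x, int *out)
{
  assert(out != NULL);
  if (x < -INT_MAX)
    return 0;
  *out = -x;
  return 1;
}
```

  On a sign magnitude or 1s complement system the test is never true, so
the function always succeeds there.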
> In case of chars, this
> is only a 1% defect, but in case of a2, this is a 25% defect, which is
> really hard to explain to any user.
 And a 50% defect in the case of "a1". But even there, on page 150 of my
copy of K&R C (the C Bible, and remember, Lua is based upon C):
	struct {
		unsigned int is_keyword : 1;
		unsigned int is_extern : 1;
		unsigned int is_static : 1;
	} flags;
	This defines a variable called flags that contains three 1-bit
	fields. The number following the colon represents the field width
	in bits. The fields are declared unsigned int to ensure that they
	are unsigned quantities.
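
  A short sketch of using such a struct. How the compiler lays the fields
out in memory is implementation-defined, so the bit order in the two
helper functions is simply one I picked for the example (both function
names are hypothetical):

```c
#include <assert.h>

struct flags {
  unsigned int is_keyword : 1;
  unsigned int is_extern  : 1;
  unsigned int is_static  : 1;
};

/* Pack the three flags into the low three bits of an unsigned value. */
unsigned flags_to_bits(struct flags f)
{
  return f.is_keyword | (f.is_extern << 1) | (f.is_static << 2);
}

/* The reverse: split the low three bits back into named 1-bit fields. */
struct flags bits_to_flags(unsigned bits)
{
  struct flags f;

  assert(bits < 8);
  f.is_keyword = bits & 1;
  f.is_extern  = (bits >> 1) & 1;
  f.is_static  = (bits >> 2) & 1;
  return f;
}
```

  Each field only ever holds 0 or 1, so there is no "defect" to explain.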
> So let's concentrate on the positive world and on the unsigned bit numbers
> A, A2 and A4.
> 
> Labeling bits with a capital letter is of course half a nightmare again. So if
> you are flexible enough for this, I would use the sign "." to mark a bit.
> And if I have convinced you already enough concerning the importance of bits
> in such packings, you could please also allow the short cuts : and | for
> 2-bit and 4-bit unsigned.
> 
> So then 3 more lines in your format list:
> . a bit (value 0/nil/false, 1/true)
> .[n] an (unsigned) bit number with n bits (value 0/nil/false, 1/true ...
> 2^n-1)
> : an unsigned number with 2 bits (value 0/nil/false, 1/true, 2, 3) 
> | an unsigned number with 4 bits (value 0/nil/false, 1/true, 2 ... 15)
> 
> If possible also the 2 following additional float types:
> r short float
> D long double
> 
> (remark: long double is an ansi C standard type - this in ANY case needs to
> be somehow in the list ... the 64bit fans otherwise will kill you...)
> 
> As separators you have already allowed spaces, some people for sure want
> commas, I would propose also single quotes / apostrophes '. So then the last
> line of format list should read:
> " ", ",", "'": These 3 characters (space, comma, single quote) are ignored
> 
> Then you could write nice formats to pack/unpack a string into its bits like
> this (e.g. 1 byte, 1 short, 1 int):
> "....'...." or "||"
> "....'....'....'...." or "||'||"
> "||||'||||"
> (of course you should also allow "8." or "16." or "32." ... but the above
> notations really look nice, even designers would like this, I hope)
 At this point, the Erlang bit syntax is looking better and better. An
example:
	<<IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16,
	 ID:16, Flgs:3, FragOff:13,
	 TTL:8, Proto:8, HdrChksum:16,
	 SrcIP:32,
	 DestIP:32>>
 It will even allow you to specify things like signedness and endianness:
	X:6/little-signed-integer
(more about this here: http://erlang.org/doc/programming_examples/bit_syntax.html)
 Put that into a string, and your packing/unpacking module could even
return a table with the fields prenamed and everything.
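
  To show what such a module would be doing underneath, here is a rough
sketch in C that walks the same field list over a 20-byte IPv4 header,
most significant bit first. It assumes 8-bit bytes; read_bits, parse_ipv4
and the struct are names invented for this example, not part of any
existing library:

```c
#include <assert.h>
#include <stddef.h>

/* Read `width` bits (1..32), most significant bit first, starting at bit
   offset *pos in buf, and advance *pos past them. */
unsigned long read_bits(const unsigned char *buf, size_t *pos, int width)
{
  unsigned long v = 0;

  assert(width >= 1 && width <= 32);
  while (width-- > 0) {
    size_t byte  = *pos / 8;
    int    shift = 7 - (int)(*pos % 8);
    v = (v << 1) | ((buf[byte] >> shift) & 1u);
    (*pos)++;
  }
  return v;
}

struct ipv4_header {
  unsigned long version, hlen, srvc_type, tot_len;
  unsigned long id, flags, frag_off;
  unsigned long ttl, proto, hdr_chksum;
  unsigned long src_ip, dest_ip;
};

/* buf must hold at least 20 bytes. */
struct ipv4_header parse_ipv4(const unsigned char *buf)
{
  struct ipv4_header h;
  size_t pos = 0;

  h.version    = read_bits(buf, &pos, 4);
  h.hlen       = read_bits(buf, &pos, 4);
  h.srvc_type  = read_bits(buf, &pos, 8);
  h.tot_len    = read_bits(buf, &pos, 16);
  h.id         = read_bits(buf, &pos, 16);
  h.flags      = read_bits(buf, &pos, 3);
  h.frag_off   = read_bits(buf, &pos, 13);
  h.ttl        = read_bits(buf, &pos, 8);
  h.proto      = read_bits(buf, &pos, 8);
  h.hdr_chksum = read_bits(buf, &pos, 16);
  h.src_ip     = read_bits(buf, &pos, 32);
  h.dest_ip    = read_bits(buf, &pos, 32);
  return h;
}
```

  Every field is unsigned and the widths are spelled out next to the names,
which is most of what the Erlang syntax buys you.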
> A further VERY nice application would appear, if you would allow to specify
> the n in the format list. You use the parameter n already for lua_Number
> (SIDE REMARK1: which makes sense - just please specify how you do this - I
 ... [ snip ] ...
 Or one could use a pre-existing module to serialize data. I wrote a very
extensive CBOR (Concise Binary Object Representation, RFC-7049)
implementation for Lua:
	https://github.com/spc476/CBOR
and I hear CBOR is very popular among the IoT crowd for its compactness of
representation. And it's not like it's hard to use.
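
  As an illustration of that compactness, this is roughly how RFC 7049
encodes an unsigned integer (major type 0): values below 24 take a single
byte, larger ones get a one-byte prefix followed by 1, 2, 4 or 8 bytes in
network byte order. The sketch below is my own C rendering of that rule
(it needs C99 for unsigned long long) and is not the API of the Lua module
above; cbor_encode_uint is a made-up name:

```c
#include <assert.h>
#include <stddef.h>

/* Encode n as a CBOR unsigned integer into out, which must have room for
   9 bytes. Returns the number of bytes written. */
size_t cbor_encode_uint(unsigned long long n, unsigned char *out)
{
  int extra, i;

  assert(out != NULL);
  if (n < 24) {                      /* value fits in the initial byte */
    out[0] = (unsigned char)n;
    return 1;
  }
  if (n <= 0xFFull)              { out[0] = 0x18; extra = 1; }
  else if (n <= 0xFFFFull)       { out[0] = 0x19; extra = 2; }
  else if (n <= 0xFFFFFFFFull)   { out[0] = 0x1A; extra = 4; }
  else                           { out[0] = 0x1B; extra = 8; }

  /* the value follows in big-endian order */
  for (i = 0; i < extra; i++)
    out[1 + i] = (unsigned char)((n >> (8 * (extra - 1 - i))) & 0xFF);
  return (size_t)extra + 1;
}
```

  So 10 takes one byte, 1000 takes three and 1000000 takes five, with no
format string needed on either end.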
> assume you need _tt and then the native byte number (so in LUA_32BITS this
> would be 4 byte for int or 4 byte for float) 
 Assuming the platform in question uses 4-byte ints and IEEE-754 floating
point. Again, you are making assumptions that Lua does not.
> - you have to specify how long
> the _tt is - I assume 1 byte is fine for this, and maybe zero for float and
> 1 for integer and 2 for boolean or s - this of course really MUST be
> specified exactly in the description of pack / unpack. You could e.g. also
> use _tt marking 0 for boolean with 1 byte, 4 for int32, 5 for float32, 8 for
> int64, 9 for double64, maybe also FE for pointer32 and FF for pointer64").
> ... so n is given away already... then maybe use #).
> (SIDE REMARK2: The specifier j and J is stupid - this makes no sense ... if
> somebody wants an integer in this list, please i should be used)
 j and J give you at least a 64-bit int. i and I only give you the native
integer size, which can be 16-bit or larger.
> (SIDE REMARK3: In the format list for h and l you write "native size" - this
> is stupid in my eyes, please change to "2 bytes" for h, and "4 bytes" for l,
> or do you know some other native size for short and long??? - only
> lua_Integer and lua_Number has native size, as I see it)
 I have actually used the "native size" for a project recently, to read
native binary integers written by a C program. You might not find a use for
the "native sizes" but that doesn't mean others won't.
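
  For what it's worth, this is the kind of thing the C side of such a
project tends to look like. It is a sketch, not the actual program: the
two helper names are made up. The point is that the number of bytes is
whatever sizeof(long) happens to be on that platform (8 on the common
64-bit Unix systems, not 4), and the byte order is the platform's own:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a long into buf exactly as it sits in memory; returns the number
   of bytes written, which is sizeof(long) on this platform. */
size_t write_native_long(long v, unsigned char *buf)
{
  assert(buf != NULL);
  memcpy(buf, &v, sizeof v);
  return sizeof v;
}

/* Read back a long written by write_native_long() on the same platform. */
long read_native_long(const unsigned char *buf)
{
  long v;

  assert(buf != NULL);
  memcpy(&v, buf, sizeof v);
  return v;
}
```

  A reader that hard-codes "4 bytes" for a long will misread such data on
any platform where the writer's long is a different size.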
 
> So for the extension I want to describe in the following, please two further
> line in your format list:
 Have you thought of maybe implementing this yourself?
 ... [ snip ] ...
> (but please do not come with the argument, that these things bloat up the c
> code for string.pack / string.unpack very much - this I do not believe you
> ... these are just some minor additions in c code, which make these functions
> MUCH more flexible in use)
 Which means this should be easy to implement, right?
 -spc (Right?)
