Shawn Wagner <shawnw.mobile@gmail.com>
This module introduces a new match pattern for matching and destructuring binary data encoded in a bytestring.
The API should be considered very alpha and open to incompatible changes.
Some similar packages include xenomorph and the "#lang" based binfmt.
syntax
(binary byte-pattern ...+ maybe-rest)

  byte-pattern    = (bytes pat length)
                  | (zero-padded pat length)
                  | (until-byte pat byte)
                  | (until-byte* pat byte)
                  | (length-prefixed pat)
                  | (length-prefixed pat prefix-length endianness)
                  | (number-type pat)
                  | (number-type pat endianness)
                  | control-pattern

  maybe-rest      =
                  | (rest* pat)

  control-pattern = (get-offset pat)
                  | (set-offset! offset)

  number-type     = s8 | u8 | s16 | u16 | s32 | u32 | u64 | s64 | f32 | f64
  prefix-length   = u8 | u16 | u32 | u64
  endianness      = big-endian | little-endian | native-endian | host-order | network-order
An example:
(match #"17ε240εbc"
bytes extracts a fixed-width field. zero-padded extracts a fixed-width field and strips trailing 0 bytes. until-byte extracts bytes up to the given delimiter byte. until-byte* does the same, except that failing to find the delimiter is not a match failure. length-prefixed reads a length header and then that many bytes; if the prefix length and endianness are not given, it defaults to the 2-byte little-endian length used by the 9P protocol specification.
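The field patterns above can be approximated in plain Racket to make their behavior concrete. This is only a sketch under stated assumptions, not the module's implementation: the helper names (take-bytes, take-zero-padded, take-until-byte, take-length-prefixed) are hypothetical, and whether the delimiter byte is skipped after an until-byte match is an assumption made here.

```racket
#lang racket/base
;; Hypothetical helpers that mimic the field patterns with racket/base only.
;; Each returns two values: the extracted field and the offset just past it.

;; bytes: a fixed-width field of len bytes starting at pos.
(define (take-bytes bs pos len)
  (values (subbytes bs pos (+ pos len)) (+ pos len)))

;; zero-padded: a fixed-width field with trailing 0 bytes stripped.
(define (take-zero-padded bs pos len)
  (define end (+ pos len))
  (let loop ([i end])
    (if (and (> i pos) (zero? (bytes-ref bs (sub1 i))))
        (loop (sub1 i))
        (values (subbytes bs pos i) end))))

;; until-byte: everything up to the delimiter. The field is #f when the
;; delimiter is missing. ASSUMPTION: the delimiter itself is skipped.
(define (take-until-byte bs pos delim)
  (let loop ([i pos])
    (cond [(= i (bytes-length bs)) (values #f pos)]
          [(= (bytes-ref bs i) delim) (values (subbytes bs pos i) (add1 i))]
          [else (loop (add1 i))])))

;; length-prefixed: an unsigned length header, then that many bytes.
;; Defaults to the 2-byte little-endian header described above.
(define (take-length-prefixed bs pos [prefix-len 2] [big-endian? #f])
  (define start (+ pos prefix-len))
  (define len (integer-bytes->integer bs #f big-endian? pos start))
  (values (subbytes bs start (+ start len)) (+ start len)))
```

For instance, (take-length-prefixed #"\3\0abcXYZ" 0) reads the header 3 and yields the field #"abc" together with the next offset, 5.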
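The number patterns read a signed integer (s), an unsigned integer (u), or a floating-point number (f) of the given width in bits, optionally in the given endianness.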
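As a rough illustration of what the number types denote, the same decoding can be done by hand with the racket/base primitives integer-bytes->integer and floating-point-bytes->real. This sketch does not use the binary pattern itself, and the variable names are made up for the example.

```racket
#lang racket/base
;; Decode the same 4-byte field as different number types.
(define field (bytes #xFF #xFF #xFF #xFE))

;; Arguments are: bytestring, signed?, big-endian?
(define u32-be (integer-bytes->integer field #f #t)) ; like u32, big-endian
(define s32-be (integer-bytes->integer field #t #t)) ; like s32, big-endian
(define u32-le (integer-bytes->integer field #f #f)) ; like u32, little-endian

;; Network byte order is big-endian; native order depends on the host.
(define u32-native (integer-bytes->integer field #f (system-big-endian?)))

;; An IEEE 754 single-precision float, like f32 big-endian.
(define f32-be (floating-point-bytes->real (bytes #x3F #x80 0 0) #t))
```

Here u32-be is 4294967294, s32-be is -2, u32-le is 4278190079, and f32-be is 1.0, which shows how signedness and byte order change the value read from identical bytes.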
rest* takes any bytes remaining at the end of the bytestring after everything else is matched; if there are none, its pattern is matched against an empty bytestring.
Normally, matching starts with the first byte in the bytestring. (set-offset! where) changes the current location (to facilitate matching bytestrings with multiple records), and get-offset saves the current index at that point in the matching.
A more complex example that matches an IPv4 header:
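To show why starting at an arbitrary offset helps with multi-record bytestrings, here is a plain-Racket sketch that decodes a sequence of fixed-size records by stepping an offset. The record layout (a big-endian u16 id followed by a u8 flag) and the function names are hypothetical, and the code does not use set-offset! or get-offset themselves.

```racket
#lang racket/base
;; Hypothetical layout: u16 big-endian id, then a u8 flag (3 bytes per record).
(define record-size 3)

;; Decode one record that starts at the given offset.
(define (read-record bs offset)
  (list (integer-bytes->integer bs #f #t offset (+ offset 2))
        (bytes-ref bs (+ offset 2))))

;; Walk the bytestring one record at a time, as a matcher could by
;; setting its offset to the start of each record before matching.
(define (read-all-records bs)
  (for/list ([offset (in-range 0 (bytes-length bs) record-size)])
    (read-record bs offset)))
```

Calling (read-all-records (bytes 0 1 7 0 2 9)) decodes two records, '((1 7) (2 9)).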
(match header
  (u16 identification) (u16 flags+fragment) (u8 ttl) (u8 protocol) (u16 checksum) (rest* options))
  (ip-address->string source-address) (ip-address->string dest-address) options))))
procedure
b:byte?
parameter
= 'native-endian