
Extracting binary data from bytestrings using match

This module introduces a new match pattern for matching and destructuring binary data encoded in a bytestring.

The API should be considered very alpha and open to incompatible changes.

Similar packages include xenomorph and the #lang-based binfmt.

1 The binary match pattern

syntax

(binary byte-pattern ...+ maybe-rest)

  byte-pattern    = (bytes pat length)
                  | (zero-padded pat length)
                  | (until-byte pat byte)
                  | (until-byte* pat byte)
                  | (length-prefixed pat)
                  | (length-prefixed pat prefix-length endianness)
                  | (number-type pat)
                  | (number-type pat endianness)
                  | control-pattern

  maybe-rest      =
                  | (rest* pat)

  control-pattern = (get-offset pat)
                  | (set-offset! offset)

  number-type     = s8
                  | u8
                  | s16
                  | u16
                  | s32
                  | u32
                  | u64
                  | s64
                  | f32
                  | f64

  prefix-length   = u8
                  | u16
                  | u32
                  | u64

  endianness      = big-endian
                  | little-endian
                  | native-endian
                  | host-order
                  | network-order

  byte : byte?
  length : (and/c fixnum? positive?)
  offset : (and/c fixnum? (>=/c 0))
A match extender that, when matched against a bytestring, tries to destructure it according to the given spec and match extracted values against given match patterns.

An example:

(match #"\17\240bc"
  ((binary (s16 num big-endian) (bytes rest 2))
   (list num rest))) ; (4000 #"bc")
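To make the arithmetic behind that result visible, here is a rough hand-written equivalent that uses only racket/base. The helper name parse-s16+2 is hypothetical and not part of this module, and the sketch assumes a failed match is represented by #f.

```racket
#lang racket/base

;; Hand-written sketch of what
;;   (binary (s16 num big-endian) (bytes rest 2))
;; extracts. parse-s16+2 is a hypothetical name, not part of the library.
(define (parse-s16+2 bs)
  (and (>= (bytes-length bs) 4)
       (list (integer-bytes->integer bs #t #t 0 2) ; signed, big-endian, bytes 0-1
             (subbytes bs 2 4))))                  ; the next 2 bytes, unchanged

(parse-s16+2 #"\17\240bc") ; '(4000 #"bc")
```

The octal escapes \17 and \240 are the bytes 15 and 160, and 15 * 256 + 160 = 4000.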

bytes extracts a fixed-width field. zero-padded extracts a fixed-width field and strips trailing 0 bytes. until-byte extracts bytes until the given delimiter byte is encountered. until-byte* is the same, except that failing to find the delimiter is not a match failure. length-prefixed reads a length header and then that many bytes. If no prefix length and endianness are given, it defaults to the 9P protocol convention of a 2-byte little-endian length.
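The field kinds described above can be sketched by hand in racket/base. The three helper names below are hypothetical, not part of this module. Each returns the extracted field or #f on failure, and the sketch assumes the delimiter byte is not part of the field returned by until-byte.

```racket
#lang racket/base

;; Hypothetical helpers that mimic the described behaviour by hand.

;; zero-padded: a fixed-width field with trailing 0 bytes stripped
(define (read-zero-padded bs start len)
  (define end (+ start len))
  (and (<= end (bytes-length bs))
       (let loop ([stop end])
         (if (and (> stop start) (zero? (bytes-ref bs (sub1 stop))))
             (loop (sub1 stop))
             (subbytes bs start stop)))))

;; until-byte: everything up to (assumed: not including) the delimiter;
;; #f when the delimiter is never found
(define (read-until-byte bs start delim)
  (for/first ([i (in-range start (bytes-length bs))]
              #:when (= (bytes-ref bs i) delim))
    (subbytes bs start i)))

;; length-prefixed with the default header: a 2-byte unsigned
;; little-endian length, followed by that many bytes
(define (read-length-prefixed bs start)
  (and (<= (+ start 2) (bytes-length bs))
       (let* ([len (integer-bytes->integer bs #f #f start (+ start 2))]
              [body-start (+ start 2)]
              [body-end (+ body-start len)])
         (and (<= body-end (bytes-length bs))
              (subbytes bs body-start body-end)))))

(read-zero-padded #"ab\0\0cd" 0 4)                     ; #"ab"
(read-until-byte #"key=value" 0 (char->integer #\=))   ; #"key"
(read-length-prefixed #"\3\0abcxyz" 0)                 ; #"abc"
```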

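The number patterns read a signed (s) or unsigned (u) integer of the given width in bits, or a 32- or 64-bit floating-point number (f32, f64), and match it against the given pattern.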

rest* takes any remaining bytes at the end of the bytestring after everything else is matched; if there are no extra bytes, its pattern is matched against an empty bytestring.

Normally, matching starts with the first byte in the bytestring. (set-offset! offset) changes the current position, which facilitates matching bytestrings that contain multiple records, and get-offset matches its pattern against the current index at that point in the matching.

A more complex example that matches an IPv4 header:

(match header
  ((binary
    (u8 (app byte->nybbles version header-length)) (u8 service-type) (u16 total-length)
    (u16 identification) (u16 flags+fragment)
    (u8 ttl) (u8 protocol) (u16 checksum)
    (bytes (app make-ip-address source-address) 4)
    (u32 (app (lambda (n) (make-ip-address n 4)) dest-address))
    (rest* options))
   (list version header-length service-type total-length ttl protocol
         (ip-address->string source-address) (ip-address->string dest-address)
         options)))

2 Additional functions

procedure

(byte->nybbles b)
b : byte?
Splits a single byte into two 4-bit nybbles. The upper 4 bits are the first value and the lower 4 bits are the second.
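The split described here can be written with a shift and a mask. This is a minimal sketch, not the module's own definition, and split-nybbles is a hypothetical name chosen to avoid clashing with the real function.

```racket
#lang racket/base

;; Sketch of splitting a byte into two 4-bit values.
;; split-nybbles is a hypothetical name, not the library's function.
(define (split-nybbles b)
  (values (arithmetic-shift b -4)   ; upper 4 bits
          (bitwise-and b #x0F)))    ; lower 4 bits

;; #x45 is a common first byte of an IPv4 header:
;; version 4, header length 5 (in 32-bit words).
(call-with-values (lambda () (split-nybbles #x45)) list) ; '(4 5)
```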

(or/c 'big-endian 'little-endian 'native-endian)
endianness : (or/c 'big-endian 'little-endian 'native-endian 'network-order 'host-order)
= 'native-endian
A parameter that controls the endianness used by numeric patterns when one isn’t explicitly given.
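To see why the default matters, the sketch below decodes the same two bytes under each byte order using only racket/base. It does not use this module's parameter. It assumes that network order means big-endian, as is conventional, and that host order means the native order of the running machine.

```racket
#lang racket/base

(define bs (bytes 1 2))

;; big-endian (conventionally also network order):
;; the first byte is the most significant
(define as-big (integer-bytes->integer bs #f #t))      ; 258

;; little-endian: the first byte is the least significant
(define as-little (integer-bytes->integer bs #f #f))   ; 513

;; native-endian (assumed: same as host order): depends on the machine
(define as-native
  (integer-bytes->integer bs #f (system-big-endian?)))
```

Because the native result differs between machines, wire formats such as network headers are usually decoded with an explicit endianness rather than the default.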
