John Clements <clements@racket-lang.org>
This is a Racket library to read and write EBML, the "Extensible Binary Meta Language." EBML was designed by the implementors of the Matroska project.
EBML is a packed format for representing structured data.
element = header data
header  = encoded-id encoded-len
data    = element ...
        | non-element
The encoded-id and encoded-len fields use a variable-length encoding. The number of leading zero bits in the first byte indicates how many bytes long the field is: n leading zeros mean that the field occupies n+1 bytes. The leading zeros and the first one bit are then trimmed from those bytes, and the remaining bits are interpreted as a big-endian unsigned number.
So, for instance, the byte string
(bytes #b10001011)
... encodes the length 11, while the byte string
(bytes #b01000001 #b00000010)
... encodes the length 258. Note that there is more than one way to encode a given length; you could also encode the length 11 with the byte string
(bytes #b00100000 #b00000000 #b00001011)
Header IDs are limited to four bytes, and data lengths are limited to eight bytes.
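The encoding rule above can be sketched in a few lines of Racket. This is only an illustration of the scheme as described here, not the library's own code: the names vint-length, decode-vint, minimal-vint-length, and encode-vint are hypothetical helpers invented for this example.

```racket
#lang racket

;; Hypothetical helper: a first byte with n leading zero bits
;; announces a field that is n+1 bytes long.
(define (vint-length first-byte)
  (let loop ([mask #b10000000] [len 1])
    (cond [(> len 8) (error 'vint-length "first byte has no one bit")]
          [(zero? (bitwise-and first-byte mask))
           (loop (arithmetic-shift mask -1) (add1 len))]
          [else len])))

;; Decode a byte string holding exactly one encoded field.
(define (decode-vint bs)
  (define len (vint-length (bytes-ref bs 0)))
  (unless (= len (bytes-length bs))
    (error 'decode-vint "expected ~a bytes, got ~a" len (bytes-length bs)))
  ;; Read every byte as one big-endian number ...
  (define raw
    (for/fold ([n 0]) ([b (in-bytes bs)])
      (+ (* n 256) b)))
  ;; ... then clear the marker bit, which sits 7*len bits from the right.
  (- raw (arithmetic-shift 1 (* 7 len))))

;; Smallest field width (1 to 8 bytes) that can hold n.
(define (minimal-vint-length n)
  (or (for/first ([k (in-range 1 9)]
                  #:when (< n (arithmetic-shift 1 (* 7 k))))
        k)
      (error 'minimal-vint-length "~a needs more than 8 bytes" n)))

;; Encode n in len bytes; len defaults to the shortest width that fits.
(define (encode-vint n [len (minimal-vint-length n)])
  (unless (and (<= 1 len 8) (< -1 n (arithmetic-shift 1 (* 7 len))))
    (error 'encode-vint "~a does not fit in ~a bytes" n len))
  (define raw (+ n (arithmetic-shift 1 (* 7 len))))
  (apply bytes
         (for/list ([i (in-range (sub1 len) -1 -1)])
           (bitwise-and (arithmetic-shift raw (* -8 i)) 255))))

(decode-vint (bytes #b10001011))                     ; 11
(decode-vint (bytes #b01000001 #b00000010))          ; 258
(decode-vint (bytes #b00100000 #b00000000 #b00001011)) ; 11
(encode-vint 11 3) ; the three-byte encoding of 11 shown above
```

Passing an explicit width to encode-vint reproduces the non-minimal encodings. The sketch does not enforce the four-byte limit on header IDs; a caller would need to check that separately.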
EBML has at least one big problem, which is that the packed representation is ambiguous. Specifically, there’s no way to reliably distinguish data that is a sequence of elements from data that is a binary blob.
To choose a concrete example, if you were using EBML to encode s-expressions, you might choose a particular header id (say, 1) to encode a pair, and another one (say, 2) to encode atoms (let’s use 3 to indicate null, just to simplify). The pairs would contain two sub-elements, and the atoms would contain, say, a string or an integer. When decoding an encoded stream, the reader needs to have a priori knowledge that the header id 1 contains sub-elements, and that the header ids 2 and 3 do not.
The reader and writer both use this representation of ebml-elements:
(define ebml-element?
  (flat-murec-contract
   ([ebml-element? (list/c exact-nonnegative-integer?
                           (or/c bytes? (listof ebml-element?)))])
   ebml-element?))
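To make the shape of this representation concrete, here is a small sketch that restates the contract and checks a few values against it. The names element-ok?, leaf, and container are made up for this example, and the element ids are arbitrary.

```racket
#lang racket

;; The contract from the text, restated so this sketch stands alone.
(define ebml-element?
  (flat-murec-contract
   ([ebml-element? (list/c exact-nonnegative-integer?
                           (or/c bytes? (listof ebml-element?)))])
   ebml-element?))

;; Extract an ordinary predicate from the flat contract.
(define element-ok? (flat-contract-predicate ebml-element?))

;; A leaf element: an id paired with a binary blob.
(define leaf (list 2 #"abc"))

;; A container element: an id paired with a list of sub-elements.
(define container (list 1 (list leaf (list 3 #""))))

(element-ok? leaf)
(element-ok? container)
(element-ok? (list 1 "abc"))    ; a string is neither bytes nor a list of elements
(element-ok? (list -1 #"abc"))  ; ids must be nonnegative integers
```

In memory, the two kinds of data are easy to tell apart, since one is a byte string and the other is a list. The ambiguity described above only appears once the element has been packed into bytes.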
; ping-pong required to know when to recur:
(define (read-ebml-sexps bytes)
  (map expand-ebml-sexp (ebml-read bytes)))

; recur on elements that are containers:
(define (expand-ebml-sexp element)
  (match element
    [(list 1 exps-bytes) (apply cons (read-ebml-sexps exps-bytes))]
    [(list 2 atom-bytes) (string->symbol (bytes->string/utf-8 atom-bytes))]
    [(list 3 atom-bytes) empty]))
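For the other direction, an s-expression can be turned into the element representation with a short recursive function. This is a sketch under the same assumptions as the example (id 1 for pairs, 2 for atoms, 3 for null, atoms restricted to symbols); sexp->element is a hypothetical name, not something the library provides.

```racket
#lang racket

;; Hypothetical helper: build the element tree for an s-expression,
;; using the ids from the example (1 = pair, 2 = atom, 3 = null).
(define (sexp->element s)
  (cond [(pair? s)
         ;; A pair is a container holding exactly two sub-elements.
         (list 1 (list (sexp->element (car s))
                       (sexp->element (cdr s))))]
        [(symbol? s)
         ;; An atom is a leaf holding the symbol's name as UTF-8 bytes.
         (list 2 (string->bytes/utf-8 (symbol->string s)))]
        [(null? s)
         ;; Null is a leaf with an empty payload.
         (list 3 #"")]
        [else (error 'sexp->element "unsupported value: ~e" s)]))

(sexp->element '(a b))
```

The result satisfies the ebml-element? contract, so a list of such elements should be suitable input for the writer.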
procedure
(ebml-write elements [port]) → void?
  elements : (listof ebml-element?)
  port : output-port? = (current-output-port)