
binfmt: binary format parser generator

Bogdan Popa <bogdan@defn.io>

This package provides a #lang for building binary format parsers with support for limited context-sensitivity.

1 Example

Here is a parser definition for the ID3v1 format:

id3 = magic title artist album year comment genre;
magic = 'T' 'A' 'G';
title = u8{30};
artist = u8{30};
album = u8{30};
year = u8{4};
comment = u8{30};
genre = u8;

Assuming this is saved in a file called "id3v1.b", you can import it from Racket and apply any of the definitions to an input port in order to parse its contents:

> (require "id3v1.b")

You can parse the magic header by itself:

> (magic (open-input-bytes #"TAG"))

'((char_1 . #\T) (char_2 . #\A) (char_3 . #\G))

Or a full tag:

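> (define data
    (bytes-append
     #"TAGCreative Commons Song\0\0\0\0\0\0\0\0\0Improbulus\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0N"
     #"/A\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0002005Take on O Mio Babbino Caro!\0\0\0g"))
> (define tree
    (id3 (open-input-bytes data)))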
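ID3v1 fields are fixed-width, so shorter values have to be padded out to the widths given in the grammar. The following is a minimal sketch of building such a tag programmatically. The helpers pad-field and make-id3v1 are hypothetical names, not part of binfmt, and NUL padding is assumed as the fill convention:

```racket
#lang racket/base

;; Hypothetical helper (not part of binfmt): right-pad a field with
;; NUL bytes to a fixed width. NUL padding is an assumption here.
(define (pad-field bs width)
  (define missing (- width (bytes-length bs)))
  (when (negative? missing)
    (error 'pad-field "field ~s is longer than ~a bytes" bs width))
  (bytes-append bs (make-bytes missing 0)))

;; Hypothetical constructor that mirrors the id3 definition field by field:
;; magic, title{30}, artist{30}, album{30}, year{4}, comment{30}, genre.
(define (make-id3v1 title artist album year comment genre)
  (bytes-append #"TAG"
                (pad-field title 30)
                (pad-field artist 30)
                (pad-field album 30)
                (pad-field year 4)
                (pad-field comment 30)
                (bytes genre)))

(define data
  (make-id3v1 #"Creative Commons Song" #"Improbulus" #"N/A"
              #"2005" #"Take on O Mio Babbino Caro!" 103))

(bytes-length data) ; 3 + 30 + 30 + 30 + 4 + 30 + 1 = 128
```

The resulting byte string can be wrapped with open-input-bytes and handed to id3 like any hand-written literal.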

And inspect the resulting parse tree:

> (map car tree)

'(magic_1 title_1 artist_1 album_1 year_1 comment_1 genre_1)

> (define ref (compose1 cdr assq))
> (take (ref 'title_1 tree) 8)

'(67 114 101 97 116 105 118 101)

> (apply bytes (ref 'title_1 tree))

#"Creative Commons Song\0\0\0\0\0\0\0\0\0"
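Since a u8{30} field comes back as a list of 30 byte values, turning it into a display string usually means cutting it at the first padding byte. The sketch below shows one way to do that; field->string is a hypothetical helper, and both the NUL terminator and the Latin-1 decoding are assumptions about the data rather than something binfmt prescribes:

```racket
#lang racket/base

;; Hypothetical helper (not part of binfmt): convert a list of byte
;; values into a string, stopping at the first NUL byte if there is one.
;; Latin-1 decoding is an assumption about how the tag was written.
(define (field->string byte-values)
  (define bs (apply bytes byte-values))
  (define end
    (or (for/first ([b (in-bytes bs)]
                    [i (in-naturals)]
                    #:when (zero? b))
          i)
        (bytes-length bs)))
  (bytes->string/latin-1 (subbytes bs 0 end)))

(field->string '(67 114 101 97 116 105 118 101 0 0)) ; "Creative"
```

With the session above, (field->string (ref 'title_1 tree)) would apply the same conversion to the parsed title.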

Finally, parsing invalid data results in a syntax error:

> (id3 (open-input-bytes #"TAG..."))

parse failed
  expected 'u8' but found EOF
  in: string
  position: 7

Every definition automatically creates an un-parser. Un-parsers are functions that take a parse tree as input and serialize the data to an output port. They are named by prepending un- to the name of a definition.

> (define bs
    (call-with-output-bytes
     (lambda (out)
       (un-id3 tree out))))
> (for ([n (in-range 0 (bytes-length bs) 64)])
    (println (subbytes bs n (+ n 64))))

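#"TAGCreative Commons Song\0\0\0\0\0\0\0\0\0Improbulus\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0N"

#"/A\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0002005Take on O Mio Babbino Caro!\0\0\0g"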

2 Grammar and Operation

The grammar for binfmt is as follows:

def     ::= alt {| alt}* ;
alt     ::= expr+
expr    ::= term | star | plus | repeat
star    ::= term*
plus    ::= term+
repeat  ::= term{id | natural}
term    ::= byte
         |  char
         |  id
byte    ::= an integer between 0x00 and 0xFF
char    ::= 'ascii character'
id      ::= any identifier
natural ::= any natural number

Within an alt, each expr is assigned a unique name based on its id: the first time an id appears in an alt, _1 is appended to its name, the second time _2, and so on.

Alternatives containing two or more exprs parse to an association list mapping expr names (as defined above) to parse results. Alternatives containing a single expr collapse to the result of the expr.
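The naming rule can be illustrated with a short sketch. This is not binfmt's own code; name-exprs is a hypothetical function that takes the ids of the exprs in one alt, in order, and returns the names they would receive under the rule described above:

```racket
#lang racket/base

;; Hypothetical sketch of the naming rule (not binfmt's implementation):
;; number each id by how many times it has appeared so far in the alt.
(define (name-exprs ids)
  (define counts (make-hasheq))
  (for/list ([id (in-list ids)])
    (define n (add1 (hash-ref counts id 0)))
    (hash-set! counts id n)
    (string->symbol (format "~a_~a" id n))))

(name-exprs '(strlen u8 u8))   ; '(strlen_1 u8_1 u8_2)
(name-exprs '(char char char)) ; '(char_1 char_2 char_3)
```

The second call matches the keys seen in the magic example, where the three character literals are named char_1, char_2 and char_3.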

The repeat syntax either repeats a parser an exact number of times or repeats it based on the result of a previous parser within the same alt. For example, the following parser parses an i8 to determine the length of a string, then parses that many u8s.

string = strlen u8{strlen_1};
strlen = i8;

Negative length values are allowed, in which case they’re treated the same as 0. The parser above would parse #"\377" to an empty string.
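To make the length-prefixed behaviour concrete, here is a hand-written approximation of the string parser in plain Racket. It is only a sketch: read-i8 and parse-string are hypothetical names, and the shape of the returned association list is modeled on the description of alts above rather than taken from binfmt's actual output:

```racket
#lang racket/base

;; Hypothetical helper: read one byte and interpret it as a signed
;; 8-bit integer (two's complement).
(define (read-i8 in)
  (define b (read-byte in))
  (when (eof-object? b)
    (error 'read-i8 "expected 'i8' but found EOF"))
  (if (> b 127) (- b 256) b))

;; Hypothetical stand-in for: string = strlen u8{strlen_1};
(define (parse-string in)
  (define len (read-i8 in))
  (define count (max 0 len)) ; negative lengths are treated like 0
  (define payload
    (if (zero? count) #"" (read-bytes count in)))
  (when (or (eof-object? payload)
            (< (bytes-length payload) count))
    (error 'parse-string "expected 'u8' but found EOF"))
  (list (cons 'strlen_1 len)
        (cons 'u8_1 (bytes->list payload))))

;; Length 3 followed by three bytes.
(parse-string (open-input-bytes #"\3abc"))
;; #"\377" is -1 as an i8, so no u8s are read.
(parse-string (open-input-bytes #"\377"))
```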

The following parsers are built-in:

  • TODO

  • u8, u16, u32, u64, u16le, u32le, u64le, u16be, u32be, u64be

  • i8, i16, i32, i64, i16le, i32le, i64le, i16be, i32be, i64be

  • f32, f64, f32le, f64le, f32be, f64be

  • uvarint32, uvarint64

  • varint32, varint64

  • nul, eof

Parsers for alts may backtrack, but backtracking is only supported on file and string input ports. All other types of ports (e.g. pipes and custom ports that don’t support setting a file position) cause backtracking to fail with a parsing error.
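One way to work around this restriction, sketched below, is to drain a non-seekable port into memory and parse from a byte-string port instead. The helper buffered-input is a hypothetical name, and the approach assumes the input is small enough to hold in memory and that byte-string ports count as string ports for binfmt's purposes (the examples above use open-input-bytes):

```racket
#lang racket/base
(require racket/port)

;; Hypothetical helper (not part of binfmt): read everything from `in`
;; and reopen it as an in-memory port, which supports repositioning.
(define (buffered-input in)
  (open-input-bytes (port->bytes in)))

;; Demonstration with a pipe, which cannot be repositioned itself.
(define-values (pipe-in pipe-out) (make-pipe))
(void (write-bytes #"TAG" pipe-out))
(close-output-port pipe-out)

(define buffered (buffered-input pipe-in))
(read-bytes 3 buffered)     ; #"TAG"
(file-position buffered 0)  ; rewinding works on the in-memory port
(read-bytes 3 buffered)     ; #"TAG" again
```

A parser would then be applied to the buffered port rather than to the original pipe.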

When parsing or unparsing fails, an exception satisfying exn:fail:binfmt? is raised.

3 Reference

procedure

(exn:fail:binfmt? v) → boolean?

  v : any/c

Returns #t when v is a binfmt error.

Returns the id of the parser or unparser that failed.
