2Parser language🔗 i

The language for amb-parser is similar to one in brag, but with several differences:

Semicolon is required at the end of a production.
Terminals are written in lowercase, non-terminals are in uppercase.

Note: the parser does not support left-recursion. (Yet Another Parser that Doesn’t Support Left Recursion, yapdslr).

Example:

#lang amb-parser
S:NPVP;
NP:determiner?adjective*noun;
VP:verbNP;

The parser supports ONLY:

Rule combinations:

S : NP VP;
Optional rule:

NP : determiner? noun;
Kleene star rule:

NP : adjective* noun;
Alteration on the top level of an expansion:

A : b | c;

Note: the parser does not support grouping.

Note: you are allowed to combine question and star operators only in the order: rule*?.

2.1Formal parser language definition🔗 i

#lang brag

parser-syntax:rule+

rule:NON-TERMINALCOLONalterationSEMICOLON

alteration:expansion(ALTERATIONexpansion)*
expansion:(non-terminal|terminal)*

non-terminal:NON-TERMINAL[STAR][QUESTION]
terminal:TERMINAL[STAR][QUESTION]

3Parser interface🔗 i

Firstly, import amb-parser in your Racket module.
Then create a file with

#lang amb-parser
line and with your grammar and require it in your Racket module.

You will be provided with the parse function. You can use this function to parse your text. But before, your text should be separated into tokens.

For more details, refer to the parse documentation.

4Usage example🔗 i

Let’s take the grammar in Parser interface:

#lang amb-parser
S:NPVP;
NP:determiner?adjective*noun;
VP:verbNP;

For sentence "Thebigcatcatchedasmallgreymouse.", which is transformed in the list of tokens:

(list
(token "the"' (determiner))
(token "big"' (adjective))
(token "cat"' (noun))
(token "catched"' (verb))
(token "a"' (determiner))
(token "small"' (adjective))
(token "grey"' (adjective))
(token "mouse"' (noun)))

The parser will generate such a result:

(list
(parser-result
' (S
(NP(determiner"the")
(adjective*(adjective"big")(adjective*))
(noun"cat"))
(VP
(verb"catched")
(NP
(determiner"a")
(adjective*(adjective"small")(adjective*(adjective"grey")(adjective*)))
(noun"mouse"))))
' ()))

Note how the star rule works.

5Parse tree structure🔗 i

In this section we will describe the structure of the parse tree without specifying that the parser returns multiple results and every parser-result contains the rest of tokens.

Every rule name : expansion; will be parsed as (name expansion).
Every expansion rule1 rule2 will be parsed as an association list, where keys and values are organized as list with two elements, keys are rule names and values are the result of parsing the rules.
Every terminal terminal will be parsed as (list ’terminal token-str) where token-str is the string of the matched token.

If some parsing is failed then the parse returns empty (meaning there is no derivation trees and no parser results).

6Types and functions🔗 i

6.1Tokens🔗 i

struct
(struct token (strpos)
#:transparent)
str:string?
pos:(listof symbol? )

Represents a token.

pos is a list of all possible parts of speech for this token.

When parser tries to parse a terminal rule it checks whether the terminal is a member of pos list.

Note: in fact, the implementation does not really utilizes the info that the str is a string? .

procedure
( token?x)→boolean?
x:any/c

Returns whether the x is a token .

procedure
( token-str token)→string?
token:token?

Returns the string of the token.

procedure
( token-pos token)→(listof symbol? )
token:token?

Returns all possible parts of speech for the token.

6.2Parser result🔗 i

struct
(struct parser-result (datarest)
#:transparent)
data:any/c
rest:(listof token?)

Represents a single result of parsing. It consists of parsed data and the rest of tokens (uparsed part).

procedure
( parser-result? x)→boolean?
x:any/c

Returns whether the x is a parser-result .

procedure
( parser-result-data res)→any/c
res:parser-result?

Returns the parsed data of the res.

procedure
( parser-result-rest res)→(listof token?)
res:parser-result?

Returns the uparsed tokens of the res.

6.3Parsing🔗 i

procedure
( parse non-terminaltokens)→(listof parser-result? )
non-terminal:symbol?
tokens:(listof token?)

Parses non-terminal from the tokens list.

Parser returns a list of all possible derivation trees. The results are organized into parser-result structures.

WARNING: There is a bad design desicion. The parse function requires a list of tokens, instead of a lambda function that generates tokens on demand.

top ← prev up next →