\$\begingroup\$

I'm writing a scripting language with ANTLR and C++. This is my first real move from ANTLR grammars into the C++ API, so I'd like to know whether this is a good way to structure the grammar. (I will be adding a tree parser or tree rewriting rules later.)

grammar dyst;
options
{
 language = C;
 output = AST;
 ASTLabelType=pANTLR3_BASE_TREE;
}
program : statement*;
statement : stopUsingNamespaceStm|usingNamespaceStm|namespaceDefineStm|functionStm|defineStm|assignStm|funcDefineStm|ifStm|whileStm|returnStm|breakStm|eventDefStm|eventCallStm|linkStm|classDefStm|exitStm|importStm|importOnceStm|directive;
namespaceDefineStm : 'namespace' ident '{' statement* '}';
usingNamespaceStm : 'using' 'namespace' ident (',' ident)* ';';
stopUsingNamespaceStm : 'stop' 'using' 'namespace' ident (',' ident)* ';';
directive : '@' directiveId argList? ';';
directiveId : ID (':' ID)*;
importOnceStm : 'import_once' expression ';';
importStm : 'import' expression ';';
exitStm : 'exit' expression? ';';
classDefStm : 'class' ident ('extends' ident (',' ident)*)? '{' (classSection|funcDefineStm|defineStm|eventDefStm)* '}';
classSection : ('public'|'private'|'protected') ':';
linkStm : 'link' ident 'to' ident (',' ident)* ';';
eventCallStm : 'call' ident (',' argList)? ';';
eventDefStm : 'event' ident '(' paramList? ')' ';';
returnStm : 'return' expression ';';
breakStm : 'break' int ';';
ifStm : 'if' '(' expression ')' '{' statement* '}';
whileStm : 'while' '(' expression ')' '{' statement* '}';
defineStm : 'global'? 'def' ident ('=' expression)? ';';
assignStm : ident '=' expression ';';
funcDefineStm : 'function' ident '(' paramList? ')' ('handles' ident (',' ident)*)? '{' statement* '}';
paramList : param (',' param)?;
param : ident ('=' expression)?;
functionStm : functionCall ';';
functionCall : ident '(' argList? ')';
argList : expression (',' expression)*;
//Expressions!
term : functionCall|value|'(' expression ')';
logic_not : ('!')* term;
bit_not : ('~')* logic_not;
urnary : '-'* bit_not;
mult : urnary (('*'|'/'|'%') urnary)*;
add : mult ('+' mult)*;
relation : add (('<='|'>='|'<'|'>') add)*;
equality : relation (('=='|'!=') relation)*;
bit_and : equality ('&' equality)*;
bit_xor : bit_and ('^' bit_and)*;
bit_or : bit_xor ('|' bit_xor)*;
logic_and : bit_or ('&&' bit_or)*;
logic_or : logic_and ('||' logic_and)*;
expression : logic_or;
value : ident|float|int|string|boolean|newObject|anonFunc|null_val;
anonFunc : 'function' '(' paramList? ')' '{' statement* '}';
newObject : 'new' ident ('(' argList ')')?;
ident : ID (('.'|'::') ID)*;
float : FLOAT;
int : INTEGER;
string : STRING_DOUBLE|STRING_SINGLE;
boolean : BOOL;
null_val : NULL_VAL;
FLOAT : INTEGER '.' INTEGER;
INTEGER : DIGIT+;
BOOL : 'true'|'false';
NULL_VAL : 'null'|'NULL';
STRING_DOUBLE : '"' .* '"';
STRING_SINGLE : '\'' .* '\'';
ID : (LETTER|'_') (LETTER|DIGIT|'_')*;
fragment DIGIT : '0'..'9';
fragment LETTER : 'a'..'z'|'A'..'Z';
NEWLINE : ('\n'|'\r'|'\t'|' ')+ {$channel = HIDDEN;};
COMMENT : '#' .* '\r'? '\n' {$channel = HIDDEN;};
MULTI_COMMENT : '/-' .* '-/' {$channel = HIDDEN;};

If you are wondering about exactly what it is I'm using this for, you can take a look here.

Jamal
asked Mar 27, 2011 at 3:24
\$\endgroup\$

1 Answer

\$\begingroup\$
  1. The grammar itself is pretty unreadable "as is". A rule like:

    statement : stopUsingNamespaceStm|usingNamespaceStm|namespaceDefineStm|functionStm|defineStm|assignStm|funcDefineStm|ifStm|whileStm|returnStm|breakStm|eventDefStm|eventCallStm|linkStm|classDefStm|exitStm|importStm|importOnceStm|directive;
    

    would be far more readable when declared like this:

    statement 
     : stopUsingNamespaceStm
     | usingNamespaceStm
     | namespaceDefineStm
     | functionStm
     | defineStm
     | assignStm
     | funcDefineStm
     | ifStm
     | whileStm
     | returnStm
     | breakStm
     | eventDefStm
     | eventCallStm
     | linkStm
     | classDefStm
     | exitStm
     | importStm
     | importOnceStm
     | directive
     ;
    
  2. You'll want to explicitly end the entry point of your parser, the rule program, with the end-of-file token; otherwise your parser might stop parsing prematurely. With EOF, you force the parser to read the entire token stream.

    program 
     : statement* EOF
     ;
    
  3. Define explicit lexer tokens for keywords instead of embedding the literals in your parser rules.

    Instead of:

    importStm 
     : 'import' expression ';'
     ;
    

    it's better to do:

    importStm 
     : Import expression ';'
     ;
    Import
     : 'import'
     ;
    

    This will make your life easier at a later (tree walking) stage. Without explicit lexer rules, ANTLR assigns these literals auto-generated token names, so when debugging it is unclear which tokens your tree actually contains.

  4. Your lexer rules:

    STRING_DOUBLE : '"' .* '"';
    STRING_SINGLE : '\'' .* '\'';
    

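    end at the first closing quote, so a double-quoted string can never contain a double quote and a single-quoted string can never contain a single quote. It's therefore impossible to have a string literal with both a double and a single quote in it.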

    Better to do something like this:

    STRING_DOUBLE 
     : '"' ('\\' ('\\' | '"') | ~('\\' | '"'))* '"'
     ;
    

    which will allow a double-quoted string to contain escaped double quotes (\") and escaped backslashes (\\) as well.
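
To see what point 2 guards against outside of ANTLR, here is a minimal Python sketch of a hand-written parser for a made-up toy language whose statements have the form NAME ';'. The token kinds and the functions tokenize and parse_program are hypothetical and only mimic the shape of program : statement* EOF; this is not how ANTLR's generated parser works internally.

```python
import re

# Hypothetical toy lexer: identifiers, semicolons, and "anything else".
TOKEN = re.compile(r"\s*(?:(?P<NAME>[A-Za-z_]\w*)|(?P<SEMI>;)|(?P<OTHER>\S))")


def tokenize(text):
    """Split text into (kind, value) pairs and append an explicit EOF token."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(("EOF", ""))
    return tokens


def parse_program(tokens, require_eof):
    """Parse statement* (optionally followed by EOF); return the statement count."""
    i = 0
    count = 0
    # statement : NAME ';'
    while tokens[i][0] == "NAME":
        if tokens[i + 1][0] != "SEMI":
            raise SyntaxError(f"expected ';' after {tokens[i][1]!r}")
        i += 2
        count += 1
    # Without this check the loop above simply stops at the first token
    # that cannot start a statement, and the rest of the input is ignored.
    if require_eof and tokens[i][0] != "EOF":
        raise SyntaxError(f"unexpected token {tokens[i][1]!r} at index {i}")
    return count


tokens = tokenize("a; b; = c;")
print(parse_program(tokens, require_eof=False))  # 2: "= c;" is silently dropped
try:
    parse_program(tokens, require_eof=True)
except SyntaxError as err:
    print("rejected:", err)
```

With require_eof=False the stray "= c;" goes unnoticed, which is the premature stop described above; with require_eof=True the same input is reported as an error.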

That's all I saw at first glance. I didn't look very closely, so there may be more that can be improved.

Jamal
answered Mar 31, 2011 at 13:51
\$\endgroup\$
  • Thanks a lot, especially with the quote thing. I was having trouble with that. Commented Mar 31, 2011 at 18:04
  • @Sam, you're welcome. Note that the string literals now also accept line breaks. If you don't want that, do something like this: STRING_DOUBLE : '"' ('\\' ('\\' | '"') | ~('\\' | '"' | '\r' | '\n'))* '"' ; Commented Mar 31, 2011 at 18:08
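
The string rules from point 4 of the answer and from the comment above can be tried out as regular expressions. The Python sketch below is only an approximation for experimenting: it assumes that ANTLR 3 treats the .* before the closing quote as non-greedy, and the names NAIVE, ESCAPED, SINGLE_LINE and first_literal are made up for illustration.

```python
import re

# '"' .* '"' with a non-greedy wildcard: stops at the first closing quote.
NAIVE = re.compile(r'".*?"', re.DOTALL)

# '"' ('\\' ('\\' | '"') | ~('\\' | '"'))* '"'
ESCAPED = re.compile(r'"(?:\\[\\"]|[^\\"])*"')

# Same rule, but raw line breaks are excluded from the string body.
SINGLE_LINE = re.compile(r'"(?:\\[\\"]|[^\\"\r\n])*"')


def first_literal(pattern, text):
    """Return the string literal matched at the start of text, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


source = r'"say \"hi\"" + x'
print(first_literal(NAIVE, source))    # "say \"      (cut off at the escaped quote)
print(first_literal(ESCAPED, source))  # "say \"hi\"" (the whole literal)

multi_line = '"a\nb"'
print(first_literal(ESCAPED, multi_line) is not None)  # True: line break accepted
print(first_literal(SINGLE_LINE, multi_line))          # None: line break rejected
```

Note that neither ESCAPED nor SINGLE_LINE accepts other escape sequences such as \n or \t inside a literal, because a backslash may only be followed by a backslash or a double quote; the grammar rules they mirror have the same restriction.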
