I have a very simple grammar for Antlr4:
grammar settings;
query
: COLUMN OPERATOR (SETTING|SCALAR)
;
COLUMN
: [a-z_]+
;
OPERATOR
: ('='|'>'|'<')
;
SETTING
: 'setting(' [a-z_]+ ')'
;
SCALAR
: [a-z_]+
;
For input strings like total_sales>setting(min_total_sales), which represent a database column name, an operator and a value, I would like to determine which part is the column name, which is the operator and which is the value. I wrote the following Python code for that:
import re
from antlr4 import InputStream, CommonTokenStream
from settingsLexer import settingsLexer
from settingsParser import settingsParser
settings = {
    'min_total_sales': 1000
}
conditions = 'total_sales>setting(min_total_sales)'
lexer = settingsLexer(InputStream(conditions))
stream = CommonTokenStream(lexer)
parser = settingsParser(stream)
tree = parser.query()
regex = re.compile(r'^setting\((?P<setting_name>[a-z_]+)\)$')
column = None
operator = None
value = None
for child in tree.getChildren():
    text = child.getText()
    # how to match what is child: column or operator or value???
    # this for value defining
    if match := regex.match(text):
        setting_name = match.group('setting_name')
        print(f'We should get value from setting named `{setting_name}`')
        min_total_sales = settings['min_total_sales']
    else:
        print(f'We got a simple scalar value: {text}')
        min_total_sales = int(text)
How can I tell whether a child is the column name, the operator or the value?
1 Answer
Why involve a regex at all? Once the input has been parsed, every node in the parse tree is a context object with accessor methods for the rules and tokens referenced by the rule it matched. So the object returned by parser.query(), which corresponds to the parser rule:
query
: COLUMN OPERATOR (SETTING|SCALAR)
;
will have 4 methods: COLUMN(), OPERATOR(), SETTING() and SCALAR(). Each one returns the matched token node, or None if that token does not occur in the input.
Use them to extract the data you want:
tree = parser.query()
column = tree.COLUMN()
operator = tree.OPERATOR()
setting = tree.SETTING()
print(f"column={column}, operator={operator}, setting={setting}")
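As a minimal sketch of how those accessors could replace the regex loop, the helper below resolves the value as well. The function name describe_query and its settings parameter are hypothetical, and it assumes tree is the object returned by parser.query() for the grammar from the question:

```python
def describe_query(tree, settings):
    """Return (column, operator, value) for a parsed `query` context.

    `tree` is assumed to be the object returned by parser.query();
    `settings` maps setting names to their values.
    """
    column = tree.COLUMN().getText()
    operator = tree.OPERATOR().getText()

    setting = tree.SETTING()  # None when the value is a plain scalar
    if setting is not None:
        # With the grammar from the question the token text is e.g.
        # "setting(min_total_sales)", so strip the "setting(" ... ")" wrapper.
        name = setting.getText()[len("setting("):-1]
        value = settings[name]
    else:
        value = tree.SCALAR().getText()
    return column, operator, value
```

It would be called as describe_query(parser.query(), settings) with the settings dict from the question.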
I would also not glue setting( and the setting name into one big token, but let the parser handle that structure instead. Otherwise input like total_sales>setting ( min_total_sales ) will not be matched because of the spaces.
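grammar settings;
query
: column OPERATOR value EOF
;
column
: NAME
;
value
: setting
| NAME
;
setting
: SETTING '(' NAME ')'
;
OPERATOR
: ('='|'>'|'<')
;
// SETTING must be defined before NAME: when two lexer rules match
// the same text, the rule that is defined first wins
SETTING
: 'setting'
;
// a single token for all identifiers; the parser decides whether
// one is a column name or a scalar
NAME
: [a-z_]+
;
SPACES
: [ \t\r\n] -> skip
;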
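With the grammar split up like this, the setting name is a child node of its own, so no string slicing is needed. A minimal sketch of walking that tree follows. The function name resolve_condition is hypothetical, tree is assumed to be the result of parser.query(), and the children are addressed by position so the code does not depend on how the name token is called:

```python
def resolve_condition(tree, settings):
    """Return (column, operator, value) from a parsed `query` context."""
    # query: <column> OPERATOR value EOF, so the column is the first child.
    column = tree.getChild(0).getText()
    operator = tree.OPERATOR().getText()

    value_ctx = tree.value()
    setting_ctx = value_ctx.setting()  # None for a plain scalar value
    if setting_ctx is not None:
        # setting: SETTING '(' <name> ')', so the name is the third child.
        name = setting_ctx.getChild(2).getText()
        return column, operator, settings[name]
    return column, operator, value_ctx.getText()
```

Because whitespace is skipped by the lexer, this should give the same result for total_sales>setting(min_total_sales) and total_sales>setting ( min_total_sales ).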