pyparsing_ext is an extension of pyparsing that makes it easy to build your own languages. ✌️
- PEG
- Regular Expressions
- Parser
- Formal Grammar
- Operational Semantics
1. mixedExpression
2. build languages (see example)
pyparsing
pip install pyparsing_ext
- core: basic token classes
- actions: classes for parsing actions
- expressions: complicated expressions
- utils: some useful tools
Classes::

    Tokens:
        Wordx: powerful Word
        CharsNot: powerful CharsNotIn
        PrecededBy: as FollowedBy (moved to pyparsing)
        MeanWhile:
        LinenStart:

    Actions:
        BaseAction: Base Class of Actions
        BifixAction: action for bifix operators such as <x,y>
        ...
How to define an 'Action' class, which is a wrapper around ParseResults:
    # BaseAction is provided by pyparsing_ext (see the actions module).

    # inherit BaseAction directly
    class VarOpAction(BaseAction):
        # for operators with variables
        pass

    # inherit a subclass of BaseAction
    class IndexOpAction(VarOpAction):
        # x[start:stop]
        names = ('slice', 'index')  # register the names of tokens

        def __init__(self, instring='', loc=0, tokens=[]):
            # add names or do more advanced handling of the tokens
            super(IndexOpAction, self).__init__(instring, loc, tokens)
            if 'slice' in self:
                slc = tokens.slice
                self.start = slc.start if 'start' in slc else None
                self.stop = slc.stop if 'stop' in slc else None
                self.step = slc.step if 'step' in slc else None
            else:
                self.index = tokens.index

        def eval(self, calculator):
            # the eval method defines the semantics of the token
            def ev(part):
                # a missing slice component stays None instead of being evaluated
                return None if part is None else part.eval(calculator)

            if 'slice' in self:
                return slice(ev(self.start), ev(self.stop), ev(self.step))
            else:
                return self.index.eval(calculator)
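The evaluation pattern above (each node exposes `eval(calculator)` and recurses into its children) can be sketched without pyparsing at all. The classes below, `Number`, `Variable` and `IndexOp`, are hypothetical stand-ins written for illustration only; they are not part of pyparsing_ext, and the calculator is assumed here to be a plain dict of variables.

```python
class Number:
    """Hypothetical leaf node: wraps a literal value."""
    def __init__(self, value):
        self.value = value

    def eval(self, calculator):
        return self.value


class Variable:
    """Hypothetical leaf node: looks its name up in the calculator."""
    def __init__(self, name):
        self.name = name

    def eval(self, calculator):
        return calculator[self.name]


class IndexOp:
    """Hypothetical stand-in for an index/slice action such as x[start:stop]."""
    def __init__(self, index=None, start=None, stop=None, step=None):
        self.index = index
        self.start, self.stop, self.step = start, stop, step

    def eval(self, calculator):
        def ev(part):
            # A missing slice component stays None, as in Python's own slices.
            return None if part is None else part.eval(calculator)

        if self.index is not None:
            return self.index.eval(calculator)
        return slice(ev(self.start), ev(self.stop), ev(self.step))


calculator = {'n': 3}          # variable context (assumed to be a dict)
data = list(range(10))

sl = IndexOp(start=Number(1), stop=Variable('n')).eval(calculator)
print(data[sl])                                              # [1, 2]
print(data[IndexOp(index=Variable('n')).eval(calculator)])   # 3
```

Keeping the semantics in `eval` means the grammar only decides the shape of the tree, while the calculator object carries the state (here, variable bindings) needed to evaluate it.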
Functions::
    keyRange(s)
    ordRange(s)
    chrRange(s)
    CJK  # for matching Chinese Japanese Korean
    enumeratedItems
    delimitedMatrix  # delimitedList with two seps
    import pyparsing as pp
    # Wordx, delimitedMatrix, enumeratedItems, ordRange, ordRanges and
    # mixedExpression are provided by pyparsing_ext.

    w = Wordx(lambda x: x in {'a', 'b', 'c', 'd'})  # == Word('abcd')
    M = delimitedMatrix(w, ch1=' ', ch2=pp.Regex('\n+').leaveWhitespace())
    p = M.parseString('a b\n c d')
    print(p.asList())

    s = '''
    [1]hello, world
    [2]hello, kitty
    '''
    print(enumeratedItems().parseString(s))

    cjk = ordRange(0x4E00, 0x9FD5)
    cjk.parseString('我爱你, I love you')  # => ['我爱你']

    cjk = ordRanges((0x4E00, 0x9FD5, 0, 256))
    cjk.parseString('我爱你 I love you')  # => ['我爱你 I love you']

    integer = pp.pyparsing_common.signed_integer
    varname = pp.pyparsing_common.identifier

    arithOplist = [('-', 1, pp.opAssoc.RIGHT),
                   (pp.oneOf('* /'), 2, pp.opAssoc.LEFT),
                   (pp.oneOf('+ -'), 2, pp.opAssoc.LEFT)]

    def func(EXP):
        # bifix forms <x,y>, ||x||, |x| and function calls f(x, ...)
        return (pp.Group('<' + EXP + pp.Suppress(',') + EXP + '>')
                | pp.Group('||' + EXP + '||')
                | pp.Group('|' + EXP + '|')
                | pp.Group(varname + '(' + pp.delimitedList(EXP) + ')'))

    baseExpr = integer | varname
    EXP = mixedExpression(baseExpr, func=func, opList=arithOplist)
    a = EXP.parseString('5*g(|-3|)+<4,5> + f(6)')
    print(a)
    # [[[5, '*', ['g', '(', ['|', ['-', 3], '|'], ')']], '+', ['<', 4, 5, '>'], '+', ['f', '(', 6, ')']]]
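To make the `ordRange(0x4E00, 0x9FD5)` example above concrete, here is a rough standard-library equivalent that matches a run of characters in the same code-point range. It uses `re` rather than pyparsing_ext, so it only illustrates which characters the range covers; the helper name `leading_cjk` is made up for this sketch and says nothing about how `ordRange` is implemented.

```python
import re

# Same code-point range as ordRange(0x4E00, 0x9FD5) in the example above.
CJK_RUN = re.compile('[\u4e00-\u9fd5]+')


def leading_cjk(text):
    """Return the run of CJK characters at the start of text, or '' if none."""
    m = CJK_RUN.match(text)
    return m.group(0) if m else ''


print(leading_cjk('我爱你, I love you'))   # 我爱你
```

The comma and the Latin letters fall outside the range, so matching stops after the third character, which mirrors the `['我爱你']` result quoted above.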
run example1.py for a simple example
output:
    Example 1:
    |-1| -> ('|', '|')(-(1))

    Example 2:
    parse source code:

    x=|-1|;  # absolute value
    y=x*2+1;
    if x == 1
    {z=[3.3_]; # the floor value
    }
    print "z =", z;

    result:
    z = 3
    see the dictionary of variables:
    {'x': Decimal('1'), 'y': Decimal('3'), 'z': 3}
In example2.py, we create a programming language, "Small Python".
Run example2.py for a more complex example that parses the text file test.spy.

    example2.smallpy.cmdline()  # command-line mode
The following method in the base class of actions may cause errors. It is deleted in the latest version:
    # def __getitem__(self, key):
    #     if isinstance(key, int):
    #         return self.tokens[key]
    #     else:
    #         return getattr(self.tokens, key)