lua-l archive

LPeg: parsing text with wikilinks



Dear all,
I am trying to write a parser that processes wikitext, i.e. text that contains wikilinks [[page|(optional alias)]] (but not escaped ones, [[:page]]). I want it to return the passed string itself (with some alterations), a list of the referenced pages, and the symbol that is most likely the list separator in the passed string.
I wrote the following grammar:
 wikitext <- {| { {| ( prefix? wikilink )+ |} tail } |}
 wikilink <- unescapedopen page alias? close
 page <- { ( !close !pipe . )+ }
 alias <- pipe ( !close . )*
 tail <- .*
 prefix <- ( separator / ( !unescapedopen . ) )+
 open <- "[["
 unescapedopen <- open !escape
 close <- "]]"
 pipe <- "|"
 escape <- ":"
 separator <- {:separator: [,;*#] :} space*
 space <- %s
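For reference, the grammar can be run from Lua roughly like this. This is a minimal sketch that assumes LPeg and its re module are installed (require "re"). The grammar text is unchanged, but it is wrapped in a [=[ ... ]=] long string because the grammar itself contains "]]", which would end an ordinary [[ ... ]] string.

```lua
local re = require "re"  -- LPeg's re module (assumed to be installed)

-- The grammar above, verbatim. [=[ ... ]=] is used instead of [[ ... ]]
-- because the grammar contains "]]", which would close a level-0 long string.
local wikitext = re.compile [=[
 wikitext <- {| { {| ( prefix? wikilink )+ |} tail } |}
 wikilink <- unescapedopen page alias? close
 page <- { ( !close !pipe . )+ }
 alias <- pipe ( !close . )*
 tail <- .*
 prefix <- ( separator / ( !unescapedopen . ) )+
 open <- "[["
 unescapedopen <- open !escape
 close <- "]]"
 pipe <- "|"
 escape <- ":"
 separator <- {:separator: [,;*#] :} space*
 space <- %s
]=]

local example = "Perhaps, [[Peter|Simon]], or [[Paul]], so they say"
local result = wikitext:match(example)  -- equivalent to re.match(example, wikitext)

print(result[1])  -- the whole line, unchanged
print(result[2][1], result[2][2], result[2].separator)  -- "Peter", "Paul", "," (tab-separated)
```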
After applying this grammar with re.match to the example line "Perhaps, [[Peter|Simon]], or [[Paul]], so they say", I got:
table {
 1 = Perhaps, [[Peter|Simon]], or [[Paul]], so they say
 2 = table {
   1 = Peter
   2 = Paul
   separator = ,
 }
}
This is close to what I want.
However, there are some issues:
1) Can I make the outer table's keys strings, i.e. ['full'] instead of [1] and ['items'] instead of [2]? I experimented with named group captures, but without success.
2) Can the number of nested captures be reduced?
3) Most importantly: I want a string constant (e.g. "Name::") to be inserted after every <unescapedopen> found, and the first capture, the one that returns the whole line, should contain this constant: "...[[Name::Paul]]..." rather than "...[[Paul]]...". This probably has something to do with substitution captures; I tried them but could not make it work. Can it be done at all?
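One possible approach to the three issues is sketched below under some assumptions. It makes two passes over the string instead of using a single grammar. The names items_pattern, rewrite_pattern, addname, NAME and parse are illustrative only, and "Name::" is the example constant from issue 3. A likely reason the named groups did not help with issue 1 is that, inside a table capture, a named group stores only the first value of its group. Wrapping the whole-line capture { ... } in {:full: ... :} therefore keeps the string and drops the nested page table. In this sketch, the string keys are assigned in Lua instead.

```lua
local re = require "re"  -- LPeg's re module (assumed to be installed)

-- Hypothetical constant to insert after every unescaped "[[".
local NAME = "Name::"

-- Pass 1 (issues 1 and 2): a single table capture with the pages and a
-- 'separator' field. 'tail' is dropped because a match need not consume the
-- whole subject.
local items_pattern = re.compile [=[
  items         <- {| ( prefix? wikilink )+ |}
  wikilink      <- unescapedopen page alias? close
  page          <- { ( !close !pipe . )+ }
  alias         <- pipe ( !close . )*
  prefix        <- ( separator / ( !unescapedopen . ) )+
  open          <- '[['
  unescapedopen <- open !escape
  close         <- ']]'
  pipe          <- '|'
  escape        <- ':'
  separator     <- {:separator: [,;*#] :} %s*
]=]

-- Pass 2 (issue 3): the substitution capture {~ ~} copies the subject and
-- replaces each unescaped "[[" with the value returned by 'addname'. The
-- pattern inside -> has no captures, so addname receives the matched "[[".
local rewrite_pattern = re.compile([=[
  {~ ( ( '[[' !':' ) -> addname / . )* ~}
]=], { addname = function(open) return open .. NAME end })

-- Issue 1: build the result table with string keys in Lua.
local function parse(s)
  return {
    full  = rewrite_pattern:match(s),
    items = items_pattern:match(s) or {},  -- {} when the line has no wikilinks
  }
end

local r = parse("Perhaps, [[Peter|Simon]], or [[Paul]], so they say")
print(r.full)  -- Perhaps, [[Name::Peter|Simon]], or [[Name::Paul]], so they say
print(r.items[1], r.items[2], r.items.separator)  -- "Peter", "Paul", "," (tab-separated)
```

Escaped links such as [[:Category:X]] are left alone by both passes, because !':' (or !escape) rejects them. The cost of this design is that the subject is scanned twice. Folding both results into one pattern is not shown here.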
Alexander Mashin
