
Re: Emulating advanced regex features using Lua patterns and pure Lua code



On Mon, Jun 20, 2011 at 11:12 AM, Lorenzo Donati
<lorenzodonatibz@interfree.it> wrote:
> Hi all!
>
> A recent thread about Lua patterns not being regexes induced me to start
> this thread.
>
> I wonder if there are ways to EASILY emulate some advanced regex features,
> usually found in PCRE-like packages, without resorting to LPEG or C bindings
> to external libs.
>
> For example, some uses of regex alternation can be emulated using multiple
> calls to string.match and using logical operators, that is:
>
> -- sort of pseudocode
> local pcre = require 'pcre'
>
> if pcre.match( haystack, "^(?:foo|bar)$" ) then ...
>
> may be rewritten in pure Lua as:
>
> if haystack:match "^foo$" or haystack:match "^bar$" then ...
>
>
> Are there common Lua idioms to emulate advanced regex features as:
>
> - Alternation (I don't know if the example above is universally applicable)
> - Positive/Negative look-ahead/-behind
> - Quantification/capture of complex patterns (as in pcre's "((?:foo)+)").
>
>
> If there were a sort of one-to-one (almost boilerplate) translation between
> those PCRE features and Lua idioms, it would be very useful.
>
> In particular, for my typical use case, I could reuse my knowledge of PCRE
> without resorting to heavy external bindings or LPEG (I'm lazy, I've got
> little time lately and it is still on my TODO list: "learn LPEG" :-)
>
>
> Thanks in advance for any suggestion/pointer or contribution to the discussion
> (if the list finds it worthwhile :-)
>
> Cheers.
> -- Lorenzo
>
>
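I think you'd basically have to get used to using position captures
and custom loops. Unlike in common regex flavours, a leading ^ in a
Lua pattern anchors the match at the init position passed to
string.match or string.find, not only at the start of the string, and
this can be useful here. For example, instead of: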
 -- hypothetical: Lua patterns have no | alternation
 for m in text:gmatch('foo|bar') do
   -- [LOOP BODY]
 end
...you could do (untested):
 local pos = 1
 while pos <= #text do
   -- find the next f or b
   pos = string.match(text, '()[fb]', pos)
   if not pos then
     break
   end
   -- try to match foo or bar at this position
   local m, next_pos = string.match(text, '^(foo)()', pos)
   if not m then
     m, next_pos = string.match(text, '^(bar)()', pos)
   end
   if m then
     pos = next_pos
     -- [LOOP BODY]
   else
     pos = pos + 1
   end
 end
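
The same idea can be wrapped up as a reusable iterator. What follows is
only a rough sketch, not something from a library: gmatch_any is a
made-up name, and it assumes the alternatives are ordinary Lua patterns
that do not start with ^ themselves. It uses string.find rather than
position captures so that captures inside an alternative cannot shift
the return values, and it simply steps forward one character at a time
instead of skipping ahead with a '()[fb]' style search.

```lua
-- Hypothetical helper: iterate over every match of any of the given
-- Lua patterns, scanning left to right. At each position the first
-- alternative that matches wins (ordered choice).
local function gmatch_any(text, alternatives)
  local pos = 1
  return function()
    while pos <= #text do
      for _, pat in ipairs(alternatives) do
        -- '^' anchors the pattern at pos because pos is passed as init
        local s, e = string.find(text, '^' .. pat, pos)
        -- ignore empty matches so the scan always moves forward
        if s and e >= s then
          pos = e + 1
          return string.sub(text, s, e)
        end
      end
      pos = pos + 1
    end
    return nil
  end
end

-- Usage: visits "foo", then "bar", then the "foo" inside "food".
for m in gmatch_any('a foo, a bar, food', {'foo', 'bar'}) do
  print(m)
end
```

Since the alternatives are tried in the order given, put the longer one
first when one alternative is a prefix of another.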
(In other words... if you value brevity, bite the LPEG bullet. :)
-Duncan
