lua-l archive

Re: Time constraint in Lua pattern functions



On 14/03/16 05:26 PM, Egor Skriptunoff wrote:
On Mon, Mar 14, 2016 at 10:02 PM, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
 > After I upgraded Lua to 5.3.2, one of my scripts terminates with a
 > "pattern too complex" error message.
 >
 > Probably this is because gmatch is using a non-optimal pattern
 > (one with quadratic time complexity), which may take up to 2 sec
 > to complete.
 >
 > Of course, it is possible to rewrite that script to make its time
 > complexity linear (at the cost of extra LOC and more complex logic
 > in the code).
 >
 > But there are two reasons for NOT rewriting it:
 > 1) I don't want to spend my time on rewriting my old script
 > because I'm quite happy with its current performance (2-3 seconds
 > is OK for me).
 > 2) I don't want to bring extra complexity to the script.
 > As of now, it is a one-liner regexp, and I'd like to keep it
 > as simple as it is.
 Can you show your regexp/subject?
I have code similar to this:
local pattern =
'id="post(%d+)".-class="Post Header".-<h2>(.-)</h2>.-(/forum/post%1%.htm#details)'
for id, title, link in main_forum_page:gmatch(pattern) do
 analyze_post(id, title, link)
end
Once in a while a post does not have a "View details" link (that is, the third capture does not match). In such rare cases non-linear behavior is observed due to the chain of four ".-" items in the pattern. I prefer waiting 2 seconds at runtime to losing the simplicity of the code.
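The linear rewrite mentioned above (trading the one-liner for more code) could look roughly like the sketch below. It assumes each post's markup starts at its `id="postNNN"` attribute and ends where the next post begins; `collect_posts` and the sample page are hypothetical, modelled only on the quoted pattern. The sample page's post 2 lacks its details link, and the original `gmatch` loop and the chunked scan both skip it.

```lua
-- Sketch only: collect_posts and the sample markup are hypothetical,
-- modelled on the pattern quoted above.
local pattern =
  'id="post(%d+)".-class="Post Header".-<h2>(.-)</h2>.-(/forum/post%1%.htm#details)'

-- Scan the page one post at a time, so a post that lacks its details
-- link can only cause backtracking inside its own chunk of markup.
local function collect_posts(page)
  -- First pass: start position and id of every 'id="postNNN"' marker.
  local markers = {}
  local pos = 1
  while true do
    local s, e, id = page:find('id="post(%d+)"', pos)
    if not s then break end
    markers[#markers + 1] = { start = s, id = id }
    pos = e + 1
  end

  -- Second pass: match title and link within [marker, next marker).
  local posts = {}
  for i, m in ipairs(markers) do
    local nxt = markers[i + 1]
    local chunk = page:sub(m.start, nxt and nxt.start - 1 or #page)
    local _, h2_end, title = chunk:find('class="Post Header".-<h2>(.-)</h2>')
    local link = '/forum/post' .. m.id .. '.htm#details'
    -- Plain find (4th argument true) treats the link as a literal string,
    -- searched only after the title, as in the original pattern.
    if title and chunk:find(link, h2_end + 1, true) then
      posts[#posts + 1] = { id = m.id, title = title, link = link }
    end
  end
  return posts
end

-- Hypothetical page: post 2 has no "View details" link.
local page = table.concat({
  '<div id="post1"><div class="Post Header"><h2>First</h2></div>',
  '<a href="/forum/post1.htm#details">View details</a></div>',
  '<div id="post2"><div class="Post Header"><h2>Second</h2></div></div>',
  '<div id="post3"><div class="Post Header"><h2>Third</h2></div>',
  '<a href="/forum/post3.htm#details">View details</a></div>',
})

-- The one-liner silently skips post 2; the chunked scan does the same.
for id, title, link in page:gmatch(pattern) do
  print('gmatch', id, title, link)
end
for _, p in ipairs(collect_posts(page)) do
  print('chunked', p.id, p.title, p.link)
end
```

Because every `.-` is confined to a single post's chunk, a missing link should cost backtracking proportional to that post's size rather than to the rest of the page, so total work ought to grow roughly linearly with page size as long as individual posts stay small.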
So this is a variant of the "parsing HTML with regex" problem...
--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.
