I don't think this is a bug. I think it is expected
behaviour. (At least, it is the behaviour I would expect.)
> 2. Patterns which can match with empty string may matches twice
> at same position.
> For example,
> > = string.gsub("abc", ".*", "x")
> xx 2
> > = string.gsub("12ab", "%a*$", "x")
> 12xx 2
> These results should be "x 1" and "12x 1".
Why? The two matches are not in the same position.
In the first case, for example, the first time through, "abc"
is matched at position 1; the second time through, "" is
matched at position 4. If you don't want that behaviour, you should use
".+".
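To see where the two matches fall, here is a small sketch that uses string.find with an explicit start position instead of gsub. The variable names are mine, purely for illustration.

```lua
local s = "abc"

-- First match: starts at position 1 and covers the whole string.
local first_start, first_end = string.find(s, ".*")
print(first_start, first_end)    --> 1   3

-- Second match: starting the search at position 4 (one past the last
-- character) still succeeds, with an empty match whose end is start - 1.
local second_start, second_end = string.find(s, ".*", 4)
print(second_start, second_end)  --> 4   3

-- ".+" needs at least one character, so it cannot match at position 4.
local plus_start = string.find(s, ".+", 4)
print(plus_start)                --> nil

-- With ".+" there is therefore exactly one substitution.
local result, count = string.gsub(s, ".+", "x")
print(result, count)             --> x   1
```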
My guess is that you were trying to do something like
this:

    string.gsub(str, "([^\r\n]*)\r?\n?", "%1\n")

in an attempt to normalise line-endings, and found
that the strings which were already terminated with a line-end now
have two. One way to actually accomplish this is:

    string.gsub(str, "([^\r\n]*)(\r?\n?)", function(line, ending)
      if ending == "" then
        return line
      else
        return line .. "\n"
      end
    end)

or "simply":

    string.gsub(str, "([^\r\n]*)(\r?\n?)", function(line, ending)
      return line .. (ending == "" and "" or "\n")
    end)

Another option is:

    string.gsub(str, "[\r\n]+", function(endings)
      -- e is the length of the first line-ending in the run
      local _, e = string.find(endings, "\r?\n?")
      -- emit one "\n" per ending, assuming the run repeats that ending;
      -- math.floor keeps the count whole if the run is mixed
      return string.rep("\n", math.floor(string.len(endings) / e))
    end)

I'm sure there are others.
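One of those others, as a rough sketch: if all you want is to turn "\r\n" and lone "\r" into "\n", two plain substitutions avoid patterns that can match the empty string altogether. The function name normalise_eol is my own invention, not something from the standard library.

```lua
-- Hypothetical helper: convert "\r\n" and lone "\r" line-endings to "\n".
local function normalise_eol(str)
  -- gsub returns two values (string, count); assigning to a single
  -- local keeps only the string.
  local s = string.gsub(str, "\r\n", "\n")
  -- Any "\r" left over was not part of a "\r\n" pair.
  s = string.gsub(s, "\r", "\n")
  return s
end

print(normalise_eol("one\r\ntwo\rthree\n") == "one\ntwo\nthree\n")  --> true
print(normalise_eol("no line-end") == "no line-end")               --> true
```

Neither pattern can match an empty string, so a string that already ends with a line-end does not gain a second one.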
The second case seems a little weirder, if you think
that $ means "match the terminator". It doesn't, though.
It is, in perl-speak, a zero-length assertion: it requires
the match to end at the end of the string. In this case,
you could get what I think you want with:

    string.gsub(str, "%a*$", "x", 1)
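A quick sketch of what is going on with "12ab", again using string.find to locate the matches (variable names are just for illustration):

```lua
local s = "12ab"

-- The pattern first succeeds at position 3, consuming "ab" up to the end.
local first_start, first_end = string.find(s, "%a*$")
print(first_start, first_end)    --> 3   4

-- It can succeed again at position 5, just past the last character,
-- with an empty match: "$" consumes nothing, it only checks the position.
local second_start, second_end = string.find(s, "%a*$", 5)
print(second_start, second_end)  --> 5   4

-- Passing 1 as the fourth argument stops gsub after one substitution.
local result, count = string.gsub(s, "%a*$", "x", 1)
print(result, count)             --> 12x   1
```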
In general, gsubs with zero-length matches need to
be done cautiously, and should be avoided if possible.