Package: grep;
Reported by: Sundeep Agarwal <learnbyexample.net <at> gmail.com>
Date: 2017年5月10日 15:31:02 UTC
Severity: normal
Merged with 36148
To reply to this bug, email your comments to 26864 AT debbugs.gnu.org.
the display of automated, internal messages from the tracker.
View this report as an mbox folder, status mbox, maintainer mbox
bug-grep <at> gnu.org:bug#26864; Package grep.
(2017年5月10日 15:31:02 GMT) Full text and rfc822 format available.Sundeep Agarwal <learnbyexample.net <at> gmail.com>:bug-grep <at> gnu.org.
(2017年5月10日 15:31:03 GMT) Full text and rfc822 format available.Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: Sundeep Agarwal <learnbyexample.net <at> gmail.com> To: bug-grep <at> gnu.org Subject: Clarification on obscure regular expressions mentioned in known bugs Date: 2017年5月10日 14:18:37 +0530
[Message part 1 (text/plain, inline)]
Hello,
From the man page, version 'grep (GNU grep) 2.25'
--------------------------------
Known Bugs
Large repetition counts in the {n,m} construct may cause grep to use
lots of memory. In addition, certain other obscure regular expressions
require exponential time and space, and may cause grep to run out of memory.
Back-references are very slow, and may require exponential time.
--------------------------------
I was trying a regular expression to find words from dictionary that have
two different instances of repeated letters, for example the words:
misspellings, chilliness, woodcutter etc
$ # gives no output
$ grep -m5 -xiE '([a-z]*([a-z])2円[a-z]*){2}' /usr/share/dict/words
$ # works as expected with PCRE
$ grep -m5 -xiP '([a-z]*([a-z])2円[a-z]*){2}' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed
I asked regarding this on
https://stackoverflow.com/questions/43572924/ere-adding-quantifier-to-group-with-inner-group-and-back-reference
and other forums. It helped to identify some more cases like
$ echo 'aazbbycc' | grep -E '(([a-z])2円[a-z]*){2}'
aazbbycc
$ # no output
$ echo 'aazbbycc' | grep -E '(([a-z])2円[a-z]*){3}'
and
$ echo 'aazbbycc' | grep -E '(([a-z])2円[a-z]{0,3}){3}'
aazbbycc
$ # no output
$ echo 'aazbbycc' | grep -E '(([a-z])2円[a-z]{0,4}){3}'
$ # seems dependent on character class clashing with back reference
characters and quantifier count
$ # a, b and c are the characters matching back reference
$ echo 'aazbbycc' | grep -E '(([a-z])2円[abcyz]{0,4}){2}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])2円[abcyz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])2円[abyz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])2円[byz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])2円[acyz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])2円[ayz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])2円[cyz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])2円[yz]{0,4}){3}'
aazbbycc
The same behavior is seen with 'sed (GNU sed) 4.2.2' as well. For ex:
$ echo 'aazbbycc' | sed -nE '/(([a-z])2円[a-z]{0,3}){3}/p'
aazbbycc
$ # no output
$ echo 'aazbbycc' | sed -nE '/(([a-z])2円[a-z]{0,4}){3}/p'
So, my question is whether these regular expression examples come under
'obscure regular expressions' mentioned in the man page. If so, I feel
there should be an error message displayed instead of no output
Regards,
Sundeep
[Message part 2 (text/html, inline)]
bug-grep <at> gnu.org:bug#26864; Package grep.
(2019年12月30日 09:02:01 GMT) Full text and rfc822 format available.Message #8 received at 26864 <at> debbugs.gnu.org (full text, mbox):
From: Paul Eggert <eggert <at> cs.ucla.edu> To: Sundeep Agarwal <learnbyexample.net <at> gmail.com> Cc: 26864 <at> debbugs.gnu.org Subject: Re: bug#26864: Clarification on obscure regular expressions mentioned in known bugs Date: 2019年12月30日 01:01:21 -0800
[Message part 1 (text/plain, inline)]
On 5/10/17 1:48 AM, Sundeep Agarwal wrote: > my question is whether these regular expression examples come under > 'obscure regular expressions' mentioned in the man page. No, they're bugs in grep's regular expression implementation, which is taken from glibc. I filed a bug report against glibc here: https://sourceware.org/bugzilla/show_bug.cgi?id=25322 and installed the attached patches to GNU grep to try to document this mess better. The first patch is the important one; the other two merely standardize spelling and fix a typo in the first patch. Thanks for reporting the problem.
[0001-doc-mention-back-reference-bugs.patch (text/x-patch, attachment)]
[0002-doc-spell-back-reference-more-consistently.patch (text/x-patch, attachment)]
[0003-doc-fix-bug-typo.patch (text/x-patch, attachment)]
bug-grep <at> gnu.org:bug#26864; Package grep.
(2019年12月30日 18:52:02 GMT) Full text and rfc822 format available.Message #11 received at 26864 <at> debbugs.gnu.org (full text, mbox):
From: Paul Eggert <eggert <at> cs.ucla.edu> To: Sundeep Agarwal <learnbyexample.net <at> gmail.com> Cc: 26864 <at> debbugs.gnu.org Subject: Re: bug#26864: Clarification on obscure regular expressions mentioned in known bugs Date: 2019年12月30日 10:51:16 -0800
[Message part 1 (text/plain, inline)]
On further thought, we shouldn't be encouraging palindromish REs in the manual, so that naive users aren't drawn into this mess. So I installed the attached further patch to the documentation. Of course it would be better to fix the back-reference bugs but this is low priority and I doubt whether anybody has the time.
[0001-doc-don-t-encourage-back-references.patch (text/x-patch, attachment)]
Paul Eggert <eggert <at> cs.ucla.edu>
to control <at> debbugs.gnu.org.
(2020年1月02日 09:35:01 GMT) Full text and rfc822 format available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.