Package: grep;
To reply to this bug, email your comments to 25048 AT debbugs.gnu.org.
bug-grep <at> gnu.org: bug#25048; Package grep. (28 Nov 2016 04:58:02 GMT)
Jim Meyering <jim <at> meyering.net>: bug-grep <at> gnu.org. (28 Nov 2016 04:58:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:
From: Jim Meyering <jim <at> meyering.net>
To: bug-grep <at> gnu.org
Subject: --with-included-regex vs. e-acute piped into LC_ALL=fr_FR.iso88591 grep '[d-f]'
Date: 27 Nov 2016 20:57:23 -0800
When grep is configured --with-included-regex, the following command fails to print the expected match:

  printf '\351\n' |LC_ALL=fr_FR.iso88591 src/grep '[d-f]'

You wouldn't notice on glibc-based systems, since the default there is to use the glibc-supplied regex code, which does make grep detect the match. However, on other systems (I noticed on OS X), configuration machinery detects that we have to resort to the included regex matcher, and there, the default build results in a grep binary that fails the new unibyte-bracket-expr test.

Why? Because the included regcomp.c has two code paths: one for #if _LIBC (that is collating-sequence aware), and the other that ignores collation sequences. The former can be used only when building glibc itself, and is the path we require in order to handle this case. The latter code is what we get when compiling any place else.

Since it's always been this way, I don't plan to attempt a work-around before the next release, and instead will probably arrange for that test to be skipped when grep is built with the included regex.

Other ideas welcome,

Jim
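As a rough illustration of the report above (not grep's or regcomp.c's actual code), the following Python sketch shows what the input byte is and why a matcher that ignores collation would not place it in [d-f]. The helper name `in_byte_range` is hypothetical, and treating "ignores collation sequences" as "compares raw code values" is an assumption made for this sketch.

```python
# Bytes written by: printf '\351\n'   (octal 351 == 0xE9)
data = bytes([0o351, 0x0A])

# In ISO-8859-1 every byte maps to the code point with the same value,
# so 0xE9 decodes to U+00E9 (e-acute).
text = data.decode("iso-8859-1")
first = text[0]


def in_byte_range(ch: str, lo: str, hi: str) -> bool:
    """Hypothetical range test by raw code value only (no collation)."""
    return ord(lo) <= ord(ch) <= ord(hi)


if __name__ == "__main__":
    print(repr(first), hex(ord(first)))
    # 'd'..'f' is 0x64..0x66, so 0xE9 falls outside the range.
    print(in_byte_range(first, "d", "f"))
    # Plain 'e' (0x65) falls inside it.
    print(in_byte_range("e", "d", "f"))
```

Under this model e-acute can only match [d-f] if the matcher consults the locale's collating order, which is what the _LIBC-only code path is described as doing.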
bug-grep <at> gnu.org: bug#25048; Package grep. (28 Nov 2016 16:54:01 GMT)

Message #8 received at 25048 <at> debbugs.gnu.org:
From: Eric Blake <eblake <at> redhat.com>
To: Jim Meyering <jim <at> meyering.net>, 25048 <at> debbugs.gnu.org
Subject: Re: bug#25048: --with-included-regex vs. e-acute piped into LC_ALL=fr_FR.iso88591 grep '[d-f]'
Date: 28 Nov 2016 10:53:04 -0600
[Message part 1 (text/plain, inline)]
On 11/27/2016 10:57 PM, Jim Meyering wrote:
> When grep is configured --with-included-regex, the following command
> fails to print the expected match:
>
>   printf '\351\n' |LC_ALL=fr_FR.iso88591 src/grep '[d-f]'

But the problem is that POSIX does NOT define what the "expected match" should be. The very fact that you're using a non-C locale but passing a range means that you have unspecified behavior per POSIX. Some regex engines treat 'e' and 'e-acute' as both being part of the range, others treat only 'e' as being part of the range. Expecting any particular behavior is a bug, unless you know for sure that you are using GNU's "rational range behavior" which explicitly treats ranges in ALL locales the same as if they were in the C locale (that is, e-acute is never part of the [d-f] range under rational range behavior).

> Since it's always been this way, I don't plan to attempt a work-around
> before the next release, and instead will probably arrange for that
> test to be skipped when grep is built with the included regex.
>
> Other ideas welcome,

We SHOULD be adjusting more and more GNU tools to honor rational range behavior, at least as an option, even if that means that e-acute can never be matched to [d-f].

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[signature.asc (application/pgp-signature, attachment)]
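The two interpretations contrasted in the message above can be sketched in Python. This is a toy model only: `toy_collation_key` is a hypothetical stand-in for a locale's collating order (it sorts an accented letter right after its base letter), not the rule any real locale or regex engine uses, and `rational_range` simply compares code points as the C locale would.

```python
import unicodedata


def rational_range(ch: str, lo: str, hi: str) -> bool:
    """Range membership by code point, the same in every locale."""
    return ord(lo) <= ord(ch) <= ord(hi)


def toy_collation_key(ch: str) -> tuple:
    """Hypothetical collation key: base letter first, original char second."""
    base = unicodedata.normalize("NFD", ch)[0]  # e-acute -> 'e'
    return (base, ch)


def collating_range(ch: str, lo: str, hi: str) -> bool:
    """Range membership by the toy collating order above."""
    key = toy_collation_key
    return key(lo) <= key(ch) <= key(hi)


if __name__ == "__main__":
    e_acute = "\u00e9"
    for name, test in (("rational", rational_range),
                       ("collating", collating_range)):
        print(name, test("e", "d", "f"), test(e_acute, "d", "f"))
```

Both functions accept plain 'e'. Only the collation-based one accepts e-acute, which is the disagreement between engines that makes the test's "expected match" unspecified.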
bug-grep <at> gnu.org: bug#25048; Package grep. (28 Nov 2016 17:14:01 GMT)

Message #11 received at 25048 <at> debbugs.gnu.org:
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, Jim Meyering <jim <at> meyering.net>, 25048 <at> debbugs.gnu.org
Subject: Re: bug#25048: --with-included-regex vs. e-acute piped into LC_ALL=fr_FR.iso88591 grep '[d-f]'
Date: 28 Nov 2016 09:13:02 -0800
On 11/28/2016 08:53 AM, Eric Blake wrote:
> We SHOULD be adjusting more and more GNU tools to honor rational range
> behavior

Yes, sorry, I forgot about that possibility when writing that test. I reverted the change to grep that added the test; this should fix the problem.
bug-grep <at> gnu.org: bug#25048; Package grep. (28 Nov 2016 18:49:02 GMT)

Message #14 received at 25048 <at> debbugs.gnu.org:
From: arnold <at> skeeve.com
To: jim <at> meyering.net, eblake <at> redhat.com, 25048 <at> debbugs.gnu.org
Subject: Re: bug#25048: --with-included-regex vs. e-acute piped into LC_ALL=fr_FR.iso88591 grep '[d-f]'
Date: 28 Nov 2016 11:48:11 -0700
> We SHOULD be adjusting more and more GNU tools to honor rational range
> behavior,

Hear, hear! (Or "+1" in 21st Century English.)

The official term, coined by Karl Berry and as documented in the gawk manual, is "Rational Range Interpretation". :-) :-)

> at least as an option, even if that means that e-acute can
> never be matched to [d-f].

Now, if we could get GLIBC to move to that, we'd have something. I've tried to submit patches in the past that weren't accepted, but maybe it's worth trying again.

At least gawk and gnulib-based programs generally do so.

Arnold
Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (18 Dec 2016 21:40:02 GMT)