Package: grep
Reported by: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Mon, 9 Jan 2023 23:01:01 UTC
Severity: normal
To reply to this bug, email your comments to 60697 AT debbugs.gnu.org.
Information forwarded to bug-grep <at> gnu.org: bug#60697; Package grep. (Mon, 09 Jan 2023 23:01:02 GMT)
Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>; copy sent to bug-grep <at> gnu.org. (Mon, 09 Jan 2023 23:01:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: bug-grep <at> gnu.org
Subject: GNU grep mishandles \b near encoding errors
Date: Mon, 9 Jan 2023 15:00:15 -0800

Here's a shell session illustrating the problem on Fedora 37, which has GNU grep 3.7. The same bug is still in bleeding-edge GNU grep.

  $ export LC_ALL=en_US.utf8
  $ printf '\300\n' | grep '\b'
  grep: (standard input): binary file matches
  $ printf '\300\n' | grep -P '\b'
  $

Plain grep finds a word boundary in the input even though the input contains no words (just an encoding error). 'grep -P' does the right thing.

The underlying issue is in the glibc regex code so the fix should be in glibc / Gnulib, but I thought I'd report it here before I forgot it.
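
The input in the session above is the single byte with octal value 300 (0xC0) followed by a newline. As a rough sketch of why that counts as an encoding error in a UTF-8 locale, the snippet below dumps the bytes and asks iconv whether they decode as UTF-8. The helper names hex_bytes and utf8_status are hypothetical, and using iconv as the validity check is an assumption of this sketch, not something taken from the report.

```shell
#!/bin/sh
# Hypothetical helper: print the test input as hex digits.
# Octal \300 is the single byte 0xC0, which cannot start a valid UTF-8 sequence.
hex_bytes() {
  printf '\300\n' | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: report whether stdin decodes as UTF-8.
# iconv exits nonzero when it meets an invalid input sequence.
utf8_status() {
  if iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    echo valid
  else
    echo invalid
  fi
}

hex_bytes; echo
printf '\300\n' | utf8_status
printf 'abc\n' | utf8_status
```

Since the only characters in the input are an undecodable byte and a newline, there is no word character for \b to sit next to, which is why a match is unexpected.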

Information forwarded to bug-grep <at> gnu.org: bug#60697; Package grep. (Thu, 12 Jan 2023 06:05:01 GMT)
Message #8 received at 60697 <at> debbugs.gnu.org:
From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 60697 <at> debbugs.gnu.org
Subject: Re: bug#60697: GNU grep mishandles \b near encoding errors
Date: Wed, 11 Jan 2023 22:03:52 -0800

On Mon, Jan 9, 2023 at 10:16 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Here's a shell session illustrating the problem on Fedora 37, which has
> GNU grep 3.7. The same bug is still in bleeding-edge GNU grep.
>
>   $ export LC_ALL=en_US.utf8
>   $ printf '\300\n' | grep '\b'
>   grep: (standard input): binary file matches
>   $ printf '\300\n' | grep -P '\b'
>   $
>
> Plain grep finds a word boundary in the input even though the input
> contains no words (just an encoding error). 'grep -P' does the right thing.
>
> The underlying issue is in the glibc regex code so the fix should be in
> glibc / Gnulib, but I thought I'd report it here before I forgot it.

Thanks! While this would definitely be nice to fix before the release (in the next week or so), it's enough of a corner case that I wouldn't feel bad releasing without a fix.

For the record, this problem first arose in grep-2.19.