Debian Bug report logs - #884075
glibc: Wrong results with regex backreferences

Package: glibc; Maintainer for glibc is GNU Libc Maintainers <debian-glibc@lists.debian.org>;

Reported by: Mathias Pietsch <m.pietsch@uke.uni-hamburg.de>

Date: Wed, 6 Dec 2017 23:06:01 UTC

Severity: normal

Tags: upstream

Blocking fix for 883733: grep returns 0 even if there is no match

Forwarded to https://sourceware.org/bugzilla/show_bug.cgi?id=29560



Report forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#883733; Package grep. (2017-12-06 23:06:04 GMT)


Acknowledgement sent to Mathias Pietsch <m.pietsch@uke.uni-hamburg.de>:
New Bug report received and forwarded. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (2017-12-06 23:06:04 GMT)


Message #5 received at submit@bugs.debian.org:

From: Mathias Pietsch <m.pietsch@uke.uni-hamburg.de>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: grep returns 0 even if there is no match
Date: Wed, 6 Dec 2017 23:51:52 +0100
Package: grep
Version: 2.27-2
Severity: normal
Tags: upstream
when trying to test this famous regexp for matching non-prime numbers
(^1?$|^(11+?)\1+$) which works fine with 'grep -P', i wondered if it
also would work without the non-greedy quantifier so egrep or even
plain grep could use it, and found the following problem e.g., with the
prime number 13:

$ echo "1111111111111" | grep -E '^(11+)\1+$|^1?$' || echo prime
1111111111111

the expected output would have been 'prime' because '1111111111111'
doesn't match '^1?$' and is also not a concatenation of two or more
'11', two or more '111', ... unlike the original perl-style
non-greedy version, here the substrings should be tested for a match
beginning with the longest (13 x '1') down to the shortest ('11').
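
The reasoning above can be sketched in Python. This only illustrates the expected result; it does not reproduce the bug. Python's backtracking `re` module stands in for a correct matcher (an assumption, since grep -E uses a different engine), and `explicit_check` is a hypothetical helper that spells out the longest-to-shortest search by hand.

```python
import re

# Assumption: Python's `re` engine is used as a reference matcher here.
# It is not the matcher that grep -E uses for back-references.
COMPOSITE = re.compile(r'^(11+)\1+$|^1?$')


def explicit_check(n: int) -> bool:
    """Return True if a line of n ones should match the pattern."""
    if n <= 1:
        # '^1?$' covers the empty line and a single '1'.
        return True
    # '(11+)' captures k >= 2 ones and '\1+' needs at least one more copy,
    # so n must equal k * m with m >= 2. Try the longest group first,
    # as a greedy matcher would.
    for k in range(n, 1, -1):
        if n % k == 0 and n // k >= 2:
            return True
    return False


# The reference matcher and the hand-written search agree on small inputs.
for n in range(30):
    assert bool(COMPOSITE.search('1' * n)) == explicit_check(n)

print(explicit_check(13))  # False: 13 ones should not match
```

Under these assumptions a line of 13 ones matches neither branch, so 'prime' is the expected output of the command above.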

next i removed the empty line term from the regexp (i.e., the '?' from
the '^1?$' term):

$ echo "1111111111111" | grep -E '^(11+)\1+$|^1$' || echo prime
prime

now the result is correct. but since the input is not an empty line,
using '^(11+)\1+$|^1?$' or '^(11+)\1+$|^1$' should not make any
difference.

(making the empty line term a separate term '^(11+)\1+$|^1$|^$' doesn't
change anything. the same is true with using plain grep and
'^\(11\+\)\1\+$\|^1\?$' or '^\(11\+\)\1\+$\|^1$\|^$'.)

this bug also appears in the original upstream version 3.1
(http://ftp.gnu.org/gnu/grep/grep-3.1.tar.xz)
-- System Information:
Debian Release: 9.3
 APT prefers proposed-updates
 APT policy: (500, 'proposed-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.9.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=C (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: sysvinit (via /sbin/init)
Versions of packages grep depends on:
ii dpkg 1.18.24
ii install-info 6.3.0.dfsg.1-1+b2
ii libc6 2.24-11+deb9u2
ii libpcre3 2:8.39-3
grep recommends no packages.
Versions of packages grep suggests:
ii libpcre3 2:8.39-3
-- no debconf information
--
_____________________________________________________________________
Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de
Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Prölß, Martina Saurin (komm.)
_____________________________________________________________________
SAVE PAPER - THINK BEFORE PRINTING

Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#883733; Package grep. (2017-12-08 11:15:06 GMT)


Acknowledgement sent to "Santiago R.R." <santiagorr@riseup.net>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (2017-12-08 11:15:06 GMT)


Message #10 received at 883733@bugs.debian.org:

From: "Santiago R.R." <santiagorr@riseup.net>
To: bug-grep@gnu.org
Cc: 883733@bugs.debian.org
Subject: Debian Bug#883733: grep returns 0 even if there is no match
Date: Fri, 8 Dec 2017 12:11:10 +0100
Dear grep developers,
I would like to forward the report below, filed by Mathias Pietsch to
Debian. I don't want to introduce other noise than this:
$ echo 1111111111111 | grep -E '^1?$' ; echo $?
1
$ echo 1111111111111 | grep -E '^(11+)\1+$' ; echo $?
1
$ echo 1111111111111 | grep -E '^(11+)\1+$|^1?$' ; echo $?
1111111111111
0
Shouldn't the last grep command exit 1 too?
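
The invariant behind this question can be written down as a small Python sketch: an alternation should match exactly when at least one of its branches matches. Python's `re` module is assumed here as a reference matcher, and the `matches` helper and variable names are hypothetical; the sketch shows the expected outcome rather than the behaviour of grep itself.

```python
import re

line = '1' * 13
branch_a = r'^(11+)\1+$'
branch_b = r'^1?$'


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


a = matches(branch_a, line)
b = matches(branch_b, line)
both = matches(branch_a + '|' + branch_b, line)

# An alternation should match exactly when at least one branch matches.
assert both == (a or b)

# grep exits with 0 on a match and 1 when nothing matches.
expected_exit_status = 0 if both else 1
print(a, b, both, expected_exit_status)  # False False False 1
```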
Cheers,
 -- Santiago
----- Forwarded message from Mathias Pietsch <m.pietsch@uke.uni-hamburg.de> -----
Date: Wed, 6 Dec 2017 23:51:52 +0100
From: Mathias Pietsch <m.pietsch@uke.uni-hamburg.de>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Bug#883733: grep returns 0 even if there is no match
X-Mailer: reportbug 7.1.7
Package: grep
Version: 2.27-2
Severity: normal
Tags: upstream
when trying to test this famous regexp for matching non-prime numbers
(^1?$|^(11+?)\1+$) which works fine with 'grep -P', i wondered if it
also would work without the non-greedy quantifier so egrep or even
plain grep could use it, and found the following problem e.g., with the
prime number 13:

$ echo "1111111111111" | grep -E '^(11+)\1+$|^1?$' || echo prime
1111111111111

the expected output would have been 'prime' because '1111111111111'
doesn't match '^1?$' and is also not a concatenation of two or more
'11', two or more '111', ... unlike the original perl-style
non-greedy version, here the substrings should be tested for a match
beginning with the longest (13 x '1') down to the shortest ('11').

next i removed the empty line term from the regexp (i.e., the '?' from
the '^1?$' term):

$ echo "1111111111111" | grep -E '^(11+)\1+$|^1$' || echo prime
prime

now the result is correct. but since the input is not an empty line,
using '^(11+)\1+$|^1?$' or '^(11+)\1+$|^1$' should not make any
difference.

(making the empty line term a separate term '^(11+)\1+$|^1$|^$' doesn't
change anything. the same is true with using plain grep and
'^\(11\+\)\1\+$\|^1\?$' or '^\(11\+\)\1\+$\|^1$\|^$'.)

this bug also appears in the original upstream version 3.1
(http://ftp.gnu.org/gnu/grep/grep-3.1.tar.xz)
-- System Information:
Debian Release: 9.3
 APT prefers proposed-updates
 APT policy: (500, 'proposed-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.9.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=C (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: sysvinit (via /sbin/init)
Versions of packages grep depends on:
ii dpkg 1.18.24
ii install-info 6.3.0.dfsg.1-1+b2
ii libc6 2.24-11+deb9u2
ii libpcre3 2:8.39-3
grep recommends no packages.
Versions of packages grep suggests:
ii libpcre3 2:8.39-3
-- no debconf information
----- End forwarded message -----

Information stored :
Bug#883733; Package grep. (2017-12-08 11:24:02 GMT)


Acknowledgement sent to "Santiago R.R." <santiagorr@riseup.net>:
Extra info received and filed, but not forwarded. (2017-12-08 11:24:03 GMT)


Message #15 received at 883733-quiet@bugs.debian.org:

From: "Santiago R.R." <santiagorr@riseup.net>
To: 883733-quiet@bugs.debian.org
Subject: Re: Bug#883733: Info received (Debian Bug#883733: grep returns 0 even if there is no match)
Date: Fri, 8 Dec 2017 12:21:43 +0100
Control: forwarded -1 http://debbugs.gnu.org/cgi/bugreport.cgi?bug=29613

Set Bug forwarded-to-address to 'http://debbugs.gnu.org/cgi/bugreport.cgi?bug=29613'. Request was from "Santiago R.R." <santiagorr@riseup.net> to 883733-quiet@bugs.debian.org. (2017-12-08 11:24:03 GMT)


Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#883733; Package grep. (2017-12-08 18:42:03 GMT)


Acknowledgement sent to Jim Meyering <jim@meyering.net>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (2017-12-08 18:42:03 GMT)


Message #22 received at 883733@bugs.debian.org:

From: Jim Meyering <jim@meyering.net>
To: "Santiago R.R." <santiagorr@riseup.net>
Cc: 29613@debbugs.gnu.org, 883733@bugs.debian.org
Subject: Re: bug#29613: Debian Bug#883733: grep returns 0 even if there is no match
Date: Fri, 8 Dec 2017 10:38:45 -0800
On Fri, Dec 8, 2017 at 3:11 AM, Santiago R.R. <santiagorr@riseup.net> wrote:
> Dear grep developers,
>
> I would like to forward the report below, filed by Mathias Pietsch to
> Debian. I don't want to introduce other noise than this:
>
> $ echo 1111111111111 | grep -E '^1?$' ; echo $?
> 1
> $ echo 1111111111111 | grep -E '^(11+)\1+$' ; echo $?
> 1
> $ echo 1111111111111 | grep -E '^(11+)\1+$|^1?$' ; echo $?
> 1111111111111
> 0
>
> Shouldn't the last grep command exit 1 too?
>
> Cheers,
>
> -- Santiago
>
> ----- Forwarded message from Mathias Pietsch <m.pietsch@uke.uni-hamburg.de> -----
>
> Date: Wed, 6 Dec 2017 23:51:52 +0100
> From: Mathias Pietsch <m.pietsch@uke.uni-hamburg.de>
> To: Debian Bug Tracking System <submit@bugs.debian.org>
> Subject: Bug#883733: grep returns 0 even if there is no match
> X-Mailer: reportbug 7.1.7
>
> Package: grep
> Version: 2.27-2
> Severity: normal
> Tags: upstream
>
> when trying to test this famous regexp for matching non-prime numbers
> (^1?$|^(11+?)\1+$) which works fine with 'grep -P', i wondered if it
> also would work without the non-greedy quantifier so egrep or even
> plain grep could use it, and found the following problem e.g., with the
> prime number 13:
>
> $ echo "1111111111111" | grep -E '^(11+)\1+$|^1?$' || echo prime
> 1111111111111
>
> the expected output would have been 'prime' because '1111111111111'
> doesn't match '^1?$' and is also not a concatenation of two or more
> '11', two or more '111', ... unlike the original perl-style
> non-greedy version, here the substrings should be tested for a match
> beginning with the longest (13 x '1') down to the shortest ('11').
>
> next i removed the empty line term from the regexp (i.e., the '?' from
> the '^1?$' term):
>
> $ echo "1111111111111" | grep -E '^(11+)\1+$|^1$' || echo prime
> prime
>
> now the result is correct. but since the input is not an empty line,
> using '^(11+)\1+$|^1?$' or '^(11+)\1+$|^1$' should not make any
> difference.
>
> (making the empty line term a separate term '^(11+)\1+$|^1$|^$' doesn't
> change anything. the same is true with using plain grep and
> '^\(11\+\)\1\+$\|^1\?$' or '^\(11\+\)\1\+$\|^1$\|^$'.)
>
> this bug also appears in the original upstream version 3.1
> (http://ftp.gnu.org/gnu/grep/grep-3.1.tar.xz)
Yikes! Thanks for forwarding that.
That is indeed a bug. I think it must be due to a bug in glibc's
regexp code, since that's the matcher that grep uses when there is any
back-reference.

Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#883733; Package grep. (2017-12-08 18:48:03 GMT)


Acknowledgement sent to Paul Eggert <eggert@cs.ucla.edu>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (2017-12-08 18:48:03 GMT)


Message #27 received at 883733@bugs.debian.org:

From: Paul Eggert <eggert@cs.ucla.edu>
To: "Santiago R.R." <santiagorr@riseup.net>, 29613@debbugs.gnu.org
Cc: 883733@bugs.debian.org
Subject: Re: bug#29613: Debian Bug#883733: grep returns 0 even if there is no match
Date: Fri, 8 Dec 2017 10:34:53 -0800
On 12/08/2017 03:11 AM, Santiago R.R. wrote:
> $ echo 1111111111111 | grep -E '^(11+)\1+$|^1?$' ; echo $?
> 1111111111111
> 0
>
> Shouldn't the last grep command exit 1 too?
Yes it should. This appears to be due to a longstanding bug in the glibc 
regular expression matcher. See:
https://sourceware.org/bugzilla/show_bug.cgi?id=11053

Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#883733; Package grep. (2017-12-11 08:57:07 GMT)


Acknowledgement sent to "Santiago R.R." <santiagorr@riseup.net>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (2017-12-11 08:57:07 GMT)


Message #32 received at 883733@bugs.debian.org:

From: "Santiago R.R." <santiagorr@riseup.net>
To: 883733@bugs.debian.org
Cc: Paul Eggert <eggert@cs.ucla.edu>, Jim Meyering <jim@meyering.net>
Subject: Wrong results with regex backreferences
Date: Mon, 11 Dec 2017 09:53:17 +0100
Control: clone -1 -2
Control: reassign -2 glibc
Control: retitle -2 glibc: Wrong results with regex backreferences
Control: forwarded -2 https://sourceware.org/bugzilla/show_bug.cgi?id=11053
Control: block -1 by -2
El 08/12/17 a las 10:34, Paul Eggert escribió:
> On 12/08/2017 03:11 AM, Santiago R.R. wrote:
> > $ echo 1111111111111 | grep -E '^(11+)\1+$|^1?$' ; echo $?
> > 1111111111111
> > 0
> > 
> > Shouldn't the last grep command exit 1 too?
> 
> Yes it should. This appears to be due to a longstanding bug in the glibc
> regular expression matcher. See:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=11053
> 
Hi,
Thanks for the info. I am reassigning this bug to glibc (and keeping a
copy of it for grep, in case possible future users will notice the
issue).
Cheers,
 -- Santiago

Bug 883733 cloned as bug 884075. Request was from "Santiago R.R." <santiagorr@riseup.net> to 883733-submit@bugs.debian.org. (2017-12-11 08:57:07 GMT)


Bug reassigned from package 'grep' to 'glibc'. Request was from "Santiago R.R." <santiagorr@riseup.net> to 883733-submit@bugs.debian.org. (2017-12-11 08:57:08 GMT)


No longer marked as found in versions grep/2.27-2. Request was from "Santiago R.R." <santiagorr@riseup.net> to 883733-submit@bugs.debian.org. (2017-12-11 08:57:09 GMT)


Changed Bug title to 'glibc: Wrong results with regex backreferences' from 'grep returns 0 even if there is no match'. Request was from "Santiago R.R." <santiagorr@riseup.net> to 883733-submit@bugs.debian.org. (2017-12-11 08:57:09 GMT)


Changed Bug forwarded-to-address to 'https://sourceware.org/bugzilla/show_bug.cgi?id=11053' from 'http://debbugs.gnu.org/cgi/bugreport.cgi?bug=29613'. Request was from "Santiago R.R." <santiagorr@riseup.net> to 883733-submit@bugs.debian.org. (2017-12-11 08:57:10 GMT)


Added indication that bug 884075 blocks 883733. Request was from "Santiago R.R." <santiagorr@riseup.net> to 883733-submit@bugs.debian.org. (2017-12-11 08:57:12 GMT)


Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#884075; Package glibc. (2022-09-06 00:48:03 GMT)


Acknowledgement sent to Paul Eggert <eggert@cs.ucla.edu>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (2022-09-06 00:48:03 GMT)


Message #49 received at 884075@bugs.debian.org:

From: Paul Eggert <eggert@cs.ucla.edu>
To: vincent-srcware at vinc17 dot net <sourceware-bugzilla@sourceware.org>
Cc: 884075@bugs.debian.org
Subject: Re: [Bug regex/11053] Wrong results with backreferences
Date: Mon, 5 Sep 2022 19:37:18 -0500
On 9/5/22 18:06, vincent-srcware at vinc17 dot net wrote:
>
> What is the status of this bug? The comment says that it is fixed, and I could
> check on an Ubuntu 22.04.1 LTS machine with libc6 2.35-0ubuntu3.1 that regbug.c
> and rebug2.c no longer fail, but the result is still incorrect with the grep
> example from Debian bug 884075:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884075
>
> vinc17@gcc92:~$ echo 11111111111 | grep -E '^(11+)\1+$|^1?$' ; echo $?
> 11111111111
> 0
>
It looks like my comment 
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884075#27> was 
incorrect, in that the two bugs are different bugs. glibc bug 11053 is 
fixed, but Debian bug 884075 is not fixed. Perhaps a better match for 
Debian bug 884075 is glibc bug 10844.
It's not an important bug. However, if you have time to fix it please 
feel free to send in a fix.

Changed Bug forwarded-to-address to 'https://sourceware.org/bugzilla/show_bug.cgi?id=10844' from 'https://sourceware.org/bugzilla/show_bug.cgi?id=11053'. Request was from Aurelien Jarno <aurel32@clementi.debian.org> to control@bugs.debian.org. (2022-09-06 19:24:12 GMT)


Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#884075; Package glibc. (2022-09-08 11:57:03 GMT)


Acknowledgement sent to Vincent Lefevre <vincent@vinc17.net>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (2022-09-08 11:57:03 GMT)


Message #56 received at 884075@bugs.debian.org:

From: Vincent Lefevre <vincent@vinc17.net>
To: Paul Eggert <eggert@cs.ucla.edu>, 884075@bugs.debian.org
Subject: Re: Bug#884075: [Bug regex/11053] Wrong results with backreferences
Date: Thu, 8 Sep 2022 13:54:52 +0200
Control: forwarded -1 https://sourceware.org/bugzilla/show_bug.cgi?id=29560
On 2022-09-05 19:37:18 -0500, Paul Eggert wrote:
> It looks like my comment
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884075#27> was incorrect,
> in that the two bugs are different bugs. glibc bug 11053 is fixed, but
> Debian bug 884075 is not fixed. Perhaps a better match for Debian bug 884075
> is glibc bug 10844.
Bug 10844 actually seems to be a different bug.
I've reported bug 29560 upstream concerning 1111111111111.
So there are 3 upstream bugs concerning bad results with backreferences
(but the regexps are rather different in each case):
 https://sourceware.org/bugzilla/show_bug.cgi?id=10844
 (ab*){1,9}\1
 https://sourceware.org/bugzilla/show_bug.cgi?id=17356
 (.{0,1})(.{0,1})(.{0,1})\3\2\1
 https://sourceware.org/bugzilla/show_bug.cgi?id=29560
 ^(11+)\1+$|^1?$
-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Changed Bug forwarded-to-address to 'https://sourceware.org/bugzilla/show_bug.cgi?id=29560' from 'https://sourceware.org/bugzilla/show_bug.cgi?id=10844'. Request was from Vincent Lefevre <vincent@vinc17.net> to 884075-submit@bugs.debian.org. (2022-09-08 11:57:03 GMT)




Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Jan 7 07:25:18 2026; Machine Name: bembo

Debian Bug tracking system

Debbugs is free software and licensed under the terms of the GNU General Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.
