Debian Bug report logs - #913657
grep: Regex grep on stretch is slower than jessie

Package: grep; Maintainer for grep is Anibal Monsalve Salazar <anibal@debian.org>; Source for grep is src:grep.

Reported by: Jan van den Berg <janvdberg@gmail.com>

Date: Tue, 13 Nov 2018 16:45:01 UTC

Severity: normal

Tags: moreinfo, wontfix

Found in version grep/2.27-2

Fixed in version grep/3.3-1


Report forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep. (Tue, 13 Nov 2018 16:45:13 GMT)


Acknowledgement sent to Jan van den Berg <janvdberg@gmail.com>:
New Bug report received and forwarded. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (Tue, 13 Nov 2018 16:45:13 GMT)


Message #5 received at submit@bugs.debian.org:

From: Jan van den Berg <janvdberg@gmail.com>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: grep: Regex grep on stretch is slower than jessie
Date: Tue, 13 Nov 2018 17:09:58 +0100
Package: grep
Version: 2.27-2
Severity: normal
Dear Maintainer,

I just upgraded from Debian 8 to 9 and noticed that a script which I run
several times per day was really slow:

real 0m6.384s
user 0m6.288s
sys 0m0.036s

This used to take well under a second.

I dug a little deeper and noticed the problem was here:

grep 'best_bid\|fixed_' /var/www/logs/large_log_file

I played around with the grep parameters and locale settings, and narrowed it
down to the regex, because this is way faster:

grep -F best_bid /var/www/logs/large_log_file
grep -F fixed /var/www/logs/large_log_file

So much faster, in fact, that I can run two grep commands faster than one.

real 0m0.199s
user 0m0.108s
sys 0m0.032s

However, it is strange and unexpected that an unaltered grep script is
slower after an upgrade. I dug a little deeper and it seems related to #761157
(and #18454), because of a change in the PCRE library between jessie and
stretch.

I have not seen a real fix yet (other than altering my script/grep commands),
but I expect the regex library needs work to match the previous behaviour, so
I'm deeming it a 'bug'.

--
Jan
-- System Information:
Debian Release: 9.6
 APT prefers stable-updates
 APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 4.9.0-8-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages grep depends on:
ii dpkg 1.18.25
ii install-info 6.3.0.dfsg.1-1+b2
ii libc6 2.24-11+deb9u3
ii libpcre3 2:8.41-1+0~20180910100527.3+stretch~1.gbp97d153
grep recommends no packages.
Versions of packages grep suggests:
ii libpcre3 2:8.41-1+0~20180910100527.3+stretch~1.gbp97d153
-- no debconf information
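
The two-command workaround described above can also be written as one fixed-string search. A minimal sketch, assuming both alternatives are plain literal strings: `grep -F` accepts several `-e` patterns and selects a line if any of them occurs in it. The helper name `match_any_fixed` and the sample data are made up for illustration, and this report does not confirm whether this form avoids the slowdown on grep 2.27.

```shell
#!/bin/sh
# Hypothetical helper: print lines containing either literal string,
# using one fixed-string pass instead of a regex alternation.
match_any_fixed() {
    # -F treats every -e pattern as a literal string, not as a regex.
    grep -F -e 'best_bid' -e 'fixed_' "$1"
}

# Small self-made sample so the sketch runs anywhere.
sample=$(mktemp)
printf '%s\n' \
    '<best_bid>1.23</best_bid>' \
    '<other>x</other>' \
    '<fixed_rate>2</fixed_rate>' > "$sample"

match_any_fixed "$sample"

# The regex form from the report, for comparison; it should select the same lines.
grep 'best_bid\|fixed_' "$sample"

rm -f "$sample"
```

On a real log, the two forms can be timed against each other by prefixing each command with `time`.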

Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep. (Wed, 14 Nov 2018 14:21:03 GMT)


Acknowledgement sent to Santiago Ruano Rincón <santiagorr@riseup.net>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (Wed, 14 Nov 2018 14:21:03 GMT)


Message #10 received at 913657@bugs.debian.org:

From: Santiago Ruano Rincón <santiagorr@riseup.net>
To: Jan van den Berg <janvdberg@gmail.com>, 913657@bugs.debian.org
Subject: Re: Bug#913657: grep: Regex grep on stretch is slower than jessie
Date: Wed, 14 Nov 2018 15:19:36 +0100
[Message part 1 (text/plain, inline)]
Dear Jan,
On 13/11/18 at 17:09, Jan van den Berg wrote:
> Package: grep
> Version: 2.27-2
> Severity: normal
> 
> Dear Maintainer,
> 
> I just upgraded from Debian 8 to 9 and noticed that a script which I run
> several times per day was really slow:
> 
> real 0m6.384s
> user 0m6.288s
> sys 0m0.036s
> 
> This used to take well under a second.
> 
> I dug a little deeper and noticed the problem was here:
> 
> grep 'best_bid\|fixed_' /var/www/logs/large_log_file
> 
> I played around with the grep parameters and locale settings, and narrowed it
> down to the regex, because this is way faster:
> 
> grep -F best_bid /var/www/logs/large_log_file
> grep -F fixed /var/www/logs/large_log_file
> 
> So much faster, in fact, that I can run two grep commands faster than one.
> 
> real 0m0.199s
> user 0m0.108s
> sys 0m0.032s
> 
> However, it is strange and unexpected that an unaltered grep script is
> slower after an upgrade. I dug a little deeper and it seems related to #761157
> (and #18454), because of a change in the PCRE library between jessie and
> stretch.
I am not sure of that, since you are not using the -P matcher that
relies on libpcre3.
> 
> I have not seen a real fix yet (other than altering my script/grep commands), but I expect the regex library needs work to match the previous behaviour, so I'm deeming it a 'bug'.
...
There have been behaviour changes between the version of grep released
in jessie and stretch. See e.g. #891086.
Could you please run your script with the -a option, and also with
LANG=C set? I suspect that a non-textual file, a multi-byte encoding,
or a wrong encoding is causing your problem. Please check that before
going any further.
Cheers,
Santiago
[signature.asc (application/pgp-signature, inline)]
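
The check requested above could be sketched roughly as follows. This is an illustration only: `large_log_file` stands in for the real log, `is_valid_utf8` is a hypothetical helper, and `LC_ALL=C` is used rather than `LANG=C` because `LC_ALL`, when set, overrides the other locale variables.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file decodes cleanly as UTF-8.
# iconv exits non-zero when it meets an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

log=${1:-large_log_file}   # placeholder path; pass the real file as $1

if [ -r "$log" ]; then
    if is_valid_utf8 "$log"; then
        echo "$log: valid UTF-8"
    else
        echo "$log: contains bytes that are not valid UTF-8"
    fi
    # Byte-oriented locale, plus -a so binary data is processed as text.
    time LC_ALL=C grep -a 'best_bid\|fixed_' "$log" > /dev/null
fi
```

If the file turns out to contain invalid sequences, that alone does not prove it is the cause of the slowdown; it only narrows down where to look.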

Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep. (Thu, 15 Nov 2018 11:21:06 GMT)


Acknowledgement sent to Jan van den Berg <janvdberg@gmail.com>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (Thu, 15 Nov 2018 11:21:06 GMT)


Message #15 received at 913657@bugs.debian.org:

From: Jan van den Berg <janvdberg@gmail.com>
To: 913657@bugs.debian.org
Subject: Re: Bug#913657: grep: Regex grep on stretch is slower than jessie
Date: Thu, 15 Nov 2018 12:18:17 +0100
[Message part 1 (text/plain, inline)]
Just a fraction better (with -a and LANG=C)
Ran it multiple times, stays just under 6 seconds now:
real 0m5.835s
user 0m5.720s
sys 0m0.060s
Still a far cry from the original / other results (under a second).
The logfile it greps is valid XML data.
Jan

Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep. (Thu, 15 Nov 2018 11:57:03 GMT)


Acknowledgement sent to Santiago Ruano Rincón <santiagorr@riseup.net>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (Thu, 15 Nov 2018 11:57:03 GMT)


Message #20 received at 913657@bugs.debian.org:

From: Santiago Ruano Rincón <santiagorr@riseup.net>
To: Jan van den Berg <janvdberg@gmail.com>, 913657@bugs.debian.org
Subject: Re: Bug#913657: grep: Regex grep on stretch is slower than jessie
Date: Thu, 15 Nov 2018 12:55:20 +0100
[Message part 1 (text/plain, inline)]
Control: tag -1 + moreinfo
On 15/11/18 at 12:18, Jan van den Berg wrote:
> Just a fraction better (with -a and LANG=C) 
> Ran it multiple times, stays just under 6 seconds now:
> real  0m5.835s
> user  0m5.720s
> sys   0m0.060s
> Still a far cry from the original / other results (under a second).
> The logfile it greps is valid XML data.
It can be valid XML, but that doesn't mean it doesn't have non-textual
characters (or invalid characters).
Could you provide a way to reproduce this?
Cheers,
S
[signature.asc (application/pgp-signature, inline)]

Added tag(s) moreinfo. Request was from Santiago Ruano Rincón <santiagorr@riseup.net> to 913657-submit@bugs.debian.org. (Thu, 15 Nov 2018 11:57:03 GMT)


Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep. (Thu, 15 Nov 2018 12:27:03 GMT)


Acknowledgement sent to Jan van den Berg <janvdberg@gmail.com>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (Thu, 15 Nov 2018 12:27:03 GMT)


Message #27 received at 913657@bugs.debian.org:

From: Jan van den Berg <janvdberg@gmail.com>
To: Bailes Magio <santiagorr@riseup.net>
Cc: 913657@bugs.debian.org
Subject: Re: Bug#913657: grep: Regex grep on stretch is slower than jessie
Date: Thu, 15 Nov 2018 13:22:19 +0100
[Message part 1 (text/plain, inline)]
I cannot really share the log file in question (it has sensitive customer
data), but I did run the following test.
root@piks:/tmp# time result=`grep -a 'best_bid' ff.log`
real 0m0.026s
user 0m0.020s
sys 0m0.004s
root@piks:/tmp# time result=`grep -a 'best_bid\|fixed' ff.log`
real 0m1.881s
user 0m1.868s
sys 0m0.008s
root@piks:/tmp# wc -l ff.log
790754 ff.log
root@piks:/tmp# locale (this is normally UTF-8 but I changed it)
LANG=C
LANGUAGE=en_US:en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=
root@piks:/tmp# cat /etc/debian_version
9.6
jan@mm1:~$ time result=`grep -a 'best_bid' ff.log`
real 0m0.039s
user 0m0.020s
sys 0m0.016s
jan@mm1:~$ time result=`grep -a 'best_bid\|fixed' ff.log`
real 0m0.173s
user 0m0.164s
sys 0m0.008s
jan@mm1:~$ wc -l ff.log
790754 ff.log
jan@mm1:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
jan@mm1:~$ cat /etc/debian_version
7.11
This is the exact same log file (I scp-ed it) and I ran the same commands.
Notice how it takes more time on the Debian 9 machine than on the Debian 7
machine (which is similar to my experience with Debian 8).
And mm1 is even an older machine, with less CPU and memory.
So even if something is wrong with the logfile (or it has non-textual chars),
this would not explain why grep is so much faster on older Debian versions
(and on an older machine).
Jan
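
Since the real log cannot be shared, a synthetic file with the same line count might serve as a starting point for a reproducer. A rough sketch: the XML-like line format and the `make_log` helper are invented here, and it is not known whether such data triggers the slowdown seen with the real log.

```shell
#!/bin/bash
# Hypothetical generator: write N XML-like lines to a file.
# Lines divisible by 1000 use a best_bid tag; other lines divisible by
# 1500 use a fixed_rate tag; everything else uses a plain entry tag.
make_log() {
    awk -v n="$1" 'BEGIN {
        for (i = 1; i <= n; i++) {
            tag = "entry"
            if (i % 1000 == 0) tag = "best_bid"
            else if (i % 1500 == 0) tag = "fixed_rate"
            printf "<%s id=\"%d\">value %d</%s>\n", tag, i, i * 7, tag
        }
    }' > "$2"
}

log=$(mktemp)
make_log 790754 "$log"    # same line count as ff.log in the report

# Time the single-pattern and alternation forms on identical data.
time grep -a -c 'best_bid' "$log"
time grep -a -c 'best_bid\|fixed' "$log"

rm -f "$log"
```

Running the same script on both machines would show whether the gap between the two forms depends on the grep version or on the content of the original file.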

Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep. (Tue, 02 Apr 2019 15:48:03 GMT)


Acknowledgement sent to BOUTELIER Sébastien <sebastien.boutelier@univ-tln.fr>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.

Your message did not contain a Subject field. They are recommended and useful because the title of a Bug is determined using this field. Please remember to include a Subject field in your messages in future.

(Tue, 02 Apr 2019 15:48:03 GMT)


Message #32 received at 913657@bugs.debian.org:

From: BOUTELIER Sébastien <sebastien.boutelier@univ-tln.fr>
To: 913657@bugs.debian.org
Date: Tue, 2 Apr 2019 17:39:35 +0200
Same bug here. 
With grep on any squid log file (~ 350M), there is a huge difference. 
I compiled grep 2.20 and 3.3 for stretch using pbuilder and put grep
binaries on a server: 
pbuilder build grep_3.3-1.dsc
pbuilder build grep_2.20-4.1.dsc
Install the deb, scp the grep binaries to the server and rename them.
All binaries are using the same libs.
root@cachessl:/var/log/squid# time /var/tmp/grep33 10.21.73.68 access.log | wc -l
1139
real 0m0,681s
user 0m0,556s
sys 0m0,112s
root@cachessl:/var/log/squid# time /var/tmp/grep220 10.21.73.68 access.log | wc -l
1139
real 0m1,920s
user 0m1,744s
sys 0m0,168s
root@cachessl:/var/log/squid# time /var/tmp/grep227 10.21.73.68 access.log | wc -l
1139
real 0m30,639s
user 0m30,480s
sys 0m0,144s
Same result with -a option.
-- 
BOUTELIER Sébastien
Tel: 04 94 14 29 47

Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep. (Wed, 03 Apr 2019 13:51:03 GMT)


Acknowledgement sent to Santiago Ruano Rincón <santiagorr@riseup.net>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>. (Wed, 03 Apr 2019 13:51:03 GMT)


Message #37 received at 913657@bugs.debian.org:

From: Santiago Ruano Rincón <santiagorr@riseup.net>
To: BOUTELIER Sébastien <sebastien.boutelier@univ-tln.fr>, 913657@bugs.debian.org
Subject: Re: Bug#913657: (no subject)
Date: Wed, 3 Apr 2019 15:46:59 +0200
[Message part 1 (text/plain, inline)]
Control: notfound -1 3.3-1
Control: tags -1 + wontfix
On 02/04/19 at 17:39, BOUTELIER Sébastien wrote:
> Same bug here. 
> 
> With grep on any squid log file (~ 350M), there is a huge difference. 
> 
> I compiled grep 2.20 and 3.3 for stretch using pbuilder and put grep
> binaries on a server: 
> pbuilder build grep_3.3-1.dsc
> pbuilder build grep_2.20-4.1.dsc
> Install the deb, scp the grep binaries to the server and rename them.
> 
> All binaries are using the same libs.
> 
> root@cachessl:/var/log/squid# time /var/tmp/grep33 10.21.73.68 access.log | wc -l
> 1139
> 
> real 0m0,681s
> user 0m0,556s
> sys 0m0,112s
> root@cachessl:/var/log/squid# time /var/tmp/grep220 10.21.73.68 access.log | wc -l
> 1139
> 
> real 0m1,920s
> user 0m1,744s
> sys 0m0,168s
> root@cachessl:/var/log/squid# time /var/tmp/grep227 10.21.73.68 access.log | wc -l
> 1139
> 
> real 0m30,639s
> user 0m30,480s
> sys 0m0,144s
> 
> Same result with -a option.
> 
> -- 
> BOUTELIER Sébastien
> Tel: 04 94 14 29 47
> 
Thanks for the info. Unfortunately, I find it difficult to solve this
issue in stretch. In general, changes in {old,}stable are restricted to
fixing bugs of critical severity, and I don't think that is the case
here. It would be possible, though, to upload to stretch-backports the
same version found in buster.
Cheers,
 -- Santiago
[signature.asc (application/pgp-signature, inline)]

Added tag(s) wontfix. Request was from Santiago Ruano Rincón <santiagorr@riseup.net> to 913657-submit@bugs.debian.org. (Wed, 03 Apr 2019 13:51:03 GMT)


Marked as fixed in versions grep/3.3-1. Request was from Santiago Ruano Rincón <santiagorr@riseup.net> to control@bugs.debian.org. (Thu, 04 Apr 2019 08:39:04 GMT)



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Jan 7 07:22:54 2026; Machine Name: berlioz
