Package: grep; Maintainer for grep is Anibal Monsalve Salazar <anibal@debian.org>; Source for grep is src:grep.
Reported by: Jan van den Berg <janvdberg@gmail.com>
Date: 13 Nov 2018 16:45:01 UTC
Severity: normal
Tags: moreinfo, wontfix
Found in version grep/2.27-2
Fixed in version grep/3.3-1
Report forwarded
to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep.
(13 Nov 2018 16:45:13 GMT)
Acknowledgement sent
to Jan van den Berg <janvdberg@gmail.com>:
New Bug report received and forwarded. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.
(13 Nov 2018 16:45:13 GMT)
Message #5 received at submit@bugs.debian.org:
Package: grep
Version: 2.27-2
Severity: normal

Dear Maintainer,

I just upgraded from Debian 8 to 9 and noticed that a script which I run several times per day had become really slow:

real 0m6.384s
user 0m6.288s
sys 0m0.036s

This used to take well under a second.

I dug a little deeper and noticed the problem was here:

grep 'best_bid\|fixed_' /var/www/logs/large_log_file

Playing around with the grep parameters and locale settings, I narrowed it down to the regex, because this is way faster:

grep -F best_bid /var/www/logs/large_log_file
grep -F fixed /var/www/logs/large_log_file

So much faster, in fact, that I can run two grep commands faster than one:

real 0m0.199s
user 0m0.108s
sys 0m0.032s

However, it is strange and unexpected that an unaltered grep script is slower after an upgrade. I dug a little deeper and it seems related to #761157 (and #18454) because of a change in the PCRE library between jessie and stretch.

I have not seen a real fix yet (other than altering my script/grep commands), but I expect the regex library needs work to match the previous behaviour, so I'm deeming it a 'bug'?

--
Jan

-- System Information:
Debian Release: 9.6
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-8-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages grep depends on:
ii dpkg 1.18.25
ii install-info 6.3.0.dfsg.1-1+b2
ii libc6 2.24-11+deb9u3
ii libpcre3 2:8.41-1+0~20180910100527.3+stretch~1.gbp97d153

grep recommends no packages.

Versions of packages grep suggests:
ii libpcre3 2:8.41-1+0~20180910100527.3+stretch~1.gbp97d153

-- no debconf information
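The report observes that two fixed-string searches beat a single alternation regex. As a possible workaround (a sketch only, not a fix for the regression itself), grep accepts several literal patterns in one invocation when `-F` is combined with repeated `-e` options. The helper name `grep_any_fixed` is made up for illustration. Note that the report searches for `fixed_` in the regex but `fixed` in the `-F` run; the usage note below follows the regex.

```shell
#!/bin/sh
# Hypothetical helper: search one file for any of several literal strings
# in a single grep pass.
# Usage: grep_any_fixed FILE PATTERN [PATTERN...]
grep_any_fixed() {
    file=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "grep_any_fixed: at least one pattern is required" >&2
        return 2
    fi
    # Rewrite the argument list from "p1 p2 ..." to "-e p1 -e p2 ...".
    # The loop iterates over the original list while each step appends
    # "-e PATTERN" and drops the oldest argument.
    for pat in "$@"; do
        set -- "$@" -e "$pat"
        shift
    done
    # -F treats every pattern as a fixed string; a line matches if it
    # contains any of them.
    grep -F "$@" -- "$file"
}
```

For the case in this report, the call would be `grep_any_fixed /var/www/logs/large_log_file best_bid fixed_`. Whether this is as fast as the two separate `-F` runs on grep 2.27 has not been measured here.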
Information forwarded
to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep.
(14 Nov 2018 14:21:03 GMT)
Acknowledgement sent
to Santiago Ruano Rincón <santiagorr@riseup.net>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.
(14 Nov 2018 14:21:03 GMT)
Message #10 received at 913657@bugs.debian.org:
Dear Jan,

On 13/11/18 at 17:09, Jan van den Berg wrote:
> Package: grep
> Version: 2.27-2
> Severity: normal
>
> Dear Maintainer,
>
> I just upgraded from Debian 8 to 9 and noticed that a script which I run
> several times per day was really slow:
>
> real 0m6.384s
> user 0m6.288s
> sys 0m0.036s
>
> This used to take well under a second.
>
> I dug a little deeper and noticed the problem was here:
>
> grep 'best_bid\|fixed_' /var/www/logs/large_log_file
>
> Playing around with the grep parameters en locale settings, and narrowed it
> down to the regex, because this is way faster:
>
> grep -F best_bid /var/www/logs/large_log_file
> grep -F fixed /var/www/logs/large_log_file
>
> So much faster in fact, that I can run 2 grep command faster than one.
>
> real 0m0.199s
> user 0m0.108s
> sys 0m0.032s
>
> However, this is strange and unexpected that after an upgrade a
> unaltered grep script is slower. I dug a little deeper and it seem related to #761157
> (and #18454) because of a change in de PCRE library between jessie and
> stretch.

I am not sure of that, since you are not using the -P matcher, which is the one that relies on libpcre3.

>
> I have not seen a real fix yet (other than altering my script/grep commands), but I expect the regex library needs work, to match the previous behaviour so therefore I'm deeming it a 'bug'?
...

There have been behaviour changes between the versions of grep released in jessie and stretch. See e.g. #891086.

Could you please run your script with the -a option, and also with LANG=C set? I suspect there is a non-textual file, a multi-byte encoding, or a wrong encoding causing your problem. Before going any further, please check that.

Cheers,

Santiago
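A minimal sketch of the check requested above. The wrapper name `grep_c_text` is hypothetical. It uses `LC_ALL=C` rather than `LANG=C` because `LC_ALL` takes precedence over both `LC_CTYPE` and `LANG`, so the C locale applies even when other locale variables are set in the environment.

```shell
#!/bin/sh
# Hypothetical wrapper: run grep with binary input treated as text (-a)
# and with the C locale forced for this one command only.
# All arguments are passed through to grep unchanged.
grep_c_text() {
    LC_ALL=C grep -a "$@"
}
```

In bash, the run can be timed with `time grep_c_text 'best_bid\|fixed_' /var/www/logs/large_log_file` and compared against the same command without the wrapper.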
Information forwarded
to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep.
(15 Nov 2018 11:21:06 GMT)
Acknowledgement sent
to Jan van den Berg <janvdberg@gmail.com>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.
(15 Nov 2018 11:21:06 GMT)
Message #15 received at 913657@bugs.debian.org:
Just a fraction better (with -a and LANG=C).
I ran it multiple times; it stays just under 6 seconds now:

real 0m5.835s
user 0m5.720s
sys 0m0.060s

Still a far cry from the original / other results (under a second).
The logfile it greps is valid XML data.

Jan
Information forwarded
to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep.
(15 Nov 2018 11:57:03 GMT)
Acknowledgement sent
to Santiago Ruano Rincón <santiagorr@riseup.net>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.
(15 Nov 2018 11:57:03 GMT)
Message #20 received at 913657@bugs.debian.org:
Control: tag -1 + moreinfo

On 15/11/18 at 12:18, Jan van den Berg wrote:
> Just a fraction better (with -a and LANG=C)
> Ran it multiple times, stays just under 6 seconds now:
> real 0m5.835s
> user 0m5.720s
> sys 0m0.060s
> Still a far cry from the original / other results (under a second).
> The logfile it greps is valid XML data.

It can be valid XML, but that doesn't mean it doesn't have non-textual
characters (or invalid characters).

Could you provide a way to reproduce this?

Cheers,

S
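One way to test for the non-textual or invalid characters mentioned above, sketched under the assumption that the log is meant to be UTF-8. The helper name `check_text_file` is hypothetical; it relies only on `tr`, `cmp` and `iconv`, and it does not claim to reproduce grep's own binary-file detection.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file contains NUL bytes or byte
# sequences that are not valid UTF-8. Prints one line per finding and
# returns non-zero if anything was found.
check_text_file() {
    file=$1
    rc=0
    # tr strips NUL bytes; if the stripped stream differs from the file,
    # the file contained at least one NUL.
    if ! tr -d '\000' < "$file" | cmp -s - "$file"; then
        echo "contains NUL bytes"
        rc=1
    fi
    # iconv exits non-zero when the input is not valid in the source encoding.
    if ! iconv -f UTF-8 -t UTF-8 < "$file" > /dev/null 2>&1; then
        echo "contains invalid UTF-8"
        rc=1
    fi
    return "$rc"
}
```

A clean result from `check_text_file /var/www/logs/large_log_file` would make an encoding problem a less likely explanation, without ruling out other content-dependent causes.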
Added tag(s) moreinfo.
Request was from Santiago Ruano Rincón <santiagorr@riseup.net>
to 913657-submit@bugs.debian.org.
(15 Nov 2018 11:57:03 GMT)
Information forwarded
to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep.
(15 Nov 2018 12:27:03 GMT)
Acknowledgement sent
to Jan van den Berg <janvdberg@gmail.com>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.
(15 Nov 2018 12:27:03 GMT)
Message #27 received at 913657@bugs.debian.org:
I cannot really share the log file in question (it has sensitive customer data), but I did run the following test.

root@piks:/tmp# time result=`grep -a 'best_bid' ff.log`
real 0m0.026s
user 0m0.020s
sys 0m0.004s
root@piks:/tmp# time result=`grep -a 'best_bid\|fixed' ff.log`
real 0m1.881s
user 0m1.868s
sys 0m0.008s
root@piks:/tmp# wc -l ff.log
790754 ff.log
root@piks:/tmp# locale
(this is normally UTF-8 but I changed it)
LANG=C
LANGUAGE=en_US:en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=
root@piks:/tmp# cat /etc/debian_version
9.6

jan@mm1:~$ time result=`grep -a 'best_bid' ff.log`
real 0m0.039s
user 0m0.020s
sys 0m0.016s
jan@mm1:~$ time result=`grep -a 'best_bid\|fixed' ff.log`
real 0m0.173s
user 0m0.164s
sys 0m0.008s
jan@mm1:~$ wc -l ff.log
790754 ff.log
jan@mm1:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
jan@mm1:~$ cat /etc/debian_version
7.11

This is the exact same log file (I scp-ed it), and I ran the same commands on both machines. Notice how the alternation search takes about ten times longer on the Debian 9 machine (1.881s) than on the Debian 7 machine (0.173s), which is similar to my experience with Debian 8. And mm1 is even an older machine, with less CPU and memory.

So even if something is wrong with the logfile (or it has non-textual chars), this would not explain why grep is so much faster on older Debian versions (and on an older machine).

Jan
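Since the real log cannot be shared, a synthetic file of the same line count might serve as a starting point for a reproducer. The sketch below is an assumption-laden stand-in: the function name `make_synthetic_log`, the XML-like line layout and the match frequencies are all invented, and nothing here shows that such a file triggers the same slowdown as the real data.

```shell
#!/bin/sh
# Hypothetical generator: write N XML-like log lines to a file. Every
# 1000th line contains "best_bid"; every 1500th line that is not also a
# 1000th line contains "fixed_"; all other lines contain neither.
# Usage: make_synthetic_log OUTPUT_FILE LINE_COUNT
make_synthetic_log() {
    out=$1
    lines=$2
    awk -v n="$lines" 'BEGIN {
        for (i = 1; i <= n; i++) {
            if (i % 1000 == 0)
                printf "<entry id=\"%d\"><best_bid>%d</best_bid></entry>\n", i, i
            else if (i % 1500 == 0)
                printf "<entry id=\"%d\"><fixed_price>%d</fixed_price></entry>\n", i, i
            else
                printf "<entry id=\"%d\"><note>nothing to see</note></entry>\n", i
        }
    }' > "$out"
}
```

For example, `make_synthetic_log /tmp/synthetic.log 790754` creates a file with the same number of lines as `ff.log`; the two `grep -a` commands from the message can then be timed against it on both Debian releases.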
Information forwarded
to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep.
(2 Apr 2019 15:48:03 GMT)
Acknowledgement sent
to BOUTELIER Sébastien <sebastien.boutelier@univ-tln.fr>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.
Your message did not contain a Subject field. They are recommended and useful because the title of a Bug is determined using this field. Please remember to include a Subject field in your messages in future.
Message #32 received at 913657@bugs.debian.org:
Same bug here.

With grep on any squid log file (~ 350M), there is a huge difference.

I compiled grep 2.20 and 3.3 for stretch using pbuilder and put the grep binaries on a server:
pbuilder build grep_3.3-1.dsc
pbuilder build grep_2.20-4.1.dsc
install deb, scp grep binaries to server and rename them.

All binaries are using the same libs.

root@cachessl:/var/log/squid# time /var/tmp/grep33 10.21.73.68 access.log | wc -l
1139

real 0m0,681s
user 0m0,556s
sys 0m0,112s
root@cachessl:/var/log/squid# time /var/tmp/grep220 10.21.73.68 access.log | wc -l
1139

real 0m1,920s
user 0m1,744s
sys 0m0,168s
root@cachessl:/var/log/squid# time /var/tmp/grep227 10.21.73.68 access.log | wc -l
1139

real 0m30,639s
user 0m30,480s
sys 0m0,144s

Same result with -a option.

--
BOUTELIER Sébastien
Tel: 04 94 14 29 47
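The side-by-side comparison above can be scripted so that every binary runs the identical search and the match counts are checked together. This is a sketch; the helper name `compare_grep_counts` is hypothetical, and timing is left to the caller because `time` behaves differently across shells.

```shell
#!/bin/sh
# Hypothetical helper: run the same search with several grep binaries and
# print "BINARY COUNT" for each, so differing match counts stand out.
# Usage: compare_grep_counts PATTERN FILE GREP_BINARY [GREP_BINARY...]
compare_grep_counts() {
    pattern=$1
    file=$2
    shift 2
    for bin in "$@"; do
        # -c prints the number of matching lines instead of the lines.
        count=$("$bin" -c -- "$pattern" "$file")
        printf '%s %s\n' "$bin" "$count"
    done
}
```

With the binaries from this message the call would be `compare_grep_counts 10.21.73.68 access.log /var/tmp/grep220 /var/tmp/grep227 /var/tmp/grep33`. Each binary can be timed on its own, as in the transcript above.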
Information forwarded
to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#913657; Package grep.
(3 Apr 2019 13:51:03 GMT)
Acknowledgement sent
to Santiago Ruano Rincón <santiagorr@riseup.net>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.
(3 Apr 2019 13:51:03 GMT)
Message #37 received at 913657@bugs.debian.org:
Control: notfound -1 3.3-1
Control: tags -1 + wontfix
On 02/04/19 at 17:39, BOUTELIER Sébastien wrote:
> Same bug here.
>
> With grep on any squid log file (~ 350M), there is a huge difference.
>
> I compiled grep 2.20 and 3.3 for stretch using pbuilder and put grep
> binaries on a server:
> pbuilder build grep_3.3-1.dsc
> pbuilder build grep_2.20-4.1.dsc
> intall deb, scp grep binaries to server and rename it.
>
> All binaries are using the same libs.
>
> root@cachessl:/var/log/squid# time /var/tmp/grep33 10.21.73.68
> access.log | wc -l
> 1139
>
> real 0m0,681s
> user 0m0,556s
> sys 0m0,112s
> root@cachessl:/var/log/squid# time /var/tmp/grep220 10.21.73.68
> access.log | wc -l
> 1139
>
> real 0m1,920s
> user 0m1,744s
> sys 0m0,168s
> root@cachessl:/var/log/squid# time /var/tmp/grep227 10.21.73.68
> access.log | wc -l
> 1139
>
> real 0m30,639s
> user 0m30,480s
> sys 0m0,144s
>
> Same result with -a option.
>
> --
> BOUTELIER Sébastien
> Tel: 04 94 14 29 47
>
Thanks for the info. Unfortunately, I find it difficult to fix this
issue in stretch. In general, changes in {old,}stable are restricted to
fixing bugs whose severity is critical, and I don't think that is the case
here. It would be possible, though, to upload to stretch-backports the same
version found in buster.
Cheers,
-- Santiago
Added tag(s) wontfix.
Request was from Santiago Ruano Rincón <santiagorr@riseup.net>
to 913657-submit@bugs.debian.org.
(3 Apr 2019 13:51:03 GMT)
Marked as fixed in versions grep/3.3-1.
Request was from Santiago Ruano Rincón <santiagorr@riseup.net>
to control@bugs.debian.org.
(4 Apr 2019 08:39:04 GMT)