Package: grep;
Reported by: Stephane Chazelas <stephane.chazelas <at> gmail.com>
Date: 25 Feb 2014 07:33:01 UTC
Severity: wishlist
To reply to this bug, email your comments to 16871 AT debbugs.gnu.org.
Message #5 received at submit <at> debbugs.gnu.org:
From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: problems about matching newline (with -z)
Date: 25 Feb 2014 07:32:18 +0000
The doc has a confusing statement:
> 15. How can I match across lines?
>
> Standard grep cannot do this, as it is fundamentally line-based.
> Therefore, merely using the '[:space:]' character class does not
> match newlines in the way you might expect. However, if your grep
> is compiled with Perl patterns enabled, the Perl 's' modifier
> (which makes '.' match newlines) can be used:
>
> printf 'foo\nbar\n' | grep -P '(?s)foo.*?bar'
>
> With the GNU 'grep' option '-z' (*note File and Directory
> Selection::), the input is terminated by null bytes. Thus, you can
> match newlines in the input, but the output will be the whole file,
> so this is really only useful to determine if the pattern is
> present:
>
> printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
>
> Failing either of those options, you need to transform the input
> before giving it to 'grep', or turn to 'awk', 'sed', 'perl', or
> many other utilities that are designed to operate across lines.
printf 'foo\nbar\n' | grep -P '(?s)foo.*?bar'
will never match, as grep is line-based even with -P. -P doesn't
help here; it makes it harder, as you need that (?s).
printf 'foo\nbar\n\0' | grep -z 'foo.*bar'
would match.
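The difference between line mode and -z mode can be checked with a small sketch. It assumes GNU grep; the variable names are illustrative only, and a plain pattern is used instead of -P so that the check does not depend on PCRE support.

```shell
# Line mode: each newline-terminated line is a separate record,
# so "foo" and "bar" are never seen together.
if printf 'foo\nbar\n' | grep -q 'foo.*bar'; then
  line_mode=match
else
  line_mode=no-match
fi

# -z mode: records end at NUL, so the newline is ordinary data
# inside a single record and '.' can match it.
if printf 'foo\nbar\n\0' | grep -zq 'foo.*bar'; then
  z_mode=match
else
  z_mode=no-match
fi

echo "line mode: $line_mode"
echo "-z mode:   $z_mode"
```

Using -q keeps the output small; without it, grep -z prints the whole NUL-terminated record.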
Same confusion in tests/pcre:
> #! /bin/sh
> # Ensure that with -P, \s*$ matches a newline.
> #
> # Copyright (C) 2001, 2006, 2009-2014 Free Software Foundation, Inc.
> #
> # Copying and distribution of this file, with or without modification,
> # are permitted in any medium without royalty provided the copyright
> # notice and this notice are preserved.
>
> . "${srcdir=.}/init.sh"; path_prepend_ ../src
> require_pcre_
>
> fail=0
>
> # See CVS revision 1.32 of "src/search.c".
> echo | grep -P '\s*$' || fail=1
>
> Exit $fail
'\s*$' doesn't match a newline, but an empty string.
You need echo | grep -zP '\s' to match the newline.
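A rough way to see this, assuming a GNU grep built with PCRE support (the variable names are made up for the illustration):

```shell
# Assumption: GNU grep with -P available.
# grep exits 1 when nothing matches, hence the "|| true" after each count.

# A line with no trailing whitespace still matches:
# \s* matches the empty string right before the end of the line.
empty_ok=$(printf 'abc\n' | grep -cP '\s*$' || true)

# Line mode: the newline is the record terminator, so \s has nothing to match.
line_count=$(echo | grep -cP '\s' || true)

# -z mode: the newline is inside the record, so \s matches it.
z_count=$(echo | grep -zcP '\s' || true)

echo "empty-string match: $empty_ok, line mode: $line_count, -z mode: $z_count"
```

The first count being non-zero is what shows that the test in tests/pcre passes without ever matching a newline.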
Also:
We can match a newline with grep -zP 'a\nb' (or '\x0a' or '\012'
or '[\n]'...) but not easily without -P. Same for NUL
characters.
Without -P, the only way I could think of was with
[^\0-\011\013-\377], but that would only work for single-byte
locales, and you can't pass a nul character on the command line,
so it would have to be with -f but:
$ printf 'a\nb\0' | LC_ALL=C grep -zf <(LC_ALL=C printf 'a[^\0-\011\013-\377]b')
zsh: done                printf 'a\nb\0' |
zsh: segmentation fault  LC_ALL=C grep -zf <(LC_ALL=C printf 'a[^\0-\011\013-\377]b')
Having said that:
grep -z $'a[^\01-\011\013-\0377]b'
would work (in single-byte locales) since nul is not in the
input since it's the delimiter.
and grep -a $'[^\01-\0377]' can match nul (in single-byte
locales).
But it would be handy to be able to do the same as with -P.
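The bracket-expression workaround for a newline can be tried with a portable sketch like the one below. It assumes GNU grep and the C locale, and builds the pattern with printf rather than the zsh $'...' quoting; the variable names are hypothetical.

```shell
# Build the pattern a[^\001-\011\013-\377]b from raw bytes.
# NUL is left out of the range: it cannot be passed as an argument,
# and with -z it is the record delimiter, so it never occurs inside a record.
pattern=$(printf 'a[^\001-\011\013-\377]b')

# LC_ALL=C makes grep treat every byte as one character (single-byte locale).
# grep exits 1 when nothing matches, hence the "|| true" after each count.
nl_count=$(printf 'a\nb\0' | LC_ALL=C grep -zc "$pattern" || true)
tab_count=$(printf 'a\tb\0' | LC_ALL=C grep -zc "$pattern" || true)

echo "newline between a and b: $nl_count, tab between a and b: $tab_count"
```

The tab case is there only as a control: the complemented range excludes every byte except newline (and NUL), so a tab between a and b should not match.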
--
Stephane
Message #8 received at 16871 <at> debbugs.gnu.org:
From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: 16871 <at> debbugs.gnu.org
Subject: Re: bug#16871: Acknowledgement (problems about matching newline (with -z))
Date: 25 Feb 2014 11:32:43 +0000
Also:
$ printf 'a\nb\0' | grep -z 'a$'
$ printf 'a\nb\0' | grep -zP 'a$'
a
b
$ printf 'a\nb\0' | grep -zxP a
a
b
Why use PCRE_MULTILINE here?
Message #11 received at 16871 <at> debbugs.gnu.org:
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stephane Chazelas <stephane.chazelas <at> gmail.com>, 16871 <at> debbugs.gnu.org
Subject: Re: bug#16871: problems about matching newline (with -z)
Date: 24 Apr 2014 21:27:38 -0700
Stephane Chazelas wrote:
> The doc has a confusing statement ... Same confusion in tests/pcre:
Thanks, I installed the attached patch to fix those.
> We can match a newline with grep -zP 'a\nb' (or '\x0a' or '\012'
> or '[\n]'...) but not easily without -P. Same for NUL
> characters.
Yes, that's a downside of the POSIX notation, and it'd be nice to
extend POSIX to allow easy matching for newlines and/or null bytes.
I'll mark this bug report as a wishlist bug.
[0001-misc-fix-doc-and-test-bugs-re-grep-z.patch (text/plain, attachment)]
Message #16 received at 16871 <at> debbugs.gnu.org:
From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: 16871 <at> debbugs.gnu.org
Subject: doc/test confusions with grep -P
Date: 18 Nov 2016 17:40:44 +0000
For the record, the doc/test confusion was fixed by commit
b73296ace186451b096b075461634c153d1fa525
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=b73296ace186451b096b075461634c153d1fa525
See also https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22655#47
and below about PCRE_MULTILINE.