GNU bug report logs - #16871
problems about matching newline (with -z)


Package: grep;

Reported by: Stephane Chazelas <stephane.chazelas <at> gmail.com>

Date: 25 Feb 2014 07:33:01 UTC

Severity: wishlist

To reply to this bug, email your comments to 16871 AT debbugs.gnu.org.




Report forwarded to bug-grep <at> gnu.org:
bug#16871; Package grep. (25 Feb 2014 07:33:02 GMT)

Acknowledgement sent to Stephane Chazelas <stephane.chazelas <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (25 Feb 2014 07:33:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: problems about matching newline (with -z)
Date: 25 Feb 2014 07:32:18 +0000
The doc has a confusing statement:
> 15. How can I match across lines?
>
> Standard grep cannot do this, as it is fundamentally line-based.
> Therefore, merely using the '[:space:]' character class does not
> match newlines in the way you might expect. However, if your grep
> is compiled with Perl patterns enabled, the Perl 's' modifier
> (which makes '.' match newlines) can be used:
>
> printf 'foo\nbar\n' | grep -P '(?s)foo.*?bar'
>
> With the GNU 'grep' option '-z' (*note File and Directory
> Selection::), the input is terminated by null bytes. Thus, you can
> match newlines in the input, but the output will be the whole file,
> so this is really only useful to determine if the pattern is
> present:
>
> printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
>
> Failing either of those options, you need to transform the input
> before giving it to 'grep', or turn to 'awk', 'sed', 'perl', or
> many other utilities that are designed to operate across lines.

    printf 'foo\nbar\n' | grep -P '(?s)foo.*?bar'

will never match, as grep is line-based even with -P. -P doesn't
help here; it makes it harder, as you need that (?s).

    printf 'foo\nbar\n\0' | grep -z 'foo.*bar'

would match.
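
A minimal sketch of that difference, assuming GNU grep (other grep
implementations may treat '.' and -z differently). The variable names
are illustrative and do not come from the report:

```shell
# Line mode: "foo" and "bar" are two separate records, so a pattern
# spanning both has nothing to match against.
if printf 'foo\nbar\n' | grep -q 'foo.*bar'; then
  line_mode=match
else
  line_mode=no-match
fi

# -z mode: records end at NUL, so the newline between "foo" and "bar"
# is ordinary data inside a single record.
if printf 'foo\nbar\n\0' | grep -zq 'foo.*bar'; then
  nul_mode=match
else
  nul_mode=no-match
fi

echo "line mode: $line_mode"
echo "-z mode:   $nul_mode"
```

With GNU grep the first test is expected to report no match and the
second a match.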
Same confusion in tests/pcre:
> #! /bin/sh
> # Ensure that with -P, \s*$ matches a newline.
> #
> # Copyright (C) 2001, 2006, 2009-2014 Free Software Foundation, Inc.
> #
> # Copying and distribution of this file, with or without modification,
> # are permitted in any medium without royalty provided the copyright
> # notice and this notice are preserved.
> 
> . "${srcdir=.}/init.sh"; path_prepend_ ../src
> require_pcre_
> 
> fail=0
> 
> # See CVS revision 1.32 of "src/search.c".
> echo | grep -P '\s*$' || fail=1
> 
> Exit $fail
'\s*$' doesn't match a newline, but an empty string.
You need echo | grep -zP '\s' to match the newline.
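
The same point can be sketched without PCRE, assuming GNU grep and
substituting the POSIX class [[:space:]] for \s (that substitution and
the variable names are assumptions made for illustration, not part of
tests/pcre):

```shell
# '[[:space:]]*$' succeeds on an empty line by matching the empty string,
# not by consuming the newline.
if echo | grep -q '[[:space:]]*$'; then star=match; else star=no-match; fi

# Requiring one whitespace character fails in line mode: the newline is
# the record terminator and is not part of the text being searched.
if echo | grep -q '[[:space:]]'; then line_ws=match; else line_ws=no-match; fi

# With -z the terminator is NUL, so the newline becomes searchable data.
if echo | grep -zq '[[:space:]]'; then nul_ws=match; else nul_ws=no-match; fi

echo "star: $star, line mode: $line_ws, -z mode: $nul_ws"
```

Only the -z variant actually matches the newline character itself.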
Also:
We can match a newline with grep -zP 'a\nb' (or '\x0a' or '\012'
or '[\n]'...) but not easily without -P. Same for NUL
characters.
Without -P, the only way I could think of was with
[^\0-\011\013-\377], but that would only work for single-byte
locales, and you can't pass a nul character on the command line,
so it would have to be with -f but:
$ printf 'a\nb\0' | LC_ALL=C grep -zf <(LC_ALL=C printf 'a[^\0-\011\013-\377]b')
zsh: done                printf 'a\nb\0' |
zsh: segmentation fault  LC_ALL=C grep -zf <(LC_ALL=C printf 'a[^\0-\011\013-\377]b')
Having said that:
grep -z $'a[^\01-\011\013-\0377]b'
would work (in single-byte locales), because nul is the delimiter
and so never appears in the input,
and grep -a $'[^\01-\0377]' can match nul (in single-byte
locales).
But it would be handy to be able to do the same as with -P.
-- 
Stephane

Information forwarded to bug-grep <at> gnu.org:
bug#16871; Package grep. (25 Feb 2014 11:34:01 GMT)

Message #8 received at 16871 <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: 16871 <at> debbugs.gnu.org
Subject: Re: bug#16871: Acknowledgement (problems about matching newline
 (with -z))
Date: 25 Feb 2014 11:32:43 +0000
Also:
$ printf 'a\nb\0' | grep -z 'a$'
$ printf 'a\nb\0' | grep -zP 'a$'
a
b
$ printf 'a\nb\0' | grep -zxP a
a
b
Why use PCRE_MULTILINE here?

Information forwarded to bug-grep <at> gnu.org:
bug#16871; Package grep. (25 Apr 2014 04:28:02 GMT)

Message #11 received at 16871 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stephane Chazelas <stephane.chazelas <at> gmail.com>, 
 16871 <at> debbugs.gnu.org
Subject: Re: bug#16871: problems about matching newline (with -z)
Date: 24 Apr 2014 21:27:38 -0700
Stephane Chazelas wrote:
> The doc has a confusing statement ... Same confusion in tests/pcre:
Thanks, I installed the attached patch to fix those.
> We can match a newline with grep -zP 'a\nb' (or '\x0a' or '\012'
> or '[\n]'...) but not easily without -P. Same for NUL
> characters.
Yes, that's a downside of the POSIX notation, and it'd be nice to extend 
POSIX to allow easy matching for newlines and/or null bytes. I'll mark 
this bug report as a wishlist bug.
[0001-misc-fix-doc-and-test-bugs-re-grep-z.patch (text/plain, attachment)]

Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (25 Apr 2014 04:29:01 GMT)

Information forwarded to bug-grep <at> gnu.org:
bug#16871; Package grep. (18 Nov 2016 17:41:01 GMT)

Message #16 received at 16871 <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: 16871 <at> debbugs.gnu.org
Subject: doc/test confusions with grep -P
Date: 18 Nov 2016 17:40:44 +0000
For the record, the doc/test confusion was fixed by commit
b73296ace186451b096b075461634c153d1fa525
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=b73296ace186451b096b075461634c153d1fa525
See also https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22655#47
and below about PCRE_MULTILINE.

This bug report was last modified 9 years and 53 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
