GNU bug report logs - #32704
Can grep search for a line feed and a null character at the same time?

Package: grep;

Reported by: 21naown <at> gmail.com

Date: Tue, 11 Sep 2018 16:27:01 UTC

Severity: wishlist


From: Eric Blake <eblake <at> redhat.com>
To: 21naown <at> gmail.com, 32704 <at> debbugs.gnu.org
Subject: bug#32704: Can grep search for a line feed and a null character at the same time?
Date: Tue, 11 Sep 2018 12:03:17 -0500

On 9/11/18 11:25 AM, 21naown <at> gmail.com wrote:
> Hello,
> 
> 
> I found someone who asked the same question on "Stack Overflow", still 
> unanswered, but this person did not ask it on the mailing list.
> 
> Here are the details of the question which are nearly similar to my case:
> https://stackoverflow.com/questions/50295772/can-grep-search-for-a-line-feed-and-a-null-character-at-the-same-time 

Per 'info grep':

 15. How can I match across lines?

     Standard grep cannot do this, as it is fundamentally line-based.
     Therefore, merely using the ‘[:space:]’ character class does not
     match newlines in the way you might expect.

     With the GNU ‘grep’ option ‘-z’ (‘--null-data’), each input and
     output "line" is null-terminated; *note Other Options::. Thus, you
     can match newlines in the input, but typically if there is a match
     the entire input is output, so this usage is often combined with
     output-suppressing options like ‘-q’, e.g.:

          printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'

     If this does not suffice, you can transform the input before giving
     it to ‘grep’, or turn to ‘awk’, ‘sed’, ‘perl’, or many other
     utilities that are designed to operate across lines.

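To make the contrast in that FAQ entry concrete, here is a small sketch that runs the same pattern with and without -z and records the exit status of each. The variable names without_z and with_z are only illustrative, and it assumes GNU grep is the grep on PATH.

```shell
#!/bin/sh
# Line-based mode: "foo" and "bar" are on different lines, so no single
# line contains both and grep reports no match.
printf 'foo\nbar\n' | grep -q 'foo[[:space:]]\+bar'
without_z=$?

# With -z the record terminator is NUL. The input has no NUL byte, so the
# whole input is one record and the newline is ordinary data that
# [[:space:]] can match.
printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
with_z=$?

echo "without -z: exit $without_z"
echo "with -z:    exit $with_z"
```

An exit status of 0 means a match was found and 1 means none was, so the two runs should differ.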
Grep does not have the ability to match hex or octal backslash
sequences, and a literal newline in the pattern is taken as a separator
between patterns. Using [:space:] to match a newline works only
loosely, since it also matches other whitespace. But maybe we really do
have a bug: when -z is in effect, I'd expect NUL, rather than newline,
to be the byte that separates patterns in the pattern argument (so that
a literal newline, as written with $'\n' in shells that understand it,
would be viable for writing a single pattern that matches exactly one
newline byte at the end of a NUL-separated record).

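The pattern-splitting behaviour described above can be seen in a short sketch. The variable names pat and count and the sample input are made up for illustration; the pattern is built with printf so that it does not depend on $'...' support in the shell.

```shell
#!/bin/sh
# Build a pattern argument containing a literal newline.
# Command substitution strips only trailing newlines, so the embedded
# one survives.
pat=$(printf 'foo\nbar')

# grep treats the newline as a separator between two patterns, "foo" and
# "bar", rather than as a newline to be matched in the input. Each of
# the first two lines matches one of them; "baz" matches neither.
count=$(printf 'foo\nbar\nbaz\n' | grep -c "$pat")

echo "matching lines: $count"
```

This is the default, line-based case; it only illustrates why a literal newline in the pattern cannot currently stand for a newline in the data.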
That said, your EASIEST approach is to use iconv to recode your file out 
of UTF-16 (which is NOT conducive to multi-byte processing), into 
something friendlier like UTF-8, and then use grep on the converted file.
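
A minimal sketch of that iconv approach follows. The sample text, the temporary file, and the choice of UTF-16LE (little-endian, no byte-order mark) are assumptions made for illustration; for a real file, pass whichever UTF-16 variant it actually uses to -f.

```shell
#!/bin/sh
# Create a small UTF-16LE sample file (hypothetical content).
tmp=$(mktemp)
printf 'hello\nworld\n' | iconv -f UTF-8 -t UTF-16LE > "$tmp"

# Searching the raw file fails: in UTF-16LE every ASCII character is
# followed by a NUL byte, so the bytes of "world" are never adjacent.
raw=$(grep -a -c 'world' "$tmp")

# Recode to UTF-8 first, then search the converted stream.
converted=$(iconv -f UTF-16LE -t UTF-8 "$tmp" | grep -c 'world')

echo "raw UTF-16LE matches: $raw"
echo "after iconv matches:  $converted"
rm -f "$tmp"
```

Piping through iconv leaves the original file untouched, which may be preferable to converting it in place.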
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org

This bug report was last modified 5 years and 110 days ago.
