Below are the contents of my file:
<A>
<number>100</number>
<name>Word1</name>
</A>
<A>
<number>101</number>
<name> Word2</name>
</A>
If I grep for Word1, I'd like to see this output:
<A>
<number>100</number>
<name>Word1</name>
</A>
If I grep for Word2, I'd like to see this output:
<A>
<number>101</number>
<name>Word2</name>
</A>
Could someone help with this, please?
If this is part of a well-formed XML document, you can extract the required part with an XML parser.
To satisfy the well-formedness requirement, I've wrapped your XML fragment in <root> and </root>.
xmlstarlet sel -t -c '//A[name="Word1"]' -n file.xml
If you can't satisfy this directly, you can wrap the fragment explicitly:
( echo '<root>'; cat file.xml; echo '</root>' ) | xmlstarlet sel -t -c '//A[name="Word1"]' -n
In either case, the output is this:
<A>
<number>100</number>
<name>Word1</name>
</A>
Note that it wouldn't work for Word2 because of the extra space before it in the sample. Note also that it still outputs an empty line if there's no match.
xmlstarlet sel -t -m '//A[name="Word1"]' -c . -n
or
xmlstarlet sel -t -m '//A[contains(name,"Word2")]' -c . -n
may be better. (Stéphane Chazelas, Nov 21, 2017 at 17:01)
@StéphaneChazelas I was hoping that leading space was a typo. (Yes, yes, I know...) BTW, why -m '//...contains()' -c . rather than -c '//...contains()'? (Chris Davies, Nov 21, 2017 at 17:06)
See also
xmlstarlet sel -t -m '//name[contains(.,"Word2")]' -c .. -n
for the parent of the name nodes that contain Word2. (Stéphane Chazelas, Nov 21, 2017 at 17:12)
-c //... -n outputs an empty line if there's no match; -m //... -c . -n doesn't. (Stéphane Chazelas, Nov 22, 2017 at 8:32)
With pcregrep:
<file.xml pcregrep -Mo '(?s)<A>(?:.(?!</A>))*Word1.*?</A>'
With GNU grep:
<file.xml grep -zPo '(?s)<A>(?:.(?!</A>))*Word1.*?</A>' | tr '\0' '\n'
(though that means the whole file is loaded into memory, and it assumes the file doesn't contain NUL bytes).
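To reuse the GNU grep command for any word, it can be wrapped in a small shell function. This is a sketch: the name grep_block is hypothetical, and it assumes a GNU grep built with PCRE support (-P). The word is wrapped in \Q...\E so that characters such as . are matched literally, and the pattern is built from single-quoted pieces so an interactive bash doesn't try history expansion on the !.

```shell
#!/bin/sh
# grep_block WORD FILE: print every <A>...</A> block of FILE containing WORD.
# Hypothetical helper; assumes GNU grep with -P (PCRE) support.
grep_block() {
  # \Q...\E makes PCRE treat WORD literally (WORD must not itself contain \E).
  grep -zPo '(?s)<A>(?:.(?!</A>))*\Q'"$1"'\E.*?</A>' "$2" | tr '\0' '\n'
}

# Sample file from the question (note the leading space before Word2).
cat > file.xml <<'EOF'
<A>
<number>100</number>
<name>Word1</name>
</A>
<A>
<number>101</number>
<name> Word2</name>
</A>
EOF

grep_block Word1 file.xml
```

Unlike the name="Word2" XPath test, this does find Word2 despite the leading space, and it prints the block exactly as it appears in the file.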
Some PCRE operators:
(?s): turns on the s flag (. matches even line delimiters).
.(?!</A>): any character, provided it's not immediately followed by </A>.
.*?: non-greedy version of .*
(?:...): just grouping.
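Why (?:.(?!</A>))* rather than a plain .*? before the word? A non-greedy quantifier alone doesn't stop the match from starting at an earlier <A>: the regex engine tries the leftmost <A> first, and .*? happily crosses a </A> to reach Word2. A small demonstration, assuming GNU grep with -P and the question's sample:

```shell
#!/bin/sh
# Recreate the question's sample.
printf '%s\n' '<A>' '<number>100</number>' '<name>Word1</name>' '</A>' \
  '<A>' '<number>101</number>' '<name> Word2</name>' '</A>' > file.xml

# Naive: lazy quantifiers only. The match starts at the FIRST <A>,
# so both blocks are printed.
echo '--- naive'
grep -zPo '(?s)<A>.*?Word2.*?</A>' file.xml | tr '\0' '\n'

# With the lookahead: no character before Word2 may be followed by </A>,
# so that part of the match cannot run past the end of a block.
echo '--- lookahead'
grep -zPo '(?s)<A>(?:.(?!</A>))*Word2.*?</A>' file.xml | tr '\0' '\n'
```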
It's fooled by things like <![CDATA[</A>]]>, and it wouldn't find a Word2 written as <![CDATA[W]]>ord2 or with a character reference such as &#87;ord2, for which you'd need an XML parser. But then an XML parser needs well-formed XML input, which your sample is not unless you enclose it in a top-level element; it needs to read the file in full (but then again, that's generally your lot when working with that format); and it may transform the content (expanding <![CDATA[...]]> sections and some &...; sequences). And with an XPath expression, it would be difficult to find Word1 wherever it occurs, including in comments, tags or attributes.
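To see the CDATA problem concretely, here is a hypothetical record (not from the question) whose note element holds a CDATA section containing the literal text </A>. The regex stops at that inner </A> and prints a truncated block. This again assumes GNU grep with -P.

```shell
#!/bin/sh
# Hypothetical input: the CDATA section contains a literal "</A>".
printf '%s\n' '<A>' '<number>102</number>' '<name>Word1</name>' \
  '<note><![CDATA[</A>]]></note>' '</A>' > cdata.xml

# The lazy .*?</A> stops at the </A> inside the CDATA section,
# so the rest of the note element and the real </A> are cut off.
grep -zPo '(?s)<A>(?:.(?!</A>))*Word1.*?</A>' cdata.xml | tr '\0' '\n'
```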
grep or a related tool. Also, are you limited to the features of grep that are required by POSIX, or are you willing to use extensions provided by your grep implementation? If you're willing to use extensions, what OS is this, and what does grep -V show? Please edit your question with details.
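For the POSIX-only case raised in that comment, a sketch using awk (not grep, but awk is specified by POSIX) that buffers each block and prints it if it contains the word. The function name grep_a_block is hypothetical; it assumes <A> and </A> each sit on a line of their own (optionally indented with blanks), as in the sample, and it shares the same CDATA and character-reference blind spots as the regex approaches. Unlike grep -z, it only keeps one block in memory at a time.

```shell
#!/bin/sh
# Hypothetical helper: print every <A>...</A> block of FILE containing WORD.
# POSIX awk only. WORD is matched as a plain substring via index(), not as a
# regex, but note that awk -v still interprets backslash escapes in it.
grep_a_block() {
  awk -v word="$1" '
    /^[ \t]*<A>[ \t]*$/   { buf = ""; inside = 1 }   # start of a block
    inside                { buf = buf $0 "\n" }      # collect its lines
    /^[ \t]*<\/A>[ \t]*$/ {                          # end of a block
      if (inside && index(buf, word)) printf "%s", buf
      inside = 0
    }
  ' "$2"
}

# Sample file from the question.
cat > file.xml <<'EOF'
<A>
<number>100</number>
<name>Word1</name>
</A>
<A>
<number>101</number>
<name> Word2</name>
</A>
EOF

grep_a_block Word1 file.xml
```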