4
\$\begingroup\$

I'm learning regex with sed and am trying to extract the last field from each line of a file named "test". The method below gives the desired output. Is it an effective way of doing this? Also, when should the "-e" option be used with sed? Please give an example, as I couldn't find any.

~# ] cat test
example.com. 4 IN NS b.iana-servers.net.
50times.com. 21556 IN NS ns1.50times.com.
example.com. 4 IN NS a.iana-servers.net.
~# ] cat test | sed -r 's/^[[:alnum:]]*.[[:alnum:]]*.?[a-z]*.[[:blank:]]+[0-9]+[[:blank:]]+IN[[:blank:]]+[A-Z]+[[:blank:]]+//g' | sed -r 's/\.*.$//'
b.iana-servers.net
ns1.50times.com
a.iana-servers.net
200_success
asked Jul 16, 2015 at 7:22
\$\endgroup\$

2 Answers

3
\$\begingroup\$

When processing tabular data in columns, awk is often a more appropriate tool to use. The equivalent command would be

awk '{ sub(/\.$/, "", $NF); print $NF }' test

... which I think is more readable.

Explanation:

  • NF is the number of fields: for this text, 5.
  • $NF is the content of the last (5th) field.
  • sub(/\.$/, "", $NF) strips the trailing dot from the last field.
  • { commands } executes the commands for every line in the file.
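
To make the roles of NF and $NF concrete, here is a minimal sketch that runs the same idea against the sample records from the question. The variable names (sample, fields, hosts) are only illustrative, and the data is fed through a pipe rather than read from the file "test"; the pattern is written as a regex literal, /\.$/.

```shell
# Sample records copied from the question; "sample" is a hypothetical variable name.
sample='example.com. 4 IN NS b.iana-servers.net.
50times.com. 21556 IN NS ns1.50times.com.
example.com. 4 IN NS a.iana-servers.net.'

# Print the field count and the raw last field for each line.
fields=$(printf '%s\n' "$sample" | awk '{ print NF, $NF }')
printf '%s\n' "$fields"

# Strip the trailing dot from the last field, then print it.
hosts=$(printf '%s\n' "$sample" | awk '{ sub(/\.$/, "", $NF); print $NF }')
printf '%s\n' "$hosts"
```

The first command should report 5 fields on every line, which is why $NF always refers to the host name here regardless of how wide the other columns are.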
answered Jul 16, 2015 at 9:04
\$\endgroup\$
2
\$\begingroup\$

From the GNU sed documentation:

If no -e, -f, --expression, or --file options are given on the command-line, then the first non-option argument on the command line is taken to be the script to be executed.

Each of your two sed commands has one non-option argument, which is treated as the script. It would be better practice to always put an explicit -e in front of the script. You can then give several scripts to a single sed invocation and write one command instead of a pipeline:

sed -r -e 's/^[[:alnum:]]*.[[:alnum:]]*.?[a-z]*.[[:blank:]]+[0-9]+[[:blank:]]+IN[[:blank:]]+[A-Z]+[[:blank:]]+//g' \
 -e 's/\.*.$//' test
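
Since the question asked for an example of -e, here is a small sketch comparing three ways of applying two substitutions in sequence. The two substitutions are deliberately simpler than the ones in the question (they are my own stand-ins, not the original regexes), and the variable names are illustrative.

```shell
line='example.com. 4 IN NS b.iana-servers.net.'

# Two separate sed processes connected by a pipe.
piped=$(printf '%s\n' "$line" | sed 's/^.* //' | sed 's/\.$//')

# One sed process; each -e adds a command, applied in order to every line.
combined=$(printf '%s\n' "$line" | sed -e 's/^.* //' -e 's/\.$//')

# Equivalent single script with the commands separated by a semicolon.
single=$(printf '%s\n' "$line" | sed 's/^.* //; s/\.$//')

printf '%s\n' "$piped" "$combined" "$single"
```

All three forms should print the same host name. The -e form is mainly useful when there is more than one command, or when a script is combined with -f.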

It looks like you are attempting to craft the first regex to validate each column, checking that the first column looks like a domain ending with a dot ([[:alnum:]]*.[[:alnum:]]*.?[a-z]*.), the second column looks like an integer ([0-9]+), the third column is IN, and the fourth column is a record type ([A-Z]+).

The regex for the first column probably doesn't work the way you expect. Each . means "match any character"; it does not mean "match a dot character". To match a dot character, you would write \. instead.
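
A quick way to see the difference is to try both forms against a string that contains no dot at all. This is only a sketch; the input strings and the MATCH marker are made up for illustration.

```shell
# "exampleXcom" contains no dot, yet an unescaped . still matches the X.
unescaped=$(printf '%s\n' 'exampleXcom' | sed 's/example.com/MATCH/')

# With \. only a literal dot matches, so the input is left unchanged.
escaped=$(printf '%s\n' 'exampleXcom' | sed 's/example\.com/MATCH/')

# A real dot matches the escaped pattern.
literal=$(printf '%s\n' 'example.com' | sed 's/example\.com/MATCH/')

printf '%s\n' "$unescaped" "$escaped" "$literal"
```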


If you just want to extract the last column without validation and suppress the trailing dot, you could write instead:

sed -e 's/.*[ \t]\([^ \t]*\)\.$/\1/' test

[^ \t]*\.$ should match the last column ("all non-space characters followed by a dot at the end of the line"). The parentheses capture everything except the trailing dot. \1 is a backreference referring to the first and only captured group.
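
The capture-and-backreference idea can be tried on two of the sample lines as follows. This is a sketch under one assumption of mine: I have written the whitespace class as [[:blank:]] rather than [ \t], so that the example does not depend on how the sed in use interprets \t inside brackets. The variable names are illustrative.

```shell
sample='example.com. 4 IN NS b.iana-servers.net.
50times.com. 21556 IN NS ns1.50times.com.'

# \( ... \) captures the last field without its trailing dot; \1 puts it back.
last=$(printf '%s\n' "$sample" | sed -e 's/.*[[:blank:]]\([^[:blank:]]*\)\.$/\1/')
printf '%s\n' "$last"

# A line without a trailing dot does not match, so sed prints it unchanged.
nodot=$(printf '%s\n' 'example.com. 4 IN NS b.iana-servers.net' | sed -e 's/.*[[:blank:]]\([^[:blank:]]*\)\.$/\1/')
printf '%s\n' "$nodot"
```

The second case is worth keeping in mind: because there is no validation, lines that do not fit the pattern pass through untouched rather than being rejected.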

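I've opted to write this as a basic regular expression, without the -r option, because -r is a GNU extension that makes your command less portable. Note that [[:blank:]] is itself a POSIX character class that works in both basic and extended regular expressions, whereas \t inside a bracket expression is understood as a tab by GNU sed but not necessarily by other implementations.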

answered Jul 16, 2015 at 8:38
\$\endgroup\$