
So, it took me ages, but I finally learned to think in terms of regular expressions, thanks to using them in kwrite.

But I still don't know how to translate that knowledge to grep. I love my grep, when I know what I'm doing with it, but the manual has always given me a headache.

I'd like to match stuff like the following lines:

CAPITALSFOLLOWING anewline.
CAPI
TALSFOLL owing
ANEW line.

That is, lines that begin with two or more capital letters. But I can't figure out how.

In kwrite, I would match these lines using:

\n[A-Z][A-Z]+

But grep... hmm. I have a feeling it's something like:

me@ROOROO:~/$ grep "^[A-Z]something" filename

but

me@ROOROO:~/$ grep "^[A-Z][A-Z]+" filename

doesn't work (returns an empty file). A Google search for the term 'grep match one or more occurrence' led me to believe that

me@ROOROO:~/$ grep "^[A-Z][A-Z]*" filename

was the right syntax. But, alas, that doesn't do the trick.

asked Feb 10, 2012 at 17:41
  • In the old days, each tool had its own regexp syntax. By default, grep uses its traditional syntax; use grep -E to have a more habitual syntax where a backslash followed by a non-alphanumeric character is never special. Commented Feb 10, 2012 at 23:47

3 Answers


You're using the right syntax in your first example; the problem is that + is only considered special when using "extended" regular expressions. From the man page of the GNU implementation of grep:

Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

(\?, \+, and \| are non-standard GNU extensions though).

So, you either need to escape the + (assuming GNU grep or compatible):

$ grep "^[A-Z][A-Z]\+" filename

Or use the standard \{1,\} equivalent of GNU's \+:

$ grep '^[A-Z][A-Z]\{1,\}' filename

or even, in this case:

$ grep '^[A-Z]\{2,\}' filename

Or turn on extended regular expressions, by passing grep the -E flag or just running egrep (egrep is the command that introduced those extended regular expressions in the late 70s):

$ grep -E "^[A-Z][A-Z]+" filename
$ egrep "^[A-Z][A-Z]+" filename

In any case, as far as selecting lines goes, all of those would be functionally equivalent to:

$ grep '^[A-Z][A-Z]' filename

So you don't even need the + operator.
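If you want to convince yourself, the variants above can be compared on a throwaway file. This is only a minimal sketch: the first four sample lines come from the question, the last two are made up for illustration, the temporary file stands in for filename, and LC_ALL=C is an assumption so that [A-Z] means plain ASCII capitals. The GNU-only \+ form is left out so the script sticks to standard syntax.

```shell
#!/bin/sh
# Assumption: use the C locale so [A-Z] covers only ASCII uppercase letters.
export LC_ALL=C

# Sample data: four lines from the question, plus two made-up lines
# that should NOT be selected.
sample=$(mktemp)
printf '%s\n' \
  'CAPITALSFOLLOWING anewline.' \
  'CAPI' \
  'TALSFOLL owing' \
  'ANEW line.' \
  'Only one capital' \
  'no capitals here' > "$sample"

# Basic regular expressions with the standard interval syntax.
bre_a=$(grep '^[A-Z][A-Z]\{1,\}' "$sample")
bre_b=$(grep '^[A-Z]\{2,\}' "$sample")

# Extended regular expressions, where + needs no backslash.
ere=$(grep -E '^[A-Z][A-Z]+' "$sample")

# The shortest form: two capitals at the start of the line.
short=$(grep '^[A-Z][A-Z]' "$sample")

# An unescaped + in a basic regular expression is a literal plus sign,
# so no line in the sample matches and the count is 0.
literal_plus=$(grep -c '^[A-Z][A-Z]+' "$sample")

printf '%s\n' "$short"
echo "lines matched by unescaped + in a basic regex: $literal_plus"

rm -f "$sample"
```

All four working patterns should select the same four lines, while the unescaped + in a basic regular expression selects none.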

In your other example you tried:

$ grep "^[A-Z][A-Z]*" filename

* works in basic regular expressions, but it matches 0 or more times, not 1 or more, so that pattern also matches lines that begin with a single capital. The solution in your answer works because it says "match a capital, then another capital, then 0 or more capitals". The + version from the question says "match a capital, then 1 or more capitals", which is the same thing. You can also use {min,max} to specify how many repetitions you want; if you leave out max, any number from min upwards is allowed (the unescaped braces also require extended regular expressions):

$ egrep "^[A-Z]{2,}" filename

(As a historical note, egrep didn't support {min,max} initially, and still doesn't in Solaris 11 /bin/egrep, for instance. \{min,max\} support was added to grep before {min,max} was added to egrep, which in the case of egrep did break backward compatibility.)
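The difference between the * attempt and the working patterns is easiest to see by counting matches. The following is a rough sketch, not a definitive test: the sample lines are made up (apart from CAPI, which is from the question), and LC_ALL=C is again an assumption so that [A-Z] covers only ASCII capitals.

```shell
#!/bin/sh
# Assumption: use the C locale so [A-Z] covers only ASCII uppercase letters.
export LC_ALL=C

sample=$(mktemp)
printf '%s\n' \
  'CAPI' \
  'Only one capital' \
  'no capitals here' > "$sample"

# One capital followed by ZERO or more capitals:
# a single leading capital is enough, so two lines are counted.
star_one=$(grep -c '^[A-Z][A-Z]*' "$sample")

# Two capitals followed by zero or more capitals:
# only the line starting with at least two capitals is counted.
star_two=$(grep -c '^[A-Z][A-Z][A-Z]*' "$sample")

# The same requirement written as an extended-regex interval.
interval=$(grep -E -c '^[A-Z]{2,}' "$sample")

echo "^[A-Z][A-Z]*       counts $star_one lines"
echo "^[A-Z][A-Z][A-Z]*  counts $star_two lines"
echo "^[A-Z]{2,} (-E)    counts $interval lines"

rm -f "$sample"
```

The first pattern also counts "Only one capital", which is why it didn't do the trick in the question.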

answered Feb 10, 2012 at 18:12

You just need to add an extra [A-Z]. So, it's

me@ROOROO:~/$ grep "^[A-Z][A-Z][A-Z]*" filename
answered Feb 10, 2012 at 17:42

Looks like you need Perl regexp support. From man grep:

 -P, --perl-regexp
 Interpret PATTERN as a Perl regular expression. This is highly experimental
 and grep -P may warn of unimplemented features.

So grep -P "^[A-Z][A-Z]+" could be more helpful.

answered Feb 10, 2012 at 18:05
