This repository was archived by the owner on Jun 5, 2024. It is now read-only.

Commit 6aeebc7

added more clarifications and links
1 parent 3151e05 commit 6aeebc7

File tree

1 file changed

+59
-39
lines changed


‎gnu_grep.md

Lines changed: 59 additions & 39 deletions
Original file line number | Diff line number | Diff line change
@@ -361,7 +361,7 @@ Sugar is sweet,
361361
```
362362
363363
* If there are multiple non-adjacent matching segments, by default `grep` adds a line `--` to separate them
364-
* non-adjacent here implies that segments are separated by at least one line
364+
* non-adjacent here implies that segments are separated by at least one line in input data
365365
366366
```bash
367367
$ seq 29 | grep -A1 '3'
@@ -792,7 +792,7 @@ $ printf 'foo\cbar' | grep -o '\\c'
792792
793793
* The `-w` option works well to match whole words. But what about matching only start or end of words?
794794
* Anchors `\<` and `\>` will match start/end positions of a word
795-
* `\b` can also be used instead of `\<` and `\>` which matches either edge of a word
795+
* `\b` can also be used instead of `\<` and `\>` which matches both edges of a word
796796
797797
```bash
798798
$ printf 'spar\npar\npart\napparent\n'
@@ -843,6 +843,24 @@ part
843843
apparent
844844
```
845845
846+
* the word boundary escape sequences differ slightly from the `-w` option
847+
848+
```bash
849+
$ # this fails because there is no word boundary between space and +
850+
$ echo '2 +3 = 5' | grep '\b+3\b'
851+
$ # this works as -w only ensures that there are no surrounding word characters
852+
$ echo '2 +3 = 5' | grep -w '+3'
853+
2 +3 = 5
854+
855+
$ # doesn't work as , isn't at the start of a word
856+
$ echo 'hi, 2 one' | grep '\<, 2\>'
857+
$ # won't match as there are word characters before ,
858+
$ echo 'hi, 2 one' | grep -w ', 2'
859+
$ # works as \b matches both edges and , is at end of word after i
860+
$ echo 'hi, 2 one' | grep '\b, 2\b'
861+
hi, 2 one
862+
```
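
One way to picture the `-w` behavior shown above is to spell out the check by hand: the match must be preceded by the start of the line or a non-word character, and followed by a non-word character or the end of the line. The sketch below is only an approximation (it assumes GNU grep, and unlike `-w` the neighbouring characters become part of the matched text), but it selects the same lines for these inputs.

```bash
# approximate -w by requiring a non-word character (or a line edge) on both sides
echo '2 +3 = 5' | grep -E '(^|\W)\+3(\W|$)'
# output: 2 +3 = 5

# the same check rejects ', 2' here because 'i' is a word character before ','
echo 'hi, 2 one' | grep -E '(^|\W), 2(\W|$)' || echo 'no match'
# output: no match
```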
863+
846864
<br>
847865
848866
#### <a name="alternation"></a>Alternation
@@ -933,7 +951,7 @@ $ echo '1 & 2' | grep -o '.'
933951
934952
<br>
935953
936-
#### <a name="quantifiers"></a>Quantifiers
954+
#### <a name="quantifiers"></a>Greedy Quantifiers
937955
938956
Defines how many times a character (simplified for now) should be matched
939957
@@ -963,6 +981,8 @@ act
963981
964982
* `*` will try to match 0 or more times
965983
* There is no upper limit and `*` will try to match as many times as possible
984+
* if matching the maximum number of times causes the overall regex to fail, the next highest count is tried, and so on until the overall regex matches
985+
* if there are multiple quantifiers, the left-most quantifier gets precedence
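
These two points can be observed by printing what each quantifier consumed. The sketch below uses `sed -E` instead of `grep`, purely because it can display capture groups. It assumes GNU `sed`, whose ERE matching is expected to behave the same way for these simple patterns.

```bash
# two greedy quantifiers: the left-most b* takes every 'b', leaving none for the second
echo 'abbbbc' | sed -E 's/a(b*)(b*)c/[\1][\2]/'
# output: [bbbb][]

# b* would like all four 'b', but then 'bc' could not match,
# so it gives one back and the overall regex succeeds
echo 'abbbbc' | sed -E 's/a(b*)bc/[\1]/'
# output: [bbb]
```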
966986
967987
```bash
968988
$ echo 'abbbc' | grep -o 'b*'
@@ -990,7 +1010,7 @@ $ # matching overall expression gets preference
9901010
$ echo 'car bat cod map scat dot abacus' | grep -o 'c.*at'
9911011
car bat cod map scat
9921012
993-
$ # precendence is left to right in case of multiple matches
1013+
$ # precedence is left to right in case of multiple matches
9941014
$ echo 'car bat cod map scat dot abacus' | grep -o 'b.*m'
9951015
bat cod m
9961016
$ echo 'car bat cod map scat dot abacus' | grep -o 'b.*m*'
@@ -1015,30 +1035,30 @@ ac
10151035
abbc
10161036
```
10171037
1018-
* For more precise control on number of times to match, `{}` (`\{\}`for BRE) is useful
1019-
* It can take one of four forms, `{n}`, `{n,m}`, `{,m}` and `{n,}`
1020-
1038+
* For more precise control on number of times to match, `{}` is useful
1039+
* use `\{\}` for BRE
1040+
* It can take one of four forms, `{m,n}`, `{,n}`, `{m,}` and `{n}`
10211041
10221042
```bash
1023-
$ # {n} - exactly n times
1024-
$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{2}c'
1025-
abbc
1026-
1027-
$ # {n,m} - n to m, including both n and m
1043+
$ # {m,n} - m to n, including both m and n
10281044
$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{1,2}c'
10291045
abc
10301046
abbc
10311047
1032-
$ # {,m} - 0 to m times
1048+
$ # {,n} - 0 to n times
10331049
$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{,2}c'
10341050
ac
10351051
abc
10361052
abbc
10371053
1038-
$ # {n,} - at least n times
1054+
$ # {m,} - at least m times
10391055
$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{2,}c'
10401056
abbc
10411057
abbbc
1058+
1059+
$ # {n} - exactly n times
1060+
$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{2}c'
1061+
abbc
10421062
```
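
Since the examples above all use `-E`, here is a minimal sketch of the `{m,n}` search written for the default BRE mode, where the braces need a backslash. It assumes GNU grep. Without the backslashes, BRE treats `{` and `}` as literal characters.

```bash
# BRE: escaped braces act as the quantifier
echo 'ac abc abbc abbbc' | grep -o 'ab\{1,2\}c'
# output:
# abc
# abbc

# BRE: unescaped braces match themselves
echo 'ab{1,2}c abc' | grep -o 'ab{1,2}c'
# output: ab{1,2}c
```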
10431063
10441064
<br>
@@ -1214,7 +1234,6 @@ bar
12141234
```
12151235
12161236
* backslash character classes
1217-
* The **word** `-w` option matches the same set of characters as that of `\w`
12181237
12191238
| Character classes | Description |
12201239
| ------------- | ----------- |
@@ -1247,7 +1266,7 @@ $#
12471266
* One of the uses of grouping is analogous to character classes for whole regular expressions, instead of just list of characters
12481267
* The meta characters `()` are used for grouping
12491268
* requires `\(\)` for BRE
1250-
* Similar to maths `ab + ac = a(b+c)`, think of regular expression `a(b|c) = ab|ac`
1269+
* Similar to `a(b+c)d = abd+acd` in maths, you get `a(b|c)d = abd|acd` in regular expressions
12511270
12521271
```bash
12531272
$ # 5 letter words starting with c and ending with ty or ly
@@ -1315,6 +1334,19 @@ semiprofessionals
13151334
transcendentalist
13161335
```
13171336
1337+
* Spotting repeated words
1338+
1339+
```bash
1340+
$ cat story.txt
1341+
singing tin in the rain
1342+
walking for for a cause
1343+
have a nice day
1344+
day and night
1345+
1346+
$ grep -wE '(\w+)\W+\1' story.txt
1347+
walking for for a cause
1348+
```
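
To see what `-w` contributes in the repeated-word search, compare the same pattern with and without a whole-word restriction. The sketch below recreates `story.txt` in a temporary directory (the location is only for illustration) and assumes GNU grep, since back-references in ERE are an extension.

```bash
# recreate the sample file in a scratch directory
dir=$(mktemp -d)
printf 'singing tin in the rain\nwalking for for a cause\nhave a nice day\nday and night\n' > "$dir/story.txt"

# no whole-word restriction: 'in' at the end of 'tin' followed by the word 'in' also counts
grep -E '(\w+)\W+\1' "$dir/story.txt"
# output:
# singing tin in the rain
# walking for for a cause

# \b on both sides restricts the match to whole words, much like -w does here
grep -E '\b(\w+)\W+\1\b' "$dir/story.txt"
# output: walking for for a cause

rm -r "$dir"
```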
1349+
13181350
* **Note** that there is an [issue for certain usage of back-reference and quantifier](https://debbugs.gnu.org/cgi/bugreport.cgi?bug=26864)
13191351
13201352
```bash
@@ -1337,20 +1369,6 @@ Appaloosa
13371369
Appleseed
13381370
```
13391371
1340-
* Useful to spot repeated words
1341-
* Use `-z` option (covered later) to match repetition in consecutive lines
1342-
1343-
```bash
1344-
$ cat story.txt
1345-
singing tin in the rain
1346-
walking for for a cause
1347-
have a nice day
1348-
day and night
1349-
1350-
$ grep -wE '(\w+)\W+\1' story.txt
1351-
walking for for a cause
1352-
```
1353-
13541372
<br>
13551373
13561374
## <a name="multiline-matching"></a>Multiline matching
@@ -1408,6 +1426,7 @@ $ man grep | sed -n '/^\s*-P/,/^$/p'
14081426
```
14091427
14101428
* The man page informs that `-P` is *highly experimental*. So far, haven't faced any issues. But do keep this in mind.
1429+
* newer versions of `GNU grep` have fixes for some `-P` bugs, see [release notes](https://savannah.gnu.org/news/?group_id=67) for an overview of changes between versions
14111430
* Only a few highlights are presented here
14121431
* For more info
14131432
* `man pcrepattern` or [read it online](https://www.pcre.org/original/doc/html/pcrepattern.html)
@@ -1733,10 +1752,10 @@ real 0m0.008s
17331752
* `*` match preceding character/group 0 or more times
17341753
* `+` match preceding character/group 1 or more times
17351754
* `?` match preceding character/group 0 or 1 times
1755+
* `{m,n}` match preceding character/group m to n times, including m and n
1756+
* `{m,}` match preceding character/group m or more times
1757+
* `{,n}` match preceding character/group 0 to n times
17361758
* `{n}` match preceding character/group exactly n times
1737-
* `{n,}` match preceding character/group n or more times
1738-
* `{n,m}` match preceding character/group n to m times, including n and m
1739-
* `{,m}` match preceding character/group up to m times
17401759
17411760
<br>
17421761
@@ -1764,8 +1783,7 @@ real 0m0.008s
17641783
17651784
#### <a name="basic-vs-extended-regular-expressions"></a>Basic vs Extended Regular Expressions
17661785
1767-
By default, the pattern passed to `grep` is treated as Basic Regular Expressions(BRE), which can be overridden using options like `-E` for ERE and `-P` for Perl Compatible Regular Expression(PCRE)
1768-
Paraphrasing from `info grep`
1786+
By default, the pattern passed to `grep` is treated as a Basic Regular Expression (BRE), which can be overridden using options like `-E` for ERE and `-P` for Perl Compatible Regular Expressions (PCRE). Paraphrasing from `info grep`:
17691787
17701788
>In Basic Regular Expressions the meta-characters `? + { | ( )` lose their special meaning, instead use the backslashed versions `\? \+ \{ \| \( \)`
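
As a quick illustration of that quote, the sketch below runs the same text through BRE and ERE. It assumes GNU grep, where `\+` and `\|` are available in BRE as extensions.

```bash
# BRE: + is literal, so this finds the text 'a+'
echo 'a+b aab' | grep -o 'a+'
# output: a+

# ERE: + is a quantifier, so this finds runs of 'a'
echo 'a+b aab' | grep -oE 'a+'
# output:
# a
# aa

# BRE with backslashed versions, equivalent to the ERE 'c(a|o)+t'
echo 'cat cot caoot cut' | grep -o 'c\(a\|o\)\+t'
# output:
# cat
# cot
# caoot
```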
17711789
@@ -1776,22 +1794,24 @@ Paraphrasing from `info grep`
17761794
* `man grep` and `info grep`
17771795
* At least go through all options ;)
17781796
* **Usage section** in `info grep` has good examples as well
1797+
* This chapter has also been [converted to a book](https://github.com/learnbyexample/learn_gnugrep_ripgrep) with additional examples and exercises; it also covers the popular alternative `ripgrep`
17791798
* A bit of history
1799+
* [Brian Kernighan remembers the origins of grep](https://thenewstack.io/brian-kernighan-remembers-the-origins-of-grep/)
17801800
* [how grep command was born](https://medium.com/@rualthanzauva/grep-was-a-private-command-of-mine-for-quite-a-while-before-i-made-it-public-ken-thompson-a40e24a5ef48)
17811801
* [why GNU grep is fast](https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html)
17821802
* [unix.stackexchange - Difference between grep, egrep and fgrep](https://unix.stackexchange.com/questions/17949/what-is-the-difference-between-grep-egrep-and-fgrep)
1783-
* Tutorials and Q&A
1784-
* [grep tutorial](https://www.panix.com/~elflord/unix/grep.html)
1785-
* [grep examples](https://alvinalexander.com/unix/edu/examples/grep.shtml)
1803+
* Q&A on stackoverflow/stackexchange are a good source of learning material and work well as practice exercises
17861804
* [grep Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/grep?sort=votes&pageSize=15)
17871805
* [grep Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/grep?sort=votes&pageSize=15)
17881806
* Learn Regular Expressions (has information on flavors other than BRE/ERE/PCRE too)
17891807
* [Regular Expressions Tutorial](https://www.regular-expressions.info/tutorial.html)
1808+
* [rexegg](https://www.rexegg.com/) - tutorials, tricks and more
17901809
* [regexcrossword](https://regexcrossword.com/)
17911810
* [stackoverflow - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean)
17921811
* [online regex tester and debugger](https://regex101.com/) - by default `pcre` flavor
17931812
* Alternatives
1813+
* [ripgrep](https://github.com/BurntSushi/ripgrep)
17941814
* [pcregrep](https://www.pcre.org/original/doc/html/pcregrep.html)
17951815
* [ag - silver searcher](https://github.com/ggreer/the_silver_searcher)
1796-
* [ripgrep](https://github.com/BurntSushi/ripgrep)
17971816
* [unix.stackexchange - When to use grep, sed, awk, perl, etc](https://unix.stackexchange.com/questions/303044/when-to-use-grep-less-awk-sed)
1817+
