This repository was archived by the owner on Jun 5, 2024. It is now read-only.

Commit 6aeebc7


added more clarifications and links

1 parent 3151e05 commit 6aeebc7


1 file changed: +59 -39 lines changed


`gnu_grep.md`

Lines changed: 59 additions & 39 deletions

Original file line number	Diff line number	Diff line change
`@@ -361,7 +361,7 @@ Sugar is sweet,`
`361`	`361`	```
`362`	`362`
`363`	`363`	* If there are multiple non-adjacent matching segments, by default `grep` adds a line `--` to separate them
`364`		`- * non-adjacent here implies that segments are separated by at least one line`
	`364`	`+ * non-adjacent here implies that segments are separated by at least one line in input data`
`365`	`365`
`366`	`366`	```bash
`367`	`367`	`$ seq 29 | grep -A1 '3'`
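To illustrate the "non-adjacent" wording, here is a small sketch with made-up input (behaviour as observed with GNU grep): when the after-context of one match runs straight into the next printed line, no `--` is added; a separator appears only when at least one input line is skipped between the printed segments.

```bash
# matches on lines 3 and 5 with -A1: segments 3-4 and 5-6 touch, so no separator
seq 10 | grep -A1 -E '^(3|5)$'
# 3
# 4
# 5
# 6

# matches on lines 3 and 6 with -A1: line 5 is skipped, so grep prints --
seq 10 | grep -A1 -E '^(3|6)$'
# 3
# 4
# --
# 6
# 7
```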
`@@ -792,7 +792,7 @@ $ printf 'foo\cbar' | grep -o '\\c'`
`792`	`792`
`793`	`793`	* The `-w` option works well to match whole words. But what about matching only start or end of words?
`794`	`794`	* Anchors `\<` and `\>` will match start/end positions of a word
`795`		-* `\b` can also be used instead of `\<` and `\>` which matches either edge of a word
	`795`	+* `\b` can also be used instead of `\<` and `\>` which matches both edges of a word
`796`	`796`
`797`	`797`	```bash
`798`	`798`	`$ printf 'spar\npar\npart\napparent\n'`
`@@ -843,6 +843,24 @@ part`
`843`	`843`	`apparent`
`844`	`844`	```
`845`	`845`
	`846`	+* The word boundary escape sequences differ slightly from the `-w` option
	`847`	`+`
	`848`	+```bash
	`849`	`+$ # this fails because there is no word boundary between space and +`
	`850`	`+$ echo '2 +3 = 5' | grep '\b+3\b'`
	`851`	`+$ # this works as -w only ensures that there are no surrounding word characters`
	`852`	`+$ echo '2 +3 = 5' | grep -w '+3'`
	`853`	`+2 +3 = 5`
	`854`	`+`
	`855`	`+$ # doesn't work as , isn't at start of word boundary`
	`856`	`+$ echo 'hi, 2 one' | grep '\<, 2\>'`
	`857`	`+$ # won't match as there are word characters before ,`
	`858`	`+$ echo 'hi, 2 one' | grep -w ', 2'`
	`859`	`+$ # works as \b matches both edges and , is at end of word after i`
	`860`	`+$ echo 'hi, 2 one' | grep '\b, 2\b'`
	`861`	`+hi, 2 one`
	`862`	+```
	`863`	`+`
`846`	`864`	`<br>`
`847`	`865`
`848`	`866`	`#### <a name="alternation"></a>Alternation`
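The anchor behaviour described in these hunks can be checked against the sample input `spar`, `par`, `part`, `apparent` used in the tutorial. The sketch below assumes GNU grep, where `\<`, `\>` and `\b` are available in BRE; the comments list what each pattern should print.

```bash
# \< : 'par' must start a word -> par, part
printf 'spar\npar\npart\napparent\n' | grep '\<par'

# \> : 'par' must end a word -> spar, par
printf 'spar\npar\npart\napparent\n' | grep 'par\>'

# \b on both edges: 'par' as a whole word -> par
printf 'spar\npar\npart\napparent\n' | grep '\bpar\b'

# -w gives the same result here, since no non-word characters are involved -> par
printf 'spar\npar\npart\napparent\n' | grep -w 'par'
```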
`@@ -933,7 +951,7 @@ $ echo '1 & 2' | grep -o '.'`
`933`	`951`
`934`	`952`	`<br>`
`935`	`953`
`936`		`-#### <a name="quantifiers"></a>Quantifiers`
	`954`	`+#### <a name="quantifiers"></a>Greedy Quantifiers`
`937`	`955`
`938`	`956`	`Defines how many times a character (simplified for now) should be matched`
`939`	`957`
`@@ -963,6 +981,8 @@ act`
`963`	`981`
`964`	`982`	* `*` will try to match 0 or more times
`965`	`983`	* There is no upper limit and `*` will try to match as many times as possible
	`984`	`+ * if matching maximum times results in overall regex failing, then next best count is chosen until overall regex passes`
	`985`	`+ * if there are multiple quantifiers, left-most quantifier gets precedence`
`966`	`986`
`967`	`987`	```bash
`968`	`988`	`$ echo 'abbbc' | grep -o 'b*'`
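A minimal sketch of the "next best count" rule, with made-up input: `b*` would like to take every `b`, but the pattern still needs a literal `b` before `c`, so a smaller count is accepted and the overall regex matches. This describes the result only, not how GNU grep's matcher works internally.

```bash
# b* cannot keep all three b's (the literal b still needs one), so it settles for two
echo 'abbbc' | grep -o 'ab*bc'
# abbbc

# here b* ends up matching zero times
echo 'abc' | grep -o 'ab*bc'
# abc
```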
`@@ -990,7 +1010,7 @@ $ # matching overall expression gets preference`
`990`	`1010`	`$ echo 'car bat cod map scat dot abacus' | grep -o 'c.*at'`
`991`	`1011`	`car bat cod map scat`
`992`	`1012`
`993`		`-$ # precendence is left to right in case of multiple matches`
	`1013`	`+$ # precedence is left to right in case of multiple matches`
`994`	`1014`	`$ echo 'car bat cod map scat dot abacus' | grep -o 'b.*m'`
`995`	`1015`	`bat cod m`
`996`	`1016`	`$ echo 'car bat cod map scat dot abacus' | grep -o 'b.m'`
`@@ -1015,30 +1035,30 @@ ac`
`1015`	`1035`	`abbc`
`1016`	`1036`	```
`1017`	`1037`
`1018`		-* For more precise control on number of times to match, `{}` (`\{\}`for BRE) is useful
`1019`		-* It can take one of four forms, `{n}`, `{n,m}`, `{,m}` and `{n,}`
`1020`		`-`
	`1038`	+* For more precise control on number of times to match, `{}` is useful
	`1039`	+* Use `\{\}` for BRE
	`1040`	+* It can take one of four forms, `{m,n}`, `{,n}`, `{m,}` and `{n}`
`1021`	`1041`
`1022`	`1042`	```bash
`1023`		`-$ # {n} - exactly n times`
`1024`		`-$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{2}c'`
`1025`		`-abbc`
`1026`		`-`
`1027`		`-$ # {n,m} - n to m, including both n and m`
	`1043`	`+$ # {m,n} - m to n, including both m and n`
`1028`	`1044`	`$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{1,2}c'`
`1029`	`1045`	`abc`
`1030`	`1046`	`abbc`
`1031`	`1047`
`1032`		`-$ # {,m} - 0 to m times`
	`1048`	`+$ # {,n} - 0 to n times`
`1033`	`1049`	`$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{,2}c'`
`1034`	`1050`	`ac`
`1035`	`1051`	`abc`
`1036`	`1052`	`abbc`
`1037`	`1053`
`1038`		`-$ # {n,} - at least n times`
	`1054`	`+$ # {m,} - at least m times`
`1039`	`1055`	`$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{2,}c'`
`1040`	`1056`	`abbc`
`1041`	`1057`	`abbbc`
	`1058`	`+`
	`1059`	`+$ # {n} - exactly n times`
	`1060`	`+$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{2}c'`
	`1061`	`+abbc`
`1042`	`1062`	```
`1043`	`1063`
`1044`	`1064`	`<br>`
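Since the BRE form needs backslashes, the sketch below (reusing the tutorial's sample string, GNU grep assumed) writes the same `{1,2}` quantifier both ways; both commands should print the same matches.

```bash
# BRE: braces must be escaped
echo 'ac abc abbc abbbc' | grep -o 'ab\{1,2\}c'
# abc
# abbc

# ERE: same quantifier without backslashes
echo 'ac abc abbc abbbc' | grep -oE 'ab{1,2}c'
# abc
# abbc
```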
`@@ -1214,7 +1234,6 @@ bar`
`1214`	`1234`	```
`1215`	`1235`
`1216`	`1236`	`* backslash character classes`
`1217`		-* The word `-w` option matches the same set of characters as that of `\w`
`1218`	`1237`
`1219`	`1238`	`| Character classes | Description |`
`1220`	`1239`	`| ------------- | ----------- |`
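As a small illustration of one backslash character class (made-up input, GNU grep assumed): `\w` matches word characters, that is letters, digits and underscore, so `\w\+` picks out each run of them and `-o` prints each run on its own line.

```bash
# comma, space and the trailing period are not word characters, so they split the runs
echo 'foo_bar 123, baz.' | grep -o '\w\+'
# foo_bar
# 123
# baz
```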
`@@ -1247,7 +1266,7 @@ $#`
`1247`	`1266`	`* One of the uses of grouping is analogous to character classes for whole regular expressions, instead of just list of characters`
`1248`	`1267`	* The meta characters `()` are used for grouping
`1249`	`1268`	* requires `\(\)` for BRE
`1250`		-* Similar to maths `ab + ac = a(b+c)`, think of regular expression `a(b|c) = ab|ac`
	`1269`	+* Similar to `a(b+c)d = abd+acd` in maths, you get `a(b|c)d = abd|acd` in regular expressions
`1251`	`1270`
`1252`	`1271`	```bash
`1253`	`1272`	`$ # 5 letter words starting with c and ending with ty or ly`
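The `a(b|c)d = abd|acd` analogy can be sketched with three made-up lines. In ERE the group and alternation are written directly, while GNU BRE needs `\(\)` and `\|` (alternation in BRE is a GNU extension).

```bash
# ERE: grouping with () and alternation with |
printf 'abd\nacd\naed\n' | grep -E 'a(b|c)d'
# abd
# acd

# GNU BRE: the same pattern with \( \) and \|
printf 'abd\nacd\naed\n' | grep 'a\(b\|c\)d'
# abd
# acd
```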
`@@ -1315,6 +1334,19 @@ semiprofessionals`
`1315`	`1334`	`transcendentalist`
`1316`	`1335`	```
`1317`	`1336`
	`1337`	`+* Spotting repeated words`
	`1338`	`+`
	`1339`	+```bash
	`1340`	`+$ cat story.txt`
	`1341`	`+singing tin in the rain`
	`1342`	`+walking for for a cause`
	`1343`	`+have a nice day`
	`1344`	`+day and night`
	`1345`	`+`
	`1346`	`+$ grep -wE '(\w+)\W+\1' story.txt`
	`1347`	`+walking for for a cause`
	`1348`	+```
	`1349`	`+`
`1318`	`1350`	`* Note that there is an [issue for certain usage of back-reference and quantifier](https://debbugs.gnu.org/cgi/bugreport.cgi?bug=26864)`
`1319`	`1351`
`1320`	`1352`	```bash
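The repeated-words example relies on `-w`. Below is a sketch using two lines from the sample `story.txt`, fed through `printf` so no file is needed (GNU grep assumed). Without `-w`, `(\w+)` can capture the `in` inside `tin`, so the first line matches as well.

```bash
# without -w: 'tin in' contains 'in in', so both lines match
printf 'singing tin in the rain\nwalking for for a cause\n' | grep -E '(\w+)\W+\1'
# singing tin in the rain
# walking for for a cause

# with -w: the match must start and end at word boundaries
printf 'singing tin in the rain\nwalking for for a cause\n' | grep -wE '(\w+)\W+\1'
# walking for for a cause
```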
`@@ -1337,20 +1369,6 @@ Appaloosa`
`1337`	`1369`	`Appleseed`
`1338`	`1370`	```
`1339`	`1371`
`1340`		`-* Useful to spot repeated words`
`1341`		-* Use `-z` option (covered later) to match repetition in consecutive lines
`1342`		`-`
`1343`		-```bash
`1344`		`-$ cat story.txt`
`1345`		`-singing tin in the rain`
`1346`		`-walking for for a cause`
`1347`		`-have a nice day`
`1348`		`-day and night`
`1349`		`-`
`1350`		`-$ grep -wE '(\w+)\W+\1' story.txt`
`1351`		`-walking for for a cause`
`1352`		-```
`1353`		`-`
`1354`	`1372`	`<br>`
`1355`	`1373`
`1356`	`1374`	`## <a name="multiline-matching"></a>Multiline matching`
`@@ -1408,6 +1426,7 @@ $ man grep | sed -n '/^\s*-P/,/^$/p'`
`1408`	`1426`	```
`1409`	`1427`
`1410`	`1428`	* The man page informs that `-P` is highly experimental. So far, haven't faced any issues. But do keep this in mind.
	`1429`	+ * newer versions of `GNU grep` have fixes for some `-P` bugs, see [release notes](https://savannah.gnu.org/news/?group_id=67) for an overview of changes between versions
`1411`	`1430`	`* Only a few highlights are presented here`
`1412`	`1431`	`* For more info`
`1413`	`1432`	* `man pcrepattern` or [read it online](https://www.pcre.org/original/doc/html/pcrepattern.html)
`@@ -1733,10 +1752,10 @@ real 0m0.008s`
`1733`	`1752`	* `*` match preceding character/group 0 or more times
`1734`	`1753`	* `+` match preceding character/group 1 or more times
`1735`	`1754`	* `?` match preceding character/group 0 or 1 times
	`1755`	+* `{m,n}` match preceding character/group m to n times, including m and n
	`1756`	+* `{m,}` match preceding character/group m or more times
	`1757`	+* `{,n}` match preceding character/group 0 to n times
`1736`	`1758`	* `{n}` match preceding character/group exactly n times
`1737`		-* `{n,}` match preceding character/group n or more times
`1738`		-* `{n,m}` match preceding character/group n to m times, including n and m
`1739`		-* `{,m}` match preceding character/group up to m times
`1740`	`1759`
`1741`	`1760`	`<br>`
`1742`	`1761`
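The `+` and `?` entries in this summary can be sketched with the same sample string used for the `{}` examples (ERE syntax, GNU grep assumed):

```bash
# + : one or more b between a and c
echo 'ac abc abbc abbbc' | grep -oE 'ab+c'
# abc
# abbc
# abbbc

# ? : zero or one b between a and c
echo 'ac abc abbc abbbc' | grep -oE 'ab?c'
# ac
# abc
```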
`@@ -1764,8 +1783,7 @@ real 0m0.008s`
`1764`	`1783`
`1765`	`1784`	`#### <a name="basic-vs-extended-regular-expressions"></a>Basic vs Extended Regular Expressions`
`1766`	`1785`
`1767`		-By default, the pattern passed to `grep` is treated as Basic Regular Expressions(BRE), which can be overridden using options like `-E` for ERE and `-P` for Perl Compatible Regular Expression(PCRE)
`1768`		-Paraphrasing from `info grep`
	`1786`	+By default, the pattern passed to `grep` is treated as Basic Regular Expressions (BRE), which can be overridden using options like `-E` for ERE and `-P` for Perl Compatible Regular Expressions (PCRE). Paraphrasing from `info grep`:
`1769`	`1787`
`1770`	`1788`	>In Basic Regular Expressions the meta-characters `? + { | ( )` lose their special meaning, instead use the backslashed versions `\? \+ \{ \| \( \)`
`1771`	`1789`
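To see the BRE/ERE difference from the quoted paragraph in action, here is a sketch with a made-up string containing a literal `+`. In BRE, `+` is an ordinary character unless backslashed (`\+` being a GNU extension); in ERE it is a quantifier.

```bash
# BRE: + is literal, so 'b+' matches the text b+
echo 'abbc ab+c' | grep -o 'b+'
# b+

# ERE: + means one or more b
echo 'abbc ab+c' | grep -oE 'b+'
# bb
# b

# GNU BRE: the backslashed \+ behaves like ERE +
echo 'abbc ab+c' | grep -o 'b\+'
# bb
# b
```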
@@ -1776,22 +1794,24 @@ Paraphrasing from `info grep`
`1776`	`1794`	* `man grep` and `info grep`
`1777`	`1795`	`* At least go through all options ;)`
`1778`	`1796`	* Usage section in `info grep` has good examples as well
	`1797`	+* This chapter has also been [converted to a book](https://github.com/learnbyexample/learn_gnugrep_ripgrep) with additional examples and exercises, and it also covers the popular alternative `ripgrep`
`1779`	`1798`	`* A bit of history`
	`1799`	`+ * [Brian Kernighan remembers the origins of grep](https://thenewstack.io/brian-kernighan-remembers-the-origins-of-grep/)`
`1780`	`1800`	`* [how grep command was born](https://medium.com/@rualthanzauva/grep-was-a-private-command-of-mine-for-quite-a-while-before-i-made-it-public-ken-thompson-a40e24a5ef48)`
`1781`	`1801`	`* [why GNU grep is fast](https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html)`
`1782`	`1802`	`* [unix.stackexchange - Difference between grep, egrep and fgrep](https://unix.stackexchange.com/questions/17949/what-is-the-difference-between-grep-egrep-and-fgrep)`
`1783`		`-* Tutorials and Q&A`
`1784`		`- * [grep tutorial](https://www.panix.com/~elflord/unix/grep.html)`
`1785`		`- * [grep examples](https://alvinalexander.com/unix/edu/examples/grep.shtml)`
	`1803`	`+* Q&A on stackoverflow/stackexchange are a good source of learning material and practice exercises`
`1786`	`1804`	`* [grep Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/grep?sort=votes&pageSize=15)`
`1787`	`1805`	`* [grep Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/grep?sort=votes&pageSize=15)`
`1788`	`1806`	`* Learn Regular Expressions (has information on flavors other than BRE/ERE/PCRE too)`
`1789`	`1807`	`* [Regular Expressions Tutorial](https://www.regular-expressions.info/tutorial.html)`
	`1808`	`+ * [rexegg](https://www.rexegg.com/) - tutorials, tricks and more`
`1790`	`1809`	`* [regexcrossword](https://regexcrossword.com/)`
`1791`	`1810`	`* [stackoverflow - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean)`
`1792`	`1811`	* [online regex tester and debugger](https://regex101.com/) - by default `pcre` flavor
`1793`	`1812`	`* Alternatives`
	`1813`	`+ * [ripgrep](https://github.com/BurntSushi/ripgrep)`
`1794`	`1814`	`* [pcregrep](https://www.pcre.org/original/doc/html/pcregrep.html)`
`1795`	`1815`	`* [ag - silver searcher](https://github.com/ggreer/the_silver_searcher)`
`1796`		`- * [ripgrep](https://github.com/BurntSushi/ripgrep)`
`1797`	`1816`	`* [unix.stackexchange - When to use grep, sed, awk, perl, etc](https://unix.stackexchange.com/questions/303044/when-to-use-grep-less-awk-sed)`
	`1817`	`+`

0 commit comments
