Commit 0981b0b


first iteration complete

1 parent 9beb2c6

README.md: 1 file changed, 78 additions, 0 deletions

```text
match 24 files found?
skip No files found.
Corresponding regex: \d+ files? found\?
```
### whitespaces
Whitespace characters include the space, tab (\t), newline (\n), and carriage return (\r). Besides these specific metacharacters, \s matches any single whitespace character (in most engines this also covers form feed \f and vertical tab \v), so \s+ matches a run of whitespace of any kind and length.
```text
match 1. abc
match 2. abc
match 3. abc
skip 4.abc
Corresponding regex: \d\.\s+abc
```
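The pattern can be tried directly in code. This is a minimal sketch using Python's standard `re` module; the sample strings, including the tab and multi-space variants, are hypothetical inputs chosen to show that `\s+` accepts any mix and length of whitespace, not data from the tutorial.

```python
import re

# Tutorial pattern: a digit, a literal period, one or more
# whitespace characters (\s+), then "abc".
PATTERN = re.compile(r"\d\.\s+abc")

# Hypothetical sample lines (assumption): a single space, a tab,
# a mix of spaces and a tab, and no whitespace at all.
samples = ["1. abc", "2.\tabc", "3. \t  abc", "4.abc"]

# Record whether each sample contains a match anywhere (re.search).
results = {line: PATTERN.search(line) is not None for line in samples}

for line, matched in results.items():
    print("match" if matched else "skip", repr(line))
```

Only the last sample is skipped, because `\s+` requires at least one whitespace character between the period and `abc`.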
### starting and ending
It is good practice to make regular expressions as specific as possible so that false positives do not creep in. For example, a search for 'success' in a log file would also match 'Error: unsuccessful attempt'. To tighten patterns, the hat (^) and dollar ($) signs mark the start and the end of a line. *Note*: this hat is different from the one used earlier in this tutorial inside square brackets to exclude characters.
```text
match Mission: successful
skip Last Mission: unsuccessful
skip Next Mission: successful upon capture of target
Corresponding regex: ^Mission: successful$
```
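A short Python sketch (standard `re` module) comparing the anchored and unanchored versions on the tutorial's lines. The two-line `log` string is a hypothetical input built from those lines; it shows that in Python `^` and `$` refer to the whole string unless the `re.MULTILINE` flag is set, which is when they behave as "start and end of a line".

```python
import re

# The false positive from the text: "success" is found inside "unsuccessful".
print(re.search(r"success", "Error: unsuccessful attempt") is not None)

# Unanchored: "Mission: successful" may appear anywhere in the text.
UNANCHORED = re.compile(r"Mission: successful")
# Anchored: the whole line must be exactly "Mission: successful".
ANCHORED = re.compile(r"^Mission: successful$")

lines = [
    "Mission: successful",
    "Last Mission: unsuccessful",
    "Next Mission: successful upon capture of target",
]

for line in lines:
    print(f"{line!r}: unanchored={UNANCHORED.search(line) is not None}, "
          f"anchored={ANCHORED.search(line) is not None}")

# Hypothetical multi-line log. Without re.MULTILINE, ^ only matches at the
# start of the whole string, so the second line is never considered.
log = "Next Mission: successful upon capture of target\nMission: successful"
print(ANCHORED.findall(log))
print(re.findall(ANCHORED.pattern, log, flags=re.MULTILINE))
```

The unanchored pattern wrongly accepts the third line; the `$` anchor rejects it because text follows "successful".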
### match groups
Regular expressions can also extract information for further processing. This is done by defining groups of characters and capturing them with the parentheses metacharacters ( and ). Any subpattern inside a pair of parentheses is captured as a group. For example, ^(IMG\d+\.png)$ captures the full image filename, but if the extension is not needed, ^(IMG\d+)\.png$ captures only the part before the period.
```text
capture file_record_transcript.pdf -> file_record_transcript
capture file_07241999.pdf -> file_07241999
skip testfile_fake.pdf.tmp
Corresponding regex: ^(file.+)\.pdf$
```
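In Python's `re` module a captured group is read back with `match.group(n)`. The sketch below applies the tutorial's pattern to its three filenames; `extract_stem` is a hypothetical helper name, and `IMG0042.png` is a made-up filename used to illustrate the IMG example from the paragraph.

```python
import re

# Group 1 captures everything from "file" up to, but not including, ".pdf".
PDF_STEM = re.compile(r"^(file.+)\.pdf$")


def extract_stem(name):
    """Hypothetical helper: return the captured stem, or None on no match."""
    m = PDF_STEM.match(name)
    return m.group(1) if m else None


for name in ["file_record_transcript.pdf", "file_07241999.pdf", "testfile_fake.pdf.tmp"]:
    stem = extract_stem(name)
    print(f"capture {name} -> {stem}" if stem else f"skip {name}")

# The IMG example from the text, applied to a hypothetical filename.
print(re.match(r"^(IMG\d+\.png)$", "IMG0042.png").group(1))  # whole filename
print(re.match(r"^(IMG\d+)\.png$", "IMG0042.png").group(1))  # stem only
```

`testfile_fake.pdf.tmp` is skipped twice over: `^file` fails because the name starts with "test", and `\.pdf$` fails because it ends in ".tmp".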
### nested groups
Nested groups can extract several layers of information at once. Continuing the previous example, both the filename and the picture number can be extracted with a single pattern such as ^(IMG(\d+))\.png$. Groups are numbered from left to right by the position of their opening parenthesis: the first capture group is the one whose "(" appears first, the second is the next one, and so on.
```text
capture Jan 1987 -> Jan 1987 1987
capture May 1969 -> May 1969 1969
capture Aug 2011 -> Aug 2011 2011
Corresponding regex: (\w+\s(\d+))
```
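Group numbering can be checked with Python's `re` module, where `match.groups()` returns every group in order of its opening parenthesis. The dates come from the tutorial; `IMG0042.png` is again a hypothetical filename.

```python
import re

# Outer group 1: month and year; inner group 2: year only.
DATE = re.compile(r"(\w+\s(\d+))")

for text in ["Jan 1987", "May 1969", "Aug 2011"]:
    m = DATE.search(text)
    print(f"capture {text} -> {m.group(1)} {m.group(2)}")

# The IMG example: group 1 is the stem, group 2 (nested) is the number.
print(re.match(r"^(IMG(\d+))\.png$", "IMG0042.png").groups())
```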
### conditionals
The | metacharacter (logical OR, also known as the pipe) separates alternative sets of characters. For example, "Buy more (milk|bread|juice)" matches _Buy more milk_, _Buy more bread_, or _Buy more juice_; add the ^ and $ anchors if nothing else may appear around the phrase.
```text
match I love cats
match I love dogs
skip I love logs
skip I love cogs
Corresponding regex: I love (cats|dogs)
```
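A minimal Python sketch of alternation with the tutorial's examples. It uses `re.fullmatch` to require the whole string to match, and contrasts it with `re.search` on a hypothetical longer string ("I love dogsleds"). The last line shows why the parentheses matter: without them, `|` splits the entire pattern.

```python
import re

# Alternation inside a group: "cats" or "dogs" must follow "I love ".
LOVE = re.compile(r"I love (cats|dogs)")

texts = ["I love cats", "I love dogs", "I love logs", "I love cogs"]
for text in texts:
    m = LOVE.fullmatch(text)  # fullmatch: the whole string must match
    print(f"match {text} (group 1 = {m.group(1)})" if m else f"skip {text}")

# search() only needs a match somewhere, so a longer string still hits.
print(LOVE.search("I love dogsleds") is not None)

# Without parentheses the alternatives are "I love cats" OR just "dogs".
print(re.fullmatch(r"I love cats|dogs", "dogs") is not None)
```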
### back referencing and other special characters
Back-reference syntax varies between implementations. However, many tools let you refer to captured groups with \0 (usually the full matched text), \1 (group 1), \2 (group 2), and so on. For example, the replacement "\2-\1" puts the second captured group first and the first captured group second.
Additionally, the special metacharacter \b matches the boundary between a word character and a non-word character (or the start or end of the text). It is most useful for matching entire words: for example, \bcat\b matches the word cat but not the cat inside concatenate.
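As one concrete implementation, Python's `re.sub` accepts `\1`, `\2`, ... in the replacement string and spells the whole match `\g<0>`. The sketch below swaps month and year, wraps a whole match, uses a back-reference inside the pattern to find a doubled word, and shows `\b` in isolation. All input strings are hypothetical examples.

```python
import re

# Two groups: group 1 = month word, group 2 = year digits.
MONTH_YEAR = re.compile(r"(\w+)\s(\d+)")

# Replacement back-references: "\2-\1" puts the year first.
swapped = MONTH_YEAR.sub(r"\2-\1", "Jan 1987")
# \g<0> is Python's spelling for the whole matched text.
bracketed = MONTH_YEAR.sub(r"[\g<0>]", "Jan 1987")
print(swapped, bracketed)

# A back-reference inside the pattern: \1 must repeat group 1, which finds
# doubled words; \b keeps both halves on whole-word boundaries.
doubled = re.search(r"\b(\w+)\s+\1\b", "it was the the best").group(1)
print(doubled)

# \b on its own: "cat" as a whole word, not the "cat" inside "concatenate".
words = re.findall(r"\bcat\b", "cat concatenate cat.")
print(words)
```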
## Recapitulation
<table>
 <tr> <td>abc...</td><td>Letters</td> </tr>
 <tr> <td>123...</td><td>Digits</td> </tr>
 <tr> <td>\d</td><td>Any Digit</td> </tr>
 <tr> <td>\D</td><td>Any Non-digit character</td> </tr>
 <tr> <td>.</td><td>Any Character (except newline, by default)</td> </tr>
 <tr> <td>\.</td><td>Period</td> </tr>
 <tr> <td>[abc]</td><td>Only a, b, or c</td> </tr>
 <tr> <td>[^abc]</td><td>Not a, b, or c</td> </tr>
 <tr> <td>[a-z]</td><td>Characters a to z</td> </tr>
 <tr> <td>[0-9]</td><td>Digits 0 to 9</td> </tr>
 <tr> <td>\w</td><td>Any Alphanumeric character or underscore</td> </tr>
 <tr> <td>\W</td><td>Any character except alphanumerics and underscore</td> </tr>
 <tr> <td>{m}</td><td>m Repetitions</td> </tr>
 <tr> <td>{m,n}</td><td>m to n Repetitions</td> </tr>
 <tr> <td>*</td><td>Zero or more repetitions</td> </tr>
 <tr> <td>+</td><td>One or more repetitions</td> </tr>
 <tr> <td>?</td><td>Optional character</td> </tr>
 <tr> <td>\s</td><td>Any Whitespace</td> </tr>
 <tr> <td>\S</td><td>Any Non-whitespace character</td> </tr>
 <tr> <td>^...$</td><td>Starts and ends</td> </tr>
 <tr> <td>(...)</td><td>Capture Group</td> </tr>
 <tr> <td>(a(bc))</td><td>Capture Sub-group</td> </tr>
 <tr> <td>(.*)</td><td>Capture all</td> </tr>
 <tr> <td>(abc|def)</td><td>Matches abc or def</td> </tr>
</table>
