Commit 0981b0b (parent 9beb2c6): first iteration complete

README.md

```text
match 24 files found?
skip No files found.
Corresponding regex: \d+ files? found\?
```
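The pattern in the block above can be checked with a short sketch using Python's standard `re` module. `re.search` is assumed to mirror the tutorial's match/skip behaviour (a hit anywhere in the line counts as a match). The input "1 file found?" is a hypothetical extra case that exercises the optional **s**.

```python
import re

# One or more digits, a space, "file" with an optional "s",
# then " found" and a literal question mark (escaped as \?).
PATTERN = re.compile(r"\d+ files? found\?")

# "1 file found?" is a hypothetical extra input for the optional "s".
samples = ["24 files found?", "1 file found?", "No files found."]

for text in samples:
    verdict = "match" if PATTERN.search(text) else "skip"
    print(verdict, text)
```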
### whitespaces

Whitespace characters include the space (\_), tab (\t), newline (\n) and carriage return (\r). Besides these individual escapes, the metacharacter \s matches any whitespace character.
```text
match 1. abc
match 2. abc
match 3. abc
skip 4.abc
Corresponding regex: \d\.\s+abc
```

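A small Python sketch (standard `re` module) shows \s+ accepting different runs of whitespace. The tab and the multiple spaces in the inputs are assumptions, since the exact whitespace in the examples above was not preserved.

```python
import re

# \d one digit, \. a literal period, \s+ one or more whitespace
# characters (space, tab, newline, carriage return, ...), then "abc".
PATTERN = re.compile(r"\d\.\s+abc")

# Hypothetical inputs: one space, a tab, several spaces, and no whitespace.
samples = ["1. abc", "2.\tabc", "3.   abc", "4.abc"]

results = [bool(PATTERN.search(s)) for s in samples]
for text, ok in zip(samples, results):
    print("match" if ok else "skip", repr(text))
```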
### starting and ending

It is best practice to write regular expressions that are as specific as possible, so that false positives do not creep in. For example, a search for 'success' in a file would also match the line 'Error: unsuccessful attempt'. To tighten patterns, the **^ (hat)** and **$ (dollar)** signs mark the start and end of a line (or of the whole input, depending on the engine's multiline setting). ***Note***: this hat sign is different from the one used earlier in this tutorial to exclude characters inside square brackets, as in **[^abc]**.
```text
match Mission: successful
skip Last Mission: unsuccessful
skip Next Mission: successful upon capture of target
Corresponding regex: ^Mission: successful$
```

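The effect of anchoring can be sketched in Python with the standard `re` module, using the lines from the block above. The sketch also shows `re.MULTILINE`, Python's flag that makes ^ and $ match at every line boundary when the lines are joined into one string.

```python
import re

lines = [
    "Mission: successful",
    "Last Mission: unsuccessful",
    "Next Mission: successful upon capture of target",
]

# Loose pattern: "success" also hits "unsuccessful" and the longer line.
loose = [line for line in lines if re.search(r"success", line)]

# ^ and $ anchor the pattern, so the whole line must be exactly
# "Mission: successful".
strict = [line for line in lines if re.search(r"^Mission: successful$", line)]

# On one multi-line string, re.MULTILINE makes ^ and $ match at each
# line boundary instead of only at the start and end of the string.
text = "\n".join(lines)
multi = re.findall(r"^Mission: successful$", text, re.MULTILINE)

print(loose)
print(strict)
print(multi)
```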
### match groups

Regular expressions can also extract information for further processing. This is done by defining groups of characters and capturing them with the special **(** and **)** metacharacters: any subpattern inside a pair of parentheses is captured as a group. For example, **^(IMG\d+\.png)$** captures the full image filename, but if the extension is not needed, the pattern **^(IMG\d+)\.png$** captures only the part before the period.
```text
capture file_record_transcript.pdf -> file_record_transcript
capture file_07241999.pdf -> file_07241999
skip testfile_fake.pdf.tmp
Corresponding regex: ^(file.+)\.pdf$
```

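In Python's standard `re` module, the text captured by the first pair of parentheses is available as `group(1)` on the match object. The sketch below applies the regex from the block above to its three filenames.

```python
import re

# ^ and $ pin the whole name; group 1 captures "file" plus everything
# up to, but not including, the final ".pdf".
PATTERN = re.compile(r"^(file.+)\.pdf$")

names = ["file_record_transcript.pdf", "file_07241999.pdf", "testfile_fake.pdf.tmp"]

captured = []
for name in names:
    m = PATTERN.search(name)
    if m:
        captured.append(m.group(1))  # text matched inside the parentheses

print(captured)
```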
### nested groups

Nested groups can be used to extract multiple layers of information. Continuing the previous example, both the filename and the picture number can be extracted with a single pattern such as **^(IMG(\d+))\.png$**. Groups are numbered from left to right by the position of their opening parenthesis, so in this pattern group 1 holds IMG plus the digits and group 2 holds only the digits.
```text
capture Jan 1987 -> Jan 1987 1987
capture May 1969 -> May 1969 1969
capture Aug 2011 -> Aug 2011 2011
Corresponding regex: (\w+\s(\d+))
```

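The group numbering can be checked with a Python sketch (standard `re` module): `groups()` returns the captures in order of their opening parentheses. The filename "IMG42.png" is a hypothetical input for the IMG pattern mentioned in the text.

```python
import re

# Group 1 (outer) = "Month Year"; group 2 (inner) = the year digits only.
DATE = re.compile(r"(\w+\s(\d+))")

dates = ["Jan 1987", "May 1969", "Aug 2011"]
groups = [DATE.search(d).groups() for d in dates]
for g in groups:
    print(*g)

# The IMG pattern from the text; "IMG42.png" is a hypothetical filename.
img = re.search(r"^(IMG(\d+))\.png$", "IMG42.png")
print(img.group(1), img.group(2))
```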
### conditionals

The **| (logical OR, a.k.a. the pipe)** separates alternative sets of characters. For example, "Buy more (milk|bread|juice)" matches only the strings _Buy more milk_, _Buy more bread_, or _Buy more juice_.
```text
match I love cats
match I love dogs
skip I love logs
skip I love cogs
Corresponding regex: I love (cats|dogs)
```

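A Python sketch (standard `re` module) runs the regex from the block above and also shows why the parentheses matter: without them, | splits the entire pattern into "I love cats" or "dogs". The input "hot dogs" is a hypothetical example.

```python
import re

# The parentheses limit the alternation to "cats" or "dogs".
PATTERN = re.compile(r"I love (cats|dogs)")

texts = ["I love cats", "I love dogs", "I love logs", "I love cogs"]
hits = []
for t in texts:
    m = PATTERN.search(t)
    hits.append(m.group(1) if m else None)
print(hits)

# Without parentheses, | splits the whole pattern: "I love cats" OR "dogs".
print(bool(re.search(r"I love cats|dogs", "hot dogs")))
```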
### back referencing and other special characters

The syntax for back references varies between implementations. However, many systems let you refer to captured groups with **\0** (usually the full matched text), **\1** (group 1), **\2** (group 2), and so on. For example, the replacement **\2-\1** puts the second captured group first and the first captured group second.
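As one concrete implementation, Python's `re.sub` accepts \1 and \2 in the replacement string, but writes the whole match as \g<0>; a bare \0 is not treated as a reference to the full match there. The hyphenated input "left-right" is a hypothetical example.

```python
import re

# Two capture groups around a hyphen: group 1 and group 2.
PATTERN = r"(\w+)-(\w+)"

# \2-\1 writes group 2 first, then group 1 (the swap described above).
swapped = re.sub(PATTERN, r"\2-\1", "left-right")

# In Python the full match is \g<0> rather than \0.
bracketed = re.sub(PATTERN, r"[\g<0>]", "left-right")

print(swapped)
print(bracketed)
```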
Additionally, the special metacharacter \b matches the boundary between a word character and a non-word character (or the start or end of the text). It is most useful for matching entire words, for example with the pattern \w+\b.

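The difference \b makes can be seen in a Python sketch using `re.findall` from the standard `re` module. The sample sentence is hypothetical; the last call uses the \w+\b pattern from the text.

```python
import re

text = "cat concatenate cats"

# Without boundaries, "cat" is also found inside longer words.
loose = re.findall(r"cat", text)

# \b on both sides only allows "cat" as a whole word.
whole = re.findall(r"\bcat\b", text)

# \w+\b collects entire words.
words = re.findall(r"\w+\b", text)

print(loose)
print(whole)
print(words)
```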
## Recapitulation

<table>
<tr> <td>abc...</td><td>Letters</td> </tr>
<tr> <td>123...</td><td>Digits</td> </tr>
<tr> <td>\d</td><td>Any digit</td> </tr>
<tr> <td>\D</td><td>Any non-digit character</td> </tr>
<tr> <td>.</td><td>Any character (except newline, by default)</td> </tr>
<tr> <td>\.</td><td>Period</td> </tr>
<tr> <td>[abc]</td><td>Only a, b, or c</td> </tr>
<tr> <td>[^abc]</td><td>Any character except a, b, or c</td> </tr>
<tr> <td>[a-z]</td><td>Characters a to z</td> </tr>
<tr> <td>[0-9]</td><td>Digits 0 to 9</td> </tr>
<tr> <td>\w</td><td>Any word character (alphanumeric or underscore)</td> </tr>
<tr> <td>\W</td><td>Any non-word character</td> </tr>
<tr> <td>{m}</td><td>Exactly m repetitions</td> </tr>
<tr> <td>{m,n}</td><td>m to n repetitions</td> </tr>
<tr> <td>*</td><td>Zero or more repetitions</td> </tr>
<tr> <td>+</td><td>One or more repetitions</td> </tr>
<tr> <td>?</td><td>Optional character (zero or one)</td> </tr>
<tr> <td>\s</td><td>Any whitespace</td> </tr>
<tr> <td>\S</td><td>Any non-whitespace character</td> </tr>
<tr> <td>^...$</td><td>Start and end of line</td> </tr>
<tr> <td>(...)</td><td>Capture group</td> </tr>
<tr> <td>(a(bc))</td><td>Capture sub-group</td> </tr>
<tr> <td>(.*)</td><td>Capture all</td> </tr>
<tr> <td>(abc|def)</td><td>Matches abc or def</td> </tr>
</table>
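The quantifier rows of the table can be spot-checked with a Python sketch (standard `re` module). The patterns and inputs are illustrative, not from the tutorial; ^ and $ are added so each pattern has to match the whole input.

```python
import re

# (pattern, input that should match, input that should not)
cases = [
    (r"^a{3}$", "aaa", "aa"),            # {m}: exactly 3
    (r"^a{2,4}$", "aaaa", "aaaaa"),      # {m,n}: 2 to 4
    (r"^ab*c$", "ac", "abd"),            # *: zero or more b
    (r"^ab+c$", "abbbc", "ac"),          # +: one or more b
    (r"^colou?r$", "color", "colouur"),  # ?: optional u
]

results = []
for pattern, good, bad in cases:
    results.append((bool(re.search(pattern, good)), bool(re.search(pattern, bad))))
    print(pattern, results[-1])
```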
