Python - Using a Regular Expression

Several methods are commonly used with regular expressions. The usual first step is to compile the RE definition string into a Pattern object. The resulting Pattern object can then be used to match or search candidate strings. A successful match returns a Match object with details of the matching substring.

The re module provides the compile function.

re.compile( expr ) → Pattern: Create a Pattern object from an RE string. The Pattern is used for all subsequent searching or matching operations. A Pattern has several methods, including match and search.

Generally, raw string notation (r"pattern") is used to write an RE. This reduces the number of \'s required. Without the raw notation, each \ in the string would have to be escaped by another \, making it \\. This rapidly gets cumbersome. There are some other options available for re.compile; see the documentation for the re module in the Python Library Reference for more information.
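As a small illustration of why the raw notation helps, the following sketch (written for Python 3; the names raw_form, escaped_form and decimal_pat are made up for this example) writes the same RE both ways and confirms that the two spellings are the same string.

```python
import re

# The same RE written twice: once as a raw string,
# once as an ordinary string with every \ doubled.
raw_form = r"(\d+)\.(\d+)"
escaped_form = "(\\d+)\\.(\\d+)"

# Both spellings contain exactly the same characters.
same_text = raw_form == escaped_form

# Either spelling compiles to a Pattern that matches a simple decimal number.
decimal_pat = re.compile(raw_form)
decimal_match = decimal_pat.match("3.14")

print(same_text)
print(decimal_match.group(0))
```

The raw form is easier to read because each \ in the RE appears exactly once in the source text.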

The following methods are part of a compiled Pattern. We'll use the name pat to refer to some Pattern object created by the re.compile function.

pat.match( string ) → Match: Match the candidate string against the compiled regular expression, pat. Matching means that the regular expression must match starting at the beginning of the candidate string. A Match object is returned if there is a match, otherwise None is returned.
pat.search( string ) → Match: Search a candidate string for the compiled regular expression, pat. Searching means that the regular expression may be found anywhere in the candidate string. A Match object is returned if the pattern is found, otherwise None is returned.
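The difference between the two methods can be seen in a short sketch. This is only an illustration, written for Python 3; digits_pat and the sample string are invented for the example.

```python
import re

digits_pat = re.compile(r"\d+")
candidate = "room 101"

# match succeeds only if the pattern matches at the very start of the string.
at_start = digits_pat.match(candidate)    # None: the string starts with "room"

# search scans the string and reports the first place the pattern is found.
anywhere = digits_pat.search(candidate)   # a Match object describing "101"

print(at_start)
print(anywhere.group(0))
```

Here match returns None because the candidate string does not begin with a digit, while search locates the digits later in the string.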

If search or match finds the pattern in the candidate string, a Match object is created to describe the part of the candidate string which matched. The following methods are part of a Match object. We'll use the name match to refer to some Match object created by a successful search or match operation.

match.group( number ) → string: Retrieve the string that matched a particular () grouping in the regular expression. Group zero is the entire substring that matched. Group 1 is the material that matched the first set of ()'s. When several group numbers are given, a tuple of the corresponding strings is returned.
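A brief sketch of the group method, written for Python 3, may help. The pattern and the names date_pat, date_match, whole, first and both are assumptions made for this illustration only.

```python
import re

date_pat = re.compile(r"(\d+)-(\d+)")
date_match = date_pat.match("2024-05 and more")

# Group 0 is the whole substring that the pattern matched, as a single string.
whole = date_match.group(0)

# Group 1 is the text matched by the first set of ()'s.
first = date_match.group(1)

# Asking for several groups at once returns a tuple of strings.
both = date_match.group(1, 2)

print(whole)
print(first)
print(both)
```

Note that the trailing " and more" is not part of group 0, because the pattern did not match it.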

Here's a more complete example.

>>> import re
>>> rawin = "20:07:13.2"
>>> hms_pat = re.compile(r'(\d+):(\d+):(\d+\.?\d*)')
>>> hms_match = hms_pat.match(rawin)
>>> print(hms_match.group(0, 1, 2, 3))
('20:07:13.2', '20', '07', '13.2')
>>> h, m, s = map(float, hms_match.group(1, 2, 3))
>>> seconds = ((h*60)+m)*60+s
>>> print(h, m, s, "=", seconds)
20.0 7.0 13.2 = 72433.2

This sequence decodes a compound input value into fields and then computes a single result. The import statement makes the re module available. The rawin variable is sample input, perhaps read from a file, perhaps typed by the user (via raw_input in Python 2, or input in Python 3). The hms_pat variable is the compiled regular expression object, which matches three numbers, captured with groups such as "(\d+)", separated by :'s.

The digit-sequence RE's are surrounded by ()'s so that the material that matched is returned as a group. This leads to four groups: group 0 is everything that matched, and groups 1, 2, and 3 are the successive digit strings. The hms_match variable is either a Match object or None, which indicates success or failure in matching. If hms_match is None, no match occurred. Otherwise, the hms_match.group method will reveal the individually matched input items.
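Because a failed match yields None, code that uses the result usually checks for None before calling group. The following is a minimal sketch of that check, written for Python 3; the helper name describe is hypothetical and is not part of the re module.

```python
import re

hms_pat = re.compile(r"(\d+):(\d+):(\d+\.?\d*)")

def describe(candidate):
    """Return the three matched fields as strings, or None if there is no match.

    (describe is a hypothetical helper, used only for this illustration.)
    """
    hms_match = hms_pat.match(candidate)
    if hms_match is None:
        # No match: calling hms_match.group here would raise AttributeError.
        return None
    return hms_match.group(1, 2, 3)

print(describe("20:07:13.2"))
print(describe("not a time"))
```

Calling group on None raises an AttributeError, so the explicit test keeps bad input from stopping the program.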

The statement that sets h, m, and s does three things. First, it uses hms_match.group to create a tuple of the requested items. Each item in the tuple will be a string, so the map function is used to apply the built-in float function to each string, producing a sequence of three numbers. Then the statement relies on the multiple-assignment feature to set all three variables at once. Finally, seconds is computed as the number of seconds past midnight for the given time stamp.
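To make the three steps easier to see, here is the same computation with each step on its own line. This is a sketch written for Python 3; the intermediate names fields and numbers are introduced only for illustration, and tuple() is added so that the converted values can be inspected.

```python
import re

hms_pat = re.compile(r"(\d+):(\d+):(\d+\.?\d*)")
hms_match = hms_pat.match("20:07:13.2")

# Step 1: group returns a tuple of the three matched strings.
fields = hms_match.group(1, 2, 3)

# Step 2: map applies float to each string; tuple() collects the results.
numbers = tuple(map(float, fields))

# Step 3: multiple assignment unpacks the three numbers into three variables.
h, m, s = numbers

# Seconds past midnight for the time stamp.
seconds = ((h * 60) + m) * 60 + s

print(fields)
print(numbers)
```

The one-line form in the example does exactly this, without naming the intermediate results.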

Published under the terms of the Open Publication License