[Python-ideas] Adding function checks to regex

MRAB python at mrabarnett.plus.com
Sat Mar 19 17:19:30 CET 2011


On 19/03/2011 11:33, Peter Otten wrote:
> MRAB wrote:
>> Some of those who are relatively new to regexes sometimes ask how to write
>> a regex which checks that a number is in a range or is a valid date.
>> Although this may be possible, it certainly isn't easy.
>>
>> From what I've read, Perl has a way of including code in a regex, but I
>> don't think that's a good idea.
>>
>> However, it occurs to me that there may be a case for being able to call
>> a supplied function to perform such checking.
>>
>> Borrowing some syntax from Perl, it could look like this:
>>
>> def range_check(m):
>>     return 1 <= int(m.group()) <= 10
>>
>> numbers = regex.findall(r"\b\d+\b(*CALL)", text, call=range_check)
>>
>> The regex module would match as normal until the "(*CALL)", at which
>> point it would call the function. If the function returns True, the
>> matching continues (and succeeds); if the function returns False, the
>> matching backtracks (and fails).
>
> I would approach that with
>
> numbers = (int(m.group()) for m in re.finditer(r"\b\d+\b", text))
> numbers = [n for n in numbers if 1 <= n <= 10]
>
> here. This is of similar complexity, but has the advantage that you can use
> the building blocks throughout your Python scripts. Could you give an
> example where the benefits of the proposed syntax stand out more?
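For comparison, here is a minimal sketch of what the proposed `(*CALL)` check does in the range example, written with today's `re` module. The helper `findall_checked` is hypothetical (it is not part of `re` or of the `regex` module). It only filters complete matches and does not backtrack into them.

```python
import re
from typing import Callable, List


def findall_checked(pattern: str, text: str,
                    check: Callable[[re.Match], bool]) -> List[str]:
    # Hypothetical helper: keep only the matches for which `check`
    # returns True, approximating a (*CALL) at the end of the pattern.
    return [m.group() for m in re.finditer(pattern, text) if check(m)]


def range_check(m: re.Match) -> bool:
    # The check from the proposal: the matched number must be in 1..10.
    return 1 <= int(m.group()) <= 10


text = "0 3 10 11 7 123"
numbers = findall_checked(r"\b\d+\b", text, range_check)
print(numbers)  # ['3', '10', '7']
```

The `\b` on each side leaves the engine no shorter alternative to backtrack into, so filtering should give the same result here as the proposed backtracking semantics. The two approaches would differ for a pattern such as `\d+` without word boundaries. On `123`, backtracking could go on to try `12` and then `1`, whereas the filter simply drops `123`.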

There may be a use case in config files where you define rules (for
example, Apache <FilesMatch>) or web forms where you have validation,
but a regex is too limited. This would enable you to add 'richer'
checking. There could be a predefined set of checks, such as whether a
date is valid.
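A regex on its own handles the "valid date" case awkwardly. The sketch below shows what a predefined date check might look like. It assumes the check receives a match object with year, month and day groups. The `valid_date` name and the YYYY-MM-DD format are illustrative assumptions. The regex checks only the shape of the text, and `datetime.date` enforces the calendar rules:

```python
import re
from datetime import date


def valid_date(m: re.Match) -> bool:
    # Hypothetical predefined check: is the matched YYYY-MM-DD a real date?
    year, month, day = (int(g) for g in m.groups())
    try:
        date(year, month, day)  # raises ValueError for e.g. Feb 30 or month 13
    except ValueError:
        return False
    return True


# The regex only checks the shape: four digits, two digits, two digits.
pattern = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
text = "2011-03-19 2011-02-30 2011-13-01"

shape_ok = [m.group() for m in pattern.finditer(text)]
valid = [m.group() for m in pattern.finditer(text) if valid_date(m)]
print(shape_ok)  # ['2011-03-19', '2011-02-30', '2011-13-01']
print(valid)     # ['2011-03-19']
```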

>> The function would be passed a match object.
>>
>> An extension, again borrowing the syntax from Perl, could include a tag
>> like this:
>>
>> numbers = regex.findall(r"\b\d+\b(*CALL:RANGE)", text,
>>                         call=range_check)
>>
>> The tag would be passed to the function so that it could support
>> multiple checks.
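The following sketch shows how a tagged `(*CALL:TAG)` might dispatch to several checks. The proposal says only that the tag is passed to the function. Everything else here is hypothetical: the two-argument signature `dispatch(m, tag)`, the `CHECKS` table, the `MONTH` check and the `findall_tagged` helper. The helper again filters whole matches instead of backtracking.

```python
import re
from typing import Callable, Dict, List

# Hypothetical table of checks, keyed by the tag in "(*CALL:TAG)".
# Each check receives the match object.
CHECKS: Dict[str, Callable[[re.Match], bool]] = {
    "RANGE": lambda m: 1 <= int(m.group()) <= 10,
    "MONTH": lambda m: 1 <= int(m.group()) <= 12,
}


def dispatch(m: re.Match, tag: str) -> bool:
    # Assumed signature: the tag arrives as a second argument and
    # selects which check to run.
    return CHECKS[tag](m)


def findall_tagged(pattern: str, text: str, tag: str,
                   call: Callable[[re.Match, str], bool]) -> List[str]:
    # Emulation only: apply the tagged check to each complete match.
    return [m.group() for m in re.finditer(pattern, text) if call(m, tag)]


text = "0 5 11 12 13"
print(findall_tagged(r"\b\d+\b", text, "RANGE", call=dispatch))  # ['5']
print(findall_tagged(r"\b\d+\b", text, "MONTH", call=dispatch))  # ['5', '11', '12']
```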
>
> [brainstorm mode]
> Could the same be achieved without new regex syntax? I'm thinking of reusing
> named groups:
>
> re.findall(r"\b(?P<number>\d+)\b", text,
>            number=lambda s: 1 <= int(s) <= 10)

I'm not sure about that.
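Peter's named-group idea can be approximated today by filtering on `Match.groupdict()`. The helper `findall_group_checks` is hypothetical. Its keyword arguments name groups in the pattern, and each predicate receives that group's text, as in the quoted `lambda s: ...`.

```python
import re
from typing import Callable, List


def findall_group_checks(pattern: str, text: str,
                         **checks: Callable[[str], bool]) -> List[str]:
    # Hypothetical emulation of the named-group idea with the stdlib re.
    compiled = re.compile(pattern)
    unknown = set(checks) - set(compiled.groupindex)
    if unknown:
        raise ValueError(f"no such group(s): {sorted(unknown)}")
    results = []
    for m in compiled.finditer(text):
        groups = m.groupdict()
        # Design choice of this sketch: a group that did not take part in
        # the match (value None) is not checked.
        if all(groups[name] is None or check(groups[name])
               for name, check in checks.items()):
            results.append(m.group())
    return results


text = "0 3 10 11 7 123"
result = findall_group_checks(r"\b(?P<number>\d+)\b", text,
                              number=lambda s: 1 <= int(s) <= 10)
print(result)  # ['3', '10', '7']
```

When a pattern has groups, `re.findall` returns the group contents. This sketch instead returns whole matches. With a single group spanning the whole match, as here, the two coincide.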


