Re: [Python-Dev] why we have both re.match and re.string?

Wed, 10 Feb 2016 15:15:44 -0800

On Wed, Feb 10, 2016 at 10:59:18PM +0100, Luca Sangiacomo wrote:
> Hi,
> I hope the question is not too silly, but I would like to understand
> the advantages of having both re.match() and re.search(). Wouldn't it be
> clearer to have just one function with one additional parameter like
> this:
>
> re.search(regexp, text, from_beginning=True|False) ?

I guess the most important reason now is backwards compatibility. The
oldest Python I have installed here is version 1.5, and it has the brand
new "re" module (intended as a replacement for the old "regex" module).
Both have search() and match() top-level functions. So my guess is that
to find out why, you would have to track down the author of the original
"regex" module.

But a more general answer is the principle, "Functions shouldn't take
constant bool arguments". It is an API design principle which (if I
remember correctly) Guido has stated a number of times. Functions should
not take a boolean argument which (1) exists only to select between two
different modes and (2) is nearly always given as a constant.

Do you ever find yourself writing code like this?

    if some_calculation():
        result = re.match(regex, string)
    else:
        result = re.search(regex, string)

If you do, that would be a hint that perhaps match() and search() should
be combined so you can write:

    result = re.search(regex, string, some_calculation())

But I expect that you almost never do. I would expect that if we
combined the two functions into one, we would nearly always call it
with a constant bool:

    # I always forget whether True means match from the start or not,
    # and which is the default...
    result = re.search(regex, string, False)

which suggests that search() is actually two different functions, and
should be split into two, just as we have now.

It's a general principle, not a law of nature, so you may find
exceptions in the standard library. But if I were designing the re
module from scratch, I would either keep the two distinct functions, or
just provide search() and let users use ^ to anchor the search to the
beginning.

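For comparison, here is a small Python 3 sketch of that "just use ^" alternative (the sample text and patterns are made up for illustration). It shows where an anchored search() lines up with match() and where it does not: under re.MULTILINE, ^ also matches after every newline, while match() and \A only ever match at the start of the string; and an alternation has to be grouped for the anchor to apply to every branch.

```python
import re

text = "spam\neggs"

# match() only tries position 0; search() scans the whole string.
print(re.match(r"eggs", text))              # → None
print(re.search(r"eggs", text).group())     # → eggs

# Anchoring the pattern makes search() behave like match().
print(re.search(r"^eggs", text))            # → None
print(re.search(r"\Aeggs", text))           # → None

# With MULTILINE, ^ also matches after a newline; \A and match() do not.
print(re.search(r"^eggs", text, re.MULTILINE).group())  # → eggs
print(re.search(r"\Aeggs", text, re.MULTILINE))         # → None
print(re.match(r"eggs", text, re.MULTILINE))            # → None

# Without a group, ^ binds only to the first alternative,
# so r"^ham|eggs" still finds "eggs" anywhere in the string.
print(re.search(r"^ham|eggs", text).group())  # → eggs
print(re.search(r"^(?:ham|eggs)", text))      # → None
```

So "search() plus an anchor" works, but the safe spelling is \A with a non-capturing group, which is arguably easier to get wrong than simply calling match().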
> In this way we would prevent, as written in the documentation, people
> from writing ".*" in front of the regexp used with re.match().

I only see one example that does that:
https://docs.python.org/3/library/re.html#checking-for-a-pair
Perhaps it should be changed.
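
A minimal sketch, assuming Python 3, of what such a change could look like. The ".*"-prefixed pattern mirrors the one used in that docs example; the variable names and test strings are illustrative. Dropping the prefix and calling search() finds the same repeated character, but the overall match no longer has to start at index 0:

```python
import re

# Docs-style pattern: the leading ".*" lets match() skip ahead.
pair_with_prefix = re.compile(r".*(.).*\1")
# The same check written for search(), which already scans the string.
pair = re.compile(r"(.).*\1")

for s in ["717ak", "718ak", "354aa"]:
    m1 = pair_with_prefix.match(s)
    m2 = pair.search(s)
    # Both find a repeated character, or both fail.
    assert (m1 is None) == (m2 is None)
    if m1:
        # They capture the same character...
        assert m1.group(1) == m2.group(1)
        # ...but the ".*" prefix stretches the overall match back to index 0.
        print(s, repr(m1.group(1)), m1.span(), m2.span())
    else:
        print(s, "no pair")

# Output:
# 717ak '7' (0, 3) (0, 3)
# 718ak no pair
# 354aa 'a' (0, 5) (3, 5)
```

If only a yes/no answer or the captured group is needed, the two are interchangeable; code that relies on group(0) or span() of the prefixed pattern would see different values.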
-- 
Steve