Re: [Python-Dev] Changes in html.parser may cause breakage in client code

26 Apr 2012 12:29:07 -0700

On 26.04.2012 21:10, Vinay Sajip wrote:
> Following recent changes in html.parser, the Python 3 port of Django I'm
> working on has started failing while parsing HTML.
> 
> The reason appears to be that Django uses some module-level data in
> html.parser, for example tagfind, which is a regular expression pattern.
> This has changed recently (Ezio changed it in ba4baaddac8d).
> 
> Now, tagfind (and other such patterns) are not marked as private (though
> they are undocumented), but should they be? The following script (tagfind.py):
> 
> import html.parser as Parser
> 
> data = '<select name="stuff">'
> 
> m = Parser.tagfind.match(data, 1)
> print('%r -> %r' % (Parser.tagfind.pattern, data[1:m.end()]))
> 
> gives different results on 3.2 and 3.3:
> 
> $ python3.2 tagfind.py
> '[a-zA-Z][-.a-zA-Z0-9:_]*' -> 'select'
> $ python3.3 tagfind.py
> '([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\\s|/(?!>))*' -> 'select '
> 
> The trailing space later causes a mismatch with the end tag, which leads to
> the errors. Django's use of the tagfind pattern is in a subclass of
> HTMLParser, in an overridden parse_starttag method.
> 
> Do we need to indicate more strongly that data like tagfind are private? Or
> has the change introduced inadvertent breakage, requiring a fix in Python?
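
The behaviour difference can be reproduced without depending on whichever tagfind a given Python version happens to ship. This is a minimal sketch, assuming the two pattern strings printed above are the complete 3.2 and 3.3 patterns; OLD_TAGFIND, NEW_TAGFIND and the tag_name() helper are hypothetical names, not html.parser attributes. Slicing up to m.end() picks up the trailing whitespace with the 3.3 pattern, while reading the capturing group (when there is one) yields just the tag name with either pattern.

```python
import re

# Local copies of the two pattern strings printed above. The names are
# hypothetical; they are not attributes of html.parser.
OLD_TAGFIND = re.compile(r'[a-zA-Z][-.a-zA-Z0-9:_]*')                 # 3.2
NEW_TAGFIND = re.compile(r'([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\s|/(?!>))*')  # 3.3


def tag_name(pattern, rawdata, i):
    """Return the tag name starting at rawdata[i], or None if no match.

    Hypothetical helper: the 3.3 pattern captures the name in group 1 and
    then consumes trailing whitespace (and a stray '/'), whereas the 3.2
    pattern has no groups and matches only the name itself.
    """
    m = pattern.match(rawdata, i)
    if m is None:
        return None
    return m.group(1) if pattern.groups else m.group(0)


data = '<select name="stuff">'
for label, pattern in (("3.2", OLD_TAGFIND), ("3.3", NEW_TAGFIND)):
    m = pattern.match(data, 1)
    # Slicing up to m.end() mirrors what the script above does.
    print(label, repr(data[1:m.end()]), repr(tag_name(pattern, data, 1)))
```

Running it prints `3.2 'select' 'select'` followed by `3.3 'select ' 'select'`: the slice changes between versions, the captured name does not.
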
Since it's a module-level constant without a leading underscore, IMO it was
okay for Django to use it, even if it's not documented.
In this case, especially since we have actual evidence of someone using the
constant, I would keep it as-is and use a new (underscored, this time) name for
the new pattern.
And yes, I think that we do need to indicate private-ness of module-level data.
Georg
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev