[Python-Dev] Fixing the XML batteries
Stefan Behnel
stefan_ml at behnel.de
Sat Dec 10 08:38:35 CET 2011
Bill Janssen, 09.12.2011 19:15:
> I think another thing that might go into "refreshing the batteries" is a
> feature comparison of BeautifulSoup and HTML5lib against the stdlib
> competition, to see what needs to be added/revised. Having to switch to
> an outside package for parsing possibly invalid HTML is a pain.
Such a feature request should be worth a separate thread.
Note, however, that html5lib is likely way too big to add to the stdlib,
and that BeautifulSoup lacks a parser for non-conforming HTML in Python 3,
which would be the target release series for better HTML support. So
which library or API you would want to use for HTML processing is only a
secondary question as long as Py3 lacks a real-world HTML parser in the
stdlib, as well as a robust character encoding detection mechanism. I
don't think that can be fixed all that easily.
Stefan