[Python-Dev] sgmllib Comments

Terry Reedy tjreedy at udel.edu
Mon Jun 12 04:06:16 CEST 2006


"Fred L. Drake, Jr." <fdrake at acm.org> wrote in message 
news:200606112039.37834.fdrake at acm.org...
> On Sunday 11 June 2006 16:26, Sam Ruby wrote:
> > Planet is a feed aggregator written in Python. It depends heavily on
> > SGMLLib. A recent bug report turned out to be a deficiency in sgmllib,
> > and I've submitted a test case and a patch[1] (use or discard the patch,
> > it is the test that I care about).
...
> > and which are original. (Note: feeds often contain such abominations as
> > &amp;copy; which the new code will treat indistinguishably from &copy;)

> It really sounds like sgmllib is the wrong foundation for this.
...
> Have you looked at HTMLParser as an alternate to sgmllib?
> It has better support for XHTML constructs.
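
For readers unfamiliar with the double-escaping issue quoted above, here is a
minimal sketch of the difference between &copy; and &amp;copy;. It assumes
Python 3, where the HTMLParser module lives at html.parser and html.unescape
is available; TextCollector and decoded_text are hypothetical names used only
for illustration. It does not reproduce sgmllib or the patch under discussion.

```python
from html import unescape
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects text with entities decoded once."""

    def __init__(self):
        # convert_charrefs=True asks the parser to decode character and
        # entity references in text before calling handle_data.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def decoded_text(markup):
    """Return the text content of markup after a single decoding pass."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


single = "&copy; 2006"      # ordinary entity reference
double = "&amp;copy; 2006"  # double-escaped, as seen in some feeds

print(decoded_text(single))  # copyright sign followed by " 2006"
print(decoded_text(double))  # literal text "&copy; 2006"

# A second decoding pass makes the two inputs indistinguishable.
print(unescape(unescape(double)) == unescape(single))
```

After one decoding pass the two inputs remain distinct; only a second pass
collapses them, which is the kind of information loss the quoted note describes.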

Have you (the OP) checked how related Python projects, such as Mark 
Pilgrim's feed parser,
http://www.feedparser.org/
handle the same sort of input? (I have only looked at the docs and tests, 
not the code.)
tjr


