Re: Error installing htmlparser
- Subject: Re: Error installing htmlparser
 
- From: steve donovan <steve.j.donovan@...>
 
- Date: 25 Nov 2013 10:30:29 +0200
 
On Sun, Nov 24, 2013 at 10:41 PM, Craig Barnes <craigbarnes85@gmail.com> wrote:
>> [1] http://stevedonovan.github.io/Penlight/api/modules/pl.xml.html#parsehtml
> Doesn't work for me. Am I doing something wrong?
Nope, it's an actual bug. The parser was expecting DOCTYPE in caps, which of
course is not how HTML works: the doctype declaration is case-insensitive.
With that fixed, it parses the well-formed HTML fine, but I must emphasize
that this is a 'relaxed' mode of a dinky XML parser and really cannot cope
with any badly-formed HTML. So I can't recommend it for people who need to
deal with the real web.
It coped ok with the Slashdot front page, but that's fairly decent HTML.
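To illustrate the case issue: Lua patterns have no case-insensitive flag, so one way to accept any spelling of the doctype is to capture the keyword and compare it lowercased. This is only a rough sketch of the idea, not the actual Penlight fix; `strip_doctype` is a made-up helper name.

```lua
-- Hypothetical helper (not part of Penlight): drop a leading doctype
-- declaration regardless of how "DOCTYPE" is capitalised.
local function strip_doctype(s)
  -- Capture the keyword after "<!" and the position just past the closing ">".
  local keyword, rest_start = s:match("^%s*<!(%a+)[^>]*>()")
  if keyword and keyword:lower() == "doctype" then
    return s:sub(rest_start)
  end
  -- No doctype at the start: return the input unchanged.
  return s
end

print(strip_doctype("<!doctype html><p>x</p>"))  --> <p>x</p>
print(strip_doctype("<!DOCTYPE html><p>x</p>"))  --> <p>x</p>
```

Input that does not start with a doctype (including one that starts with a comment) is passed through untouched.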
Parsing the well-formed HTML gives the following LOM table:
{
  tag = "html",
  attr = {
    lang = "en"
  },
  {
    tag = "head",
    {
      tag = "meta",
      attr = { charset = "utf-8" },
      empty = 1,
    },
    {
      tag = "title",
      "Test",
    },
  },
  {
    tag = "body",
    {
      tag = "h1",
      "Test",
    }
  }
}
(Cleaned up from pretty.dump)
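For anyone unfamiliar with the LOM layout: the tag name lives in `tag`, attributes in `attr`, and children (sub-elements or text strings) sit in the array part. A minimal sketch of walking such a table in plain Lua follows; `find_tag` and `text_of` are hypothetical helpers written for this example, not Penlight functions.

```lua
-- The table from above, written as a Lua literal.
local doc = {
  tag = "html", attr = { lang = "en" },
  { tag = "head",
    { tag = "meta", attr = { charset = "utf-8" }, empty = 1 },
    { tag = "title", "Test" },
  },
  { tag = "body",
    { tag = "h1", "Test" },
  },
}

-- Hypothetical helper: depth-first search for the first element named `name`.
local function find_tag(node, name)
  if type(node) ~= "table" then return nil end  -- text nodes are strings
  if node.tag == name then return node end
  for _, child in ipairs(node) do
    local hit = find_tag(child, name)
    if hit then return hit end
  end
  return nil
end

-- Hypothetical helper: concatenate all text found under a node.
local function text_of(node)
  if type(node) == "string" then return node end
  local parts = {}
  for _, child in ipairs(node) do
    parts[#parts + 1] = text_of(child)
  end
  return table.concat(parts)
end

print(text_of(find_tag(doc, "title")))      --> Test
print(find_tag(doc, "meta").attr.charset)   --> utf-8
```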
steve d.