Re: Error installing htmlparser


On Sun, Nov 24, 2013 at 10:41 PM, Craig Barnes <craigbarnes85@gmail.com> wrote:
>> [1] http://stevedonovan.github.io/Penlight/api/modules/pl.xml.html#parsehtml>
> Doesn't work for me. Am I doing something wrong?
Nope, it's an actual bug. It was expecting DOCTYPE in caps, which of
course is not how HTML works (the doctype keyword is case-insensitive).
After that it parses the well-formed HTML fine, but I must emphasize
that this is a 'relaxed' mode of a dinky XML parser and it really
cannot cope with badly-formed HTML. So I can't recommend it for people
who need to deal with the real web.
It coped OK with the Slashdot front page, but that's fairly decent HTML.
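To make the DOCTYPE point concrete, here is a minimal sketch in plain Lua of a case-insensitive check. This is not Penlight's actual code or its fix; the helper name skip_doctype is hypothetical and only illustrates the idea.

```lua
-- Hypothetical helper, not part of Penlight: drop a leading doctype
-- declaration whatever its letter case.
local function skip_doctype(s)
  -- Lua patterns have no case-insensitive flag, so each letter is
  -- spelled out as a two-character class.
  local _, finish = s:find("^%s*<![Dd][Oo][Cc][Tt][Yy][Pp][Ee][^>]*>")
  if finish then
    return s:sub(finish + 1)
  end
  return s -- no doctype found, return the input unchanged
end

print(skip_doctype("<!doctype html><html></html>"))
print(skip_doctype("<!DOCTYPE html><html></html>"))
```

Both calls should leave just the html element, since the pattern accepts either case.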
Parsing the well-formed HTML gives the following LOM table:
{
  tag = "html",
  attr = {
    lang = "en"
  },
  {
    tag = "head",
    {
      tag = "meta",
      attr = { charset = "utf-8" },
      empty = 1,
    },
    {
      tag = "title",
      "Test",
    },
  },
  {
    tag = "body",
    {
      tag = "h1",
      "Test",
    }
  }
}
(Cleaned up from pretty.dump)
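
For anyone unfamiliar with the LOM layout: each element is a table with a tag field, an optional attr table, and its children (strings or further element tables) in the array part. A rough sketch of walking such a table in plain Lua follows; find_tag is a hypothetical helper written for this example, not a Penlight function.

```lua
-- The LOM table from this message, written as a Lua literal.
local doc = {
  tag = "html", attr = { lang = "en" },
  { tag = "head",
    { tag = "meta", attr = { charset = "utf-8" }, empty = 1 },
    { tag = "title", "Test" },
  },
  { tag = "body",
    { tag = "h1", "Test" },
  },
}

-- Hypothetical helper: depth-first search for the first element
-- whose tag matches name. Returns the element table or nil.
local function find_tag(node, name)
  if node.tag == name then return node end
  for _, child in ipairs(node) do
    -- String children are text nodes; only tables are elements.
    if type(child) == "table" then
      local hit = find_tag(child, name)
      if hit then return hit end
    end
  end
  return nil
end

print(find_tag(doc, "title")[1])          -- text of <title>
print(find_tag(doc, "meta").attr.charset) -- charset attribute of <meta>
```

Text content is just the string entries in the array part, so the first child of the title element is its text.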
steve d.
