Stack Overflow

Timeline for answer to How do you parse and process HTML/XML in PHP? by mario

Current License: CC BY-SA 3.0

Post Revisions

13 events
when what by license comment
May 7, 2016 at 10:43 history edited mario CC BY-SA 3.0
added 168 characters in body
Jul 2, 2014 at 20:48 comment added hek2mgl "Most XML parsers cannot see HTML document comments": I'm not sure which parser you are using, but my parser can "read" comments. -1
Dec 26, 2013 at 18:35 history edited Amal CC BY-SA 3.0
Dead link.
Dec 3, 2011 at 20:20 history post merged (destination)
Jan 22, 2011 at 13:10 vote accept Pekka
Nov 21, 2010 at 1:38 comment added tchrist @mario: Actually, HTML can be ‘properly’ parsed using regexes, although it usually takes several of them to do a fair job of it. It’s just a royal pain in the general case. In specific cases with well-defined input, it verges on trivial. Those are the cases where people should be using regexes. Big old hungry heavy parsers are really what you need for general cases, though it isn’t always clear to the casual user where to draw that line. Whichever code is simpler and easier wins.
Sep 7, 2010 at 14:56 history edited mario CC BY-SA 2.5
added 83 characters in body
Sep 7, 2010 at 12:11 comment added ircmaxell Well, just a comment about your "real-world consideration" standpoint. Sure, there ARE useful situations for regex when parsing HTML. There are also useful situations for using GOTO, and useful situations for variable-variables. So no particular implementation is definitively code rot just for using one of them. But it is a VERY strong warning sign, and the average developer isn't likely to be nuanced enough to tell the difference. So as a general rule, regex, GOTO, and variable-variables are all evil. There are non-evil uses, but those are the exceptions (and rare at that)... (IMHO)
Sep 6, 2010 at 10:01 comment added Alohci @Gordon - thanks. HTML parsers and XML parsers are still different things though, even if they're packaged in the same library. And they're both different from DOM implementations.
Sep 6, 2010 at 9:57 comment added Gordon @Alohci DOM uses libxml and libxml has a separate HTML parser module which will be used when loading HTML with loadHTML() so it can very much load "real-world" (read broken) HTML.
Sep 6, 2010 at 9:53 comment added Alohci Neither SGML toolkits nor XML parsers are suitable for parsing real-world HTML. For that, only a dedicated HTML parser is appropriate.
Sep 6, 2010 at 9:48 comment added Gordon DOMComment can read comments, so no reason to use Regex for that.
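Gordon's two points (loadHTML() goes through libxml's HTML parser, and DOMComment nodes expose comments) can be shown together in a short sketch. It assumes a PHP build with the bundled DOM extension; the sample markup is made up and deliberately malformed, and the code is an illustration rather than anything posted in the thread. The XPath `comment()` node test selects every comment node, so no regex is involved.

```php
<?php
// Sketch: read HTML comments with PHP's DOM extension instead of a regex.
// The markup is a hypothetical sample with an unclosed <b> and <div>.
$html = '<p>Hello <b>world<!-- first --></p><div><!-- second -->';

// Collect libxml's warnings about the broken markup instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html); // parsed by libxml's HTML parser, not its XML parser

libxml_clear_errors();

// comment() matches DOMComment nodes; the result is in document order.
$xpath = new DOMXPath($doc);
$comments = [];
foreach ($xpath->query('//comment()') as $comment) {
    // DOMComment inherits $data from DOMCharacterData; trim the padding spaces.
    $comments[] = trim($comment->data);
}

print_r($comments);
```

The same loop works on any DOMDocument; switching to `load()`/`loadXML()` would use the XML parser instead, which is stricter about malformed input.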
Sep 6, 2010 at 9:40 history answered mario CC BY-SA 2.5
