Timeline for answer to How do you parse and process HTML/XML in PHP? by mario
Current License: CC BY-SA 3.0
Post Revisions
13 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| May 7, 2016 at 10:43 | history | edited | mario | CC BY-SA 3.0 | added 168 characters in body |
| Jul 2, 2014 at 20:48 | comment | added | hek2mgl | "Most XML parsers cannot see HTML document comments": I'm not sure which parser you are using, but my parser can "read" comments. -1 | |
| Dec 26, 2013 at 18:35 | history | edited | Amal | CC BY-SA 3.0 | Dead link. |
| Dec 3, 2011 at 20:20 | history | post merged (destination) | |||
| Jan 22, 2011 at 13:10 | vote | accept | Pekka | ||
| Nov 21, 2010 at 1:38 | comment | added | tchrist | @mario: Actually, HTML can be ‘properly’ parsed using regexes, although usually it takes several of them to do a fair job at it. It’s just a royal pain in the general case. In specific cases with well-defined input, it verges on trivial. Those are the cases that people should be using regexes on. Big old hungry heavy parsers are really what you need for general cases, though it isn’t always clear to the casual user where to draw that line. Whichever code is simpler and easier, wins. | |
| Sep 7, 2010 at 14:56 | history | edited | mario | CC BY-SA 2.5 | added 83 characters in body |
| Sep 7, 2010 at 12:11 | comment | added | ircmaxell | Well, just a comment about your "real-world consideration" standpoint. Sure, there ARE useful situations for Regex when parsing HTML. And there are also useful situations for using GOTO. And there are useful situations for variable-variables. So no particular implementation is definitively code-rot for using it. But it is a VERY strong warning sign. And the average developer isn't likely to be nuanced enough to tell the difference. So as a general rule, Regex, GOTO, and Variable-Variables are all evil. There are non-evil uses, but those are the exceptions (and rare at that)... (IMHO) | |
| Sep 6, 2010 at 10:01 | comment | added | Alohci | @Gordon - thanks. HTML parsers and XML parsers are still different things though, even if they're packaged in the same library. And they're both different from DOM implementations. | |
| Sep 6, 2010 at 9:57 | comment | added | Gordon | @Alohci DOM uses libxml, and libxml has a separate HTML parser module that is used when loading HTML with loadHTML(), so it can very much load "real-world" (read: broken) HTML. | |
| Sep 6, 2010 at 9:53 | comment | added | Alohci | Neither SGML toolkits nor XML parsers are suitable for parsing real-world HTML. For that, only a dedicated HTML parser is appropriate. | |
| Sep 6, 2010 at 9:48 | comment | added | Gordon | DOMComment can read comments, so no reason to use Regex for that. | |
| Sep 6, 2010 at 9:40 | history | answered | mario | CC BY-SA 2.5 |