Timeline for answer to How do you parse and process HTML/XML in PHP? by mario

Current License: CC BY-SA 3.0

Post Revisions

13 events

when	what	by	license	comment
May 7, 2016 at 10:43	history	edited	mario	CC BY-SA 3.0	added 168 characters in body
Jul 2, 2014 at 20:48	comment	added	hek2mgl		`Most XML parsers cannot see HTML document comments` I'm not sure which parser you are using, but my parser can "read" comments. -1
Dec 26, 2013 at 18:35	history	edited	Amal	CC BY-SA 3.0	Dead link.
Dec 3, 2011 at 20:20	history	post merged (destination)
Jan 22, 2011 at 13:10	vote	accept	Pekka
Nov 21, 2010 at 1:38	comment	added	tchrist		@mario: Actually, HTML can be ‘properly’ parsed using regexes, although it usually takes several of them to do a fair job at it. It’s just a royal pain in the general case. In specific cases with well-defined input, it verges on trivial, and those are the cases where people should be using regexes. Big old hungry heavy parsers are really what you need for general cases, though it isn’t always clear to the casual user where to draw that line. Whichever code is simpler and easier wins.
Sep 7, 2010 at 14:56	history	edited	mario	CC BY-SA 2.5	added 83 characters in body
Sep 7, 2010 at 12:11	comment	added	ircmaxell		Well, just a comment about your "real-world consideration" standpoint. Sure, there ARE useful situations for regex when parsing HTML. And there are also useful situations for using GOTO, and useful situations for variable-variables. So no particular implementation is definitively code-rot for using them. But each is a VERY strong warning sign, and the average developer isn't likely to be nuanced enough to tell the difference. So as a general rule, regex, GOTO and variable-variables are all evil. There are non-evil uses, but those are the exceptions (and rare at that)... (IMHO)
Sep 6, 2010 at 10:01	comment	added	Alohci		@Gordon - thanks. HTML parsers and XML parsers are still different things though, even if they're packaged in the same library. And they're both different from DOM implementations.
Sep 6, 2010 at 9:57	comment	added	Gordon		@Alohci `DOM` uses libxml and libxml has a separate HTML parser module which will be used when loading HTML with `loadHTML()` so it can very much load "real-world" (read broken) HTML.
Sep 6, 2010 at 9:53	comment	added	Alohci		Neither SGML toolkits nor XML parsers are suitable for parsing real-world HTML. For that, only a dedicated HTML parser is appropriate.
Sep 6, 2010 at 9:48	comment	added	Gordon		`DOMComment` can read comments, so no reason to use Regex for that.
Sep 6, 2010 at 9:40	history	answered	mario	CC BY-SA 2.5