Suppose you have the following HTML:

<style><input><div name="myDiv"></div></style>

You want to load it into a PHP DOMDocument object. How should you do it? With $doc->loadHTML(), the problem is that the <div> is inside the <style> tag. With $doc->loadXML(), the problem is that the <input> tag is never closed.

Note: I can't edit the HTML, only the PHP used to parse it, because I'm scraping here.

Félix Saparelli
asked Jun 18, 2011 at 2:10

2 Answers

Try this:

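$doc = new DOMDocument;
// Ask the parser to recover from well-formedness errors instead of giving up.
$doc->recover = true;
// $response holds the scraped markup as a string.
$doc->loadXML($response);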

Setting $doc->recover = true tells DOMDocument to try to parse documents that are not well-formed. See the documentation for more information.
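As a rough sketch of how this could be used on the markup from the question, the snippet below wraps the recovery-mode load in a helper and then looks up the div with XPath. The helper name loadLenient is made up for illustration, and the calls to libxml_use_internal_errors() are an addition not mentioned above: they are assumed to be wanted here so that parse errors are collected quietly instead of being raised as PHP warnings.

```php
<?php
// Hypothetical helper (not part of the answer): parse markup with
// libxml's recovery mode and keep parse errors out of the output.
function loadLenient(string $markup): DOMDocument
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument;
    $doc->recover = true;   // keep parsing past well-formedness errors
    $doc->loadXML($markup);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

$doc = loadLenient('<style><input><div name="myDiv"></div></style>');

// Look the element up by its name attribute.
$xpath = new DOMXPath($doc);
$divs = $xpath->query('//div[@name="myDiv"]');
echo $divs->length, "\n";
```

Because the parser has to guess where unclosed elements end, the resulting tree may nest elements differently from what a browser would build, so it is worth querying with descendant paths such as //div rather than relying on exact parent/child positions.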

answered Jun 18, 2011 at 2:18

Can't you turn the HTML into a string, explode it, and then stitch it back together with the closing tags added?
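One way to read this suggestion, sketched here with a regular expression rather than explode(), is to rewrite tags that never get a closing tag into self-closing form before handing the string to loadXML(). The function name closeVoidTags and the short list of tag names are assumptions made for illustration; the pattern also assumes attribute values never contain a ">" character, which real scraped pages may violate.

```php
<?php
// Hypothetical helper: turn tags such as <input> into <input /> so an
// XML parser accepts them. The tag list is illustrative, not complete.
function closeVoidTags(string $html): string
{
    // Match the opening tag, capture its attributes, drop any existing "/".
    $pattern = '~<(br|hr|img|input|link|meta)(?=[\s/>])([^>]*?)\s*/?>~i';

    return preg_replace($pattern, '<$1$2 />', $html);
}

$fixed = closeVoidTags('<style><input><div name="myDiv"></div></style>');

$doc = new DOMDocument;
// The rewritten string is well-formed, so no recovery mode is needed here.
$loaded = $doc->loadXML($fixed);
var_dump($loaded);
```

This only addresses unclosed tags. Other problems in the page, such as unquoted attributes or stray closing tags, would still make loadXML() fail unless they are handled separately.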

answered Jun 18, 2011 at 2:13

1 Comment

This is just a small segment of the HTML I'm dealing with. There are tons of inputs and tons of invalid tags (like divs inside styles). This is just a small sample to show the problem.
