I have a problem with DOMDocument::saveHTML(). Consider this code:
$source = "<html><body><h1>“</h1></body></html>";
$dom = new DOMDocument();
$dom->loadHTML($source);
echo $dom->saveHTML();
Output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><body><h1>“</h1></body></html>
OK, this works correctly. But if I want to extract a single node, like this:
$source = "<html><body><h1>“</h1></body></html>";
$dom = new DOMDocument();
$dom->loadHTML($source);
$h1 = $dom->getElementsByTagName('h1');
echo $dom->saveHTML($h1->item(0));
it outputs unrecognized text:
<h1>“</h1>
Does anyone know how to solve this?
-
All those DOM functions return UTF-8 encoded strings; better check the manual. There is nothing to solve here except displaying the output properly, e.g. tell your browser the encoding by either configuring your response headers or, if you don't know how to tell the browser automatically, using the browser menu where you can specify the charset encoding. See webstandards.org/learn/articles/askw3c/dec2002 – hakre, Feb 23, 2012 at 14:37
2 Answers
Your code example works for me; the output is <h1>“</h1>.
&#8220; “ Left double quotation mark
The binary UTF-8 sequence of “ is:
0xE2 (226)  0x80 (128)  0x9C (156)
 |           |           `------ Windows-1252: œ
 |           `--- most Windows 125x encodings: €
 `--- ISO 8859-1, 2, 3, 4, 9, 10, 14, 15, 16: â
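The byte breakdown above can be reproduced in a few lines of PHP. This is only a sketch: it assumes PHP 7 or later (for the "\u{...}" escape) and an iconv build that recognises the name "CP1252" for Windows-1252. The variable names are made up for illustration.

```php
<?php
// U+201C LEFT DOUBLE QUOTATION MARK as a UTF-8 string (PHP 7+ escape syntax).
$quote = "\u{201C}";

// Hex dump of the raw bytes that make up the character.
$hex = bin2hex($quote);

// Treat the same three bytes as Windows-1252 and convert them to UTF-8.
// This mimics a viewer that assumes the wrong charset.
// Assumption: the local iconv build knows the name "CP1252".
$misread = iconv('CP1252', 'UTF-8', $quote);

echo $hex, "\n";      // e2809c
echo $misread, "\n";  // “
```

If the second line matches the garbage you see, the data is fine and only the charset used for display is wrong.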
So where do you view that output?
Probably inside your browser on Windows? If inside your browser, have you tried adding
header('Content-Type: text/html; charset=utf-8');
at the top of your script?
See also: Setting the HTTP charset parameter and Checking HTTP Headers.
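Put together with the code from the question, the suggestion might look like the sketch below. It assumes the source string uses the numeric entity &#8220; (the question does not show the exact bytes), and $fragment is just an illustrative variable name.

```php
<?php
// Declare the response charset before any output is sent.
header('Content-Type: text/html; charset=utf-8');

// Assumption: the quote is written as a numeric entity in the source.
$source = '<html><body><h1>&#8220;</h1></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($source);

// Serialize only the first <h1> element.
$h1 = $dom->getElementsByTagName('h1')->item(0);
$fragment = $dom->saveHTML($h1);

echo $fragment;
```

With the header in place, the browser should decode the bytes of the node output as UTF-8 instead of guessing a legacy encoding.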
You need the second parameter of the DOMDocument constructor (check out http://nl.php.net/manual/en/domdocument.construct.php):
$dom = new DOMDocument('1.0', 'utf-8');
-
The HTML source I loaded is already encoded (“). It outputs correctly with $dom->saveHTML(), but it outputs unknown characters if I print selected nodes with $dom->saveHTML($nodes); – haohan, Feb 23, 2012 at 14:42