| Field | Value |
|---|---|
| Author | akuchling |
| Recipients | akuchling |
| Date | 2008-02-15 15:15:01 |
| SpamBayes Score | 0.016560765 |
| Marked as misclassified | No |
| Message-id | <1203088508.92.0.545443917543.issue2124@psf.upfronthosting.co.za> |
| In-reply-to | |
Content:

Here's a simple test that demonstrates the problem:
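```python
from xml.sax import make_parser
from xml.sax.handler import feature_external_ges
from xml.sax.saxutils import prepare_input_source

parser = make_parser()
# On by default in 2008; off by default since Python 3.7.1, so enable it to reproduce.
parser.setFeature(feature_external_ges, True)
inp = prepare_input_source('file:file.xhtml')  # 'file:' URL takes the urlopen() branch
parser.parse(inp)
```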
file.xhtml contains:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
 "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" />
```
If you insert a debug print into `saxutils.prepare_input_source()`, in the branch that calls `urllib.urlopen()`, you can see every input the parser fetches: the XHTML 1.1 DTD, which is nicely modular, and all of the module files it pulls in.
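The same list can also be collected through the public SAX API instead of a debug print. The sketch below assumes Python 3; `RecordingResolver` and `requested_entities` are hypothetical names. It installs an `xml.sax.handler.EntityResolver` subclass that records each (public ID, system ID) pair the parser requests and returns an empty byte stream, so `prepare_input_source()` never reaches `urlopen()` and nothing is downloaded from w3.org. Because the substituted DTD is empty, only the top-level request appears; with the real DTD, each module it references would be requested in turn. Since the external-general-entities feature has been off by default since Python 3.7.1, the sketch turns it on explicitly.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

XHTML_DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
 "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" />
"""


class RecordingResolver(EntityResolver):
    """Hypothetical resolver: record every external entity request, fetch nothing."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        # An InputSource that already has a byte stream is used as-is,
        # so prepare_input_source() skips its open()/urlopen() branch.
        source = InputSource(systemId)
        source.setPublicId(publicId)
        source.setByteStream(io.BytesIO(b""))
        return source


def requested_entities(document):
    parser = make_parser()
    # Restore the 2008 default so external entities (including the DTD) are requested.
    parser.setFeature(feature_external_ges, True)
    resolver = RecordingResolver()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(document))
    return resolver.requests


if __name__ == "__main__":
    # Should list a single request: the XHTML 1.1 DTD itself.
    print(requested_entities(XHTML_DOC))
```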
I don't see a good way to fix this without breaking backward compatibility to some degree. The external-general-entities feature defaults to 'on', which enables this fetching. We could change the default to 'off', which would avoid the fetching and parsing, but it would also mean that entities such as `&eacute;` are no longer defined.
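The trade-off can be observed directly. The sketch below assumes Python 3; `TextCollector`, `RefusingResolver` and `parse_without_external_entities` are hypothetical names. It parses an XHTML 1.1 document containing `&eacute;` with the feature switched off. The resolver is never consulted, and because the entity's declaration lives in the unread external DTD, expat cannot expand it: the SAX driver reports it through `ContentHandler.skippedEntity()`, so the collected text should come back as `caf` with `eacute` listed as skipped rather than as `café`.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges

DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
 "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><p>caf&eacute;</p></html>
"""


class TextCollector(ContentHandler):
    """Collect character data and the names of entities the parser skipped."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skipped = []

    def characters(self, content):
        self.chunks.append(content)

    def skippedEntity(self, name):
        self.skipped.append(name)


class RefusingResolver(EntityResolver):
    """Fail loudly if the parser tries to resolve any external entity."""

    def resolveEntity(self, publicId, systemId):
        raise RuntimeError(f"unexpected request for {systemId}")


def parse_without_external_entities(document):
    parser = make_parser()
    # The proposed 'off' default (also the actual default since Python 3.7.1).
    parser.setFeature(feature_external_ges, False)
    parser.setEntityResolver(RefusingResolver())
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return "".join(handler.chunks), handler.skipped


if __name__ == "__main__":
    text, skipped = parse_without_external_entities(DOC)
    print(repr(text), skipped)
```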
If we had catalog support, we could ship the XHTML 1.1 DTDs and any other widely used DTDs, but we don't.
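Catalog support would amount to an entity resolver that checks a table of local copies before touching the network. The standard library has no XML catalog implementation, so the sketch below is only an assumption about how one could be approximated in Python 3: `CatalogResolver`, its `{public ID: local path}` mapping and `parse_text` are hypothetical, and a one-line stand-in file replaces the real XHTML 1.1 DTD and its modules. Identifiers missing from the table raise `LookupError` instead of silently falling back to a download.

```python
import io
import os
import tempfile
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

XHTML11_PUBLIC_ID = "-//W3C//DTD XHTML 1.1//EN"

DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
 "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><p>caf&eacute;</p></html>
"""


class CatalogResolver(EntityResolver):
    """Hypothetical catalog: resolve external entities from local files only."""

    def __init__(self, catalog):
        self.catalog = dict(catalog)  # {publicId: local file path}

    def resolveEntity(self, publicId, systemId):
        path = self.catalog.get(publicId)
        if path is None:
            # Refuse to fall back to the network for unknown identifiers.
            raise LookupError(f"no local copy of {publicId!r} ({systemId})")
        with open(path, "rb") as f:
            data = f.read()
        source = InputSource(systemId)
        source.setPublicId(publicId)
        source.setByteStream(io.BytesIO(data))
        return source


class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(document, resolver):
    parser = make_parser()
    parser.setFeature(feature_external_ges, True)  # read the DTD, but via the catalog
    parser.setEntityResolver(resolver)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return "".join(handler.chunks)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a shipped copy of the XHTML 1.1 DTD: declares only eacute.
        dtd_path = os.path.join(tmp, "xhtml11-stand-in.dtd")
        with open(dtd_path, "w", encoding="ascii") as f:
            f.write('<!ENTITY eacute "&#233;">\n')
        print(parse_text(DOC, CatalogResolver({XHTML11_PUBLIC_ID: dtd_path})))
```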
History:

| Date | User | Action | Args |
|---|---|---|---|
| 2008-02-15 15:15:09 | akuchling | set | spambayes_score: 0.0165608 -> 0.016560765; recipients: + akuchling |
| 2008-02-15 15:15:09 | akuchling | set | spambayes_score: 0.0165608 -> 0.0165608; messageid: <1203088508.92.0.545443917543.issue2124@psf.upfronthosting.co.za> |
| 2008-02-15 15:15:02 | akuchling | link | issue2124 messages |
| 2008-02-15 15:15:01 | akuchling | create | |