Author ngrig
Date 2006-04-14.20:07:46
Content
When the output encoding in xml.sax.saxutils.XMLGenerator
is set to UTF-16, the result is a terrible mess. Namely:
- it does not encode the XML declaration at the very
top of the file (leaving it as single-byte Latin text);
- it leaves the closing '>' of each start tag unencoded
(that is, it always outputs a single byte);
- it inserts a spurious byte order mark (BOM) for each tag,
each attribute, each text node, and each processing
instruction.
A test illustrating the issue is attached. The issue
affects both the stable (2.4.3) and current (2.5)
versions of Python.
---------------------------------------------
Looking in xml/sax/saxutils.py, I see the problem in
XMLGenerator._write():
 - one-byte strings (str) aren't re-encoded at all (sic!);
 - two-byte strings (unicode) are converted using
unicode.encode(); this results in a BOM for each call of
_write() on Unicode strings.
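The per-call BOM can be reproduced outside XMLGenerator. Below is a minimal sketch in present-day Python 3, where str.encode() plays the role of unicode.encode(). The fragment list is made up for illustration and is not taken from saxutils.py:

```python
import codecs

# Hypothetical fragments, standing in for successive _write() calls.
fragments = ["<doc", ">", "text", "</doc>"]

# Encoding each fragment on its own prepends a BOM every time.
naive = b"".join(piece.encode("utf-16") for piece in fragments)

# codecs.BOM_UTF16 is the BOM in the platform's native byte order,
# which is also what the "utf-16" codec emits.
bom_count = naive.count(codecs.BOM_UTF16)
print(bom_count, "BOMs for", len(fragments), "fragments")
```

Only the first of those BOMs belongs in a well-formed UTF-16 document.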
The issue is easy to fix by using a codecs StreamWriter
instead of a plain stream as the output sink. I am going
to submit a patch shortly.
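A sketch of how a StreamWriter sink behaves, written in present-day Python 3 and not to be read as the actual patch. It assumes the writer is obtained with codecs.getwriter(); the fragment list and variable names are illustrative only:

```python
import codecs
import io

# Hypothetical fragments, standing in for successive _write() calls.
fragments = [
    '<?xml version="1.0" encoding="utf-16"?>',
    "<doc",
    ">",
    "text",
    "</doc>",
]

raw = io.BytesIO()
# The UTF-16 StreamWriter emits the BOM on the first write only.
writer = codecs.getwriter("utf-16")(raw)
for piece in fragments:
    writer.write(piece)

data = raw.getvalue()
bom_count = data.count(codecs.BOM_UTF16)
decoded = data.decode("utf-16")
print(bom_count, decoded)
```

Because every fragment, including the XML declaration and the closing '>', passes through the same writer, everything is encoded consistently.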
Regards,
Nikolai Grigoriev 
History
Date User Action Args
2007-08-23 14:39:27 admin link issue1470540 messages
2007-08-23 14:39:27 admin create