
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: xml.sax.saxutils.XMLGenerator cannot output UTF-16
Type: behavior Stage: resolved
Components: XML Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, BreamoreBoy, benjamin.peterson, doerwalter, georg.brandl, larry, loewis, neoecos, ngrig, pitrou, python-dev, serhiy.storchaka
Priority: release blocker Keywords: needs review, patch

Created on 2006-04-14 20:21 by ngrig, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
saxutils.diff ngrig, 2006-04-14 20:21 Patch for bug #1470540
XMLGenerator.patch serhiy.storchaka, 2012-05-30 07:57
XMLGenerator-2.patch serhiy.storchaka, 2012-06-15 07:20
XMLGenerator-3.patch serhiy.storchaka, 2012-07-15 07:08
XMLGenerator-4.patch serhiy.storchaka, 2013-01-14 13:35
XMLGenerator-5.patch serhiy.storchaka, 2013-01-20 15:32
XMLGenerator_fragment-2.7.patch serhiy.storchaka, 2013-02-24 09:08
saxutils.py neoecos, 2013-03-31 19:33 The patched file
Messages (23)
msg50009 - Author: Nikolai Grigoriev (ngrig) Date: 2006-04-14 20:21
This is a patch to bug #1470540. It enables
xml.sax.saxutils.XMLGenerator to work correctly with
UTF-16 (and other encodings not derived from US-ASCII).
The proposed changes are as follows:
- in XMLGenerator.__init__(), create a StreamWriter
instead of a plain stream;
- in XMLGenerator._write(), convert everything to
Unicode before writing;
- in XMLGenerator.endDocument(), flush the StreamWriter.
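A minimal Python 3 sketch of the StreamWriter approach described above, assuming an io.BytesIO object as the output stream (the 2006 patch targeted Python 2.4/2.5 and is not reproduced here). All text goes through one UTF-16 StreamWriter, so the byte order mark is written once and every piece uses the same byte order:

```python
import codecs
import io

# Assumption: an in-memory byte sink stands in for the real output file.
raw = io.BytesIO()

# A StreamWriter that encodes every str it receives as UTF-16.
writer = codecs.getwriter("utf-16")(raw)

parts = ['<?xml version="1.0" encoding="utf-16"?>\n', "<doc>", "caf\u00e9", "</doc>"]
for part in parts:
    # Everything is already text (str) before it reaches the writer.
    writer.write(part)

# Counterpart of flushing the StreamWriter in endDocument().
writer.flush()

data = raw.getvalue()
```

Because the StreamWriter keeps its encoder state between calls, the result is identical to encoding the whole document in one go.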
The patch is applicable to xml/sax/saxutils.py in the
stable release (2.4.3), as well as to
xmlcore/sax/saxutils.py in the current release (2.5).
The smoke test is attached to the bug description in
the Bug Manager.
Regards,
Nikolai Grigoriev
msg66684 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-11 22:03
Won't this present backwards-compatibility problems if non-ASCII str
content is written?
msg114654 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-22 09:30
There are no unit tests or doc changes with the patch. Can anyone answer Georg's question in msg66684?
msg161764 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-28 10:43
See also issue1767933.
Instead of codecs.StreamWriter it is better to use io.TextIOWrapper, because the former is slower and has numerous flaws.
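For comparison, a sketch of the io.TextIOWrapper alternative, again assuming an io.BytesIO sink. write_through=True is a choice made for this illustration so that every write reaches the byte stream immediately; detach() at the end keeps the wrapper from closing the caller's stream:

```python
import io

raw = io.BytesIO()

# One TextIOWrapper encodes all writes with a single stateful encoder.
text_out = io.TextIOWrapper(raw, encoding="utf-16", write_through=True)
text_out.write("<doc>")
text_out.write("caf\u00e9</doc>")
text_out.flush()
data = raw.getvalue()

# Hand the buffer back so that dropping the wrapper does not close `raw`.
text_out.detach()
```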
msg161767 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2012-05-28 11:07
An alternative would be to use an incremental encoder instead of a StreamWriter (which is what TextIOWrapper does internally).
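A small illustration of the incremental-encoder idea using codecs.getincrementalencoder. One encoder object carries its state across calls, so the UTF-16 BOM appears only once; encoding each chunk separately (the naive variant, shown only for contrast) repeats it:

```python
import codecs

parts = ["<doc>", "caf\u00e9", "</doc>"]

# One stateful encoder for the whole document: the BOM is emitted
# only on the first call; final=True flushes any remaining state.
encoder = codecs.getincrementalencoder("utf-16")()
data = b"".join(encoder.encode(p) for p in parts) + encoder.encode("", final=True)

# Encoding each piece independently starts a new BOM for every chunk.
naive = b"".join(p.encode("utf-16") for p in parts)
```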
msg161933 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-30 07:57
Oh, I see XMLGenerator is completely outdated. It has not even been ported to Python 3. See the _write method:
    def _write(self, text):
        if isinstance(text, str):
            self._out.write(text)
        else:
            self._out.write(text.encode(self._encoding, _error_handling))
In Python 2, text could be either a byte string or a unicode string. In Python 3, text is always a str, so the encoding branch is never taken.
XMLGenerator does not distinguish between binary and text streams.
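To make the point concrete, here is a hypothetical stand-in class (OldStyleGenerator is not part of the standard library) that keeps only the quoted _write logic, with "xmlcharrefreplace" assumed as the error handler. Under Python 3 the str branch always runs, so a binary stream rejects the output and a text stream silently ignores the encoding:

```python
import io


class OldStyleGenerator:
    """Hypothetical stand-in that keeps only the quoted _write logic."""

    def __init__(self, out, encoding):
        self._out = out
        self._encoding = encoding

    def _write(self, text):
        if isinstance(text, str):
            # In Python 3 all markup is str, so this branch always runs
            # and self._encoding is never consulted.
            self._out.write(text)
        else:
            # Not reached for str input in Python 3.
            self._out.write(text.encode(self._encoding, "xmlcharrefreplace"))


def try_binary_stream():
    gen = OldStyleGenerator(io.BytesIO(), "utf-16")
    try:
        gen._write("<doc/>")
    except TypeError as exc:
        # BytesIO.write() refuses str: nothing was ever encoded.
        return "TypeError: " + str(exc)
    return "no error"
```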
Here is a patch that fixes XMLGenerator in Python 3. Unfortunately, some loss of backward compatibility is unavoidable. I tried to keep the most common cases working, but some code that "worked" before may break (I even had to correct some tests).
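One way to make the distinction, sketched under assumptions: the helper make_text_writer below is hypothetical, treats io.TextIOBase instances as text streams, and assumes anything else is an io binary stream to be wrapped in io.TextIOWrapper. It is an illustration, not the committed patch:

```python
import io


def make_text_writer(out, encoding):
    """Hypothetical helper: return a text stream that writes to `out`.

    Text streams are used as-is; anything else is assumed to be an io
    binary stream and is wrapped so that str is encoded with `encoding`.
    """
    if isinstance(out, io.TextIOBase):
        return out
    return io.TextIOWrapper(
        out,
        encoding=encoding,
        errors="xmlcharrefreplace",  # unencodable chars become &#NNN;
        write_through=True,          # no hidden buffering in the wrapper
    )
```

With errors="xmlcharrefreplace", a character the target encoding cannot represent still produces well-formed XML as a character reference.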
msg162851 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-15 07:20
The patch has been updated to reflect Martin's comments. I hope the old behavior is now preserved in the cases most common in practice. The tests were converted to work with bytes instead of strings.
msg163740 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-24 07:20
It would be nice to fix this bug before the 3.3.0b1 release clone is forked.
msg165509 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-15 07:08
Here is an updated patch with more careful handling of closing (as for issue1767933) and added comments.
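The closing problem can be seen with io.TextIOWrapper alone. In this sketch (close_wrapper is a hypothetical helper), closing the wrapper also closes the wrapped io.BytesIO, while detaching it first leaves the caller's stream open:

```python
import io


def close_wrapper(detach_first):
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", write_through=True)
    wrapper.write("<doc/>")
    if detach_first:
        # detach() flushes and gives the buffer back; the wrapper no
        # longer owns `raw`, so nothing closes it afterwards.
        wrapper.detach()
    else:
        # Closing a TextIOWrapper also closes the stream it wraps.
        wrapper.close()
    return raw
```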
msg172205 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-06 15:10
Ping.
msg175472 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-12 20:44
If nobody has any objections, why not apply this patch?
msg178326 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-27 20:45
If no one objects I will commit this next year.
msg178369 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-12-28 07:26
I'd like Antoine to have a look at all that io stuff. It looks quite bloated.
In your except clause, you're not calling self._close.
msg179942 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-14 13:35
Patch updated. Fixed the error that Georg found. Restored testing XMLGenerator with StringIO, as Antoine pointed out. XMLGenerator is now tested with StringIO, BytesIO and a user-defined writer. Added tests for encoding.
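A sketch of the kind of check described, written against the current Python 3 xml.sax.saxutils.XMLGenerator (render is a hypothetical helper). The same events are sent to a text stream and to a binary stream with encoding="utf-16"; assuming post-fix behavior, the text stream receives str and the binary stream receives UTF-16 bytes:

```python
import io
from xml.sax.saxutils import XMLGenerator


def render(out, encoding):
    """Hypothetical helper: emit a tiny document and return what `out` holds."""
    gen = XMLGenerator(out, encoding=encoding)
    gen.startDocument()
    gen.startElement("foo", {"bar": "1"})
    gen.characters("caf\u00e9")
    gen.endElement("foo")
    gen.endDocument()
    return out.getvalue()


expected = '<?xml version="1.0" encoding="utf-16"?>\n<foo bar="1">caf\u00e9</foo>'
```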
msg180297 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-20 15:32
Patch updated. I got rid of __del__ to prevent hanging on reference cycles, as Antoine suggested on IRC. Added a test to check that XMLGenerator doesn't close the file passed as an argument.
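A test along these lines might look like the following sketch (the function name is hypothetical), assuming current Python 3 behavior: after the generator is dropped and garbage collected, the io.BytesIO passed in is still open and holds the output:

```python
import gc
import io
from xml.sax.saxutils import XMLGenerator


def generator_leaves_stream_open():
    out = io.BytesIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.endDocument()
    # Drop the generator (and any wrapper it created internally) ...
    del gen
    gc.collect()
    # ... then report whether the caller's stream survived.
    return out.closed, out.getvalue()
```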
msg181797 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-02-10 12:38
New changeset 010b455de0e0 by Serhiy Storchaka in branch '2.7':
Issue #1470548: XMLGenerator now works with UTF-16 and UTF-32 encodings.
http://hg.python.org/cpython/rev/010b455de0e0
New changeset 66f92f76b2ce by Serhiy Storchaka in branch '3.2':
Issue #1470548: XMLGenerator now works with binary output streams.
http://hg.python.org/cpython/rev/66f92f76b2ce
New changeset 03b878d636cf by Serhiy Storchaka in branch '3.3':
Issue #1470548: XMLGenerator now works with binary output streams.
http://hg.python.org/cpython/rev/03b878d636cf
New changeset 12d75ca12ae7 by Serhiy Storchaka in branch 'default':
Issue #1470548: XMLGenerator now works with binary output streams.
http://hg.python.org/cpython/rev/12d75ca12ae7 
msg182819 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2013-02-23 20:50
The change in the 2.7 branch breaks some software, including a Django test (produce_xml_fragment from https://github.com/django/django/blob/1.4.5/tests/regressiontests/test_utils/tests.py).
The problem does not seem to occur with Python 3.2, 3.3 and 3.4.
Before 010b455de0e0:
>>> from StringIO import StringIO
>>> from xml.sax.saxutils import XMLGenerator
>>> stream = StringIO()
>>> xml = XMLGenerator(stream, encoding='utf-8')
>>> xml.startElement("foo", {"aaa": "1.0", "bbb": "2.0"})
>>> xml.characters("Hello")
>>> xml.endElement("foo")
>>> xml.startElement("bar", {"ccc": "3.0", "ddd": "4.0"})
>>> xml.endElement("bar")
>>> stream.getvalue()
'<foo aaa="1.0" bbb="2.0">Hello</foo><bar ccc="3.0" ddd="4.0"></bar>'
>>>
After 010b455de0e0:
>>> from StringIO import StringIO
>>> from xml.sax.saxutils import XMLGenerator
>>> stream = StringIO()
>>> xml = XMLGenerator(stream, encoding='utf-8')
>>> xml.startElement("foo", {"aaa": "1.0", "bbb": "2.0"})
>>> xml.characters("Hello")
>>> xml.endElement("foo")
>>> xml.startElement("bar", {"ccc": "3.0", "ddd": "4.0"})
>>> xml.endElement("bar")
>>> stream.getvalue()
''
>>>
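A plausible reading of the regression, illustrated in Python 3 terms rather than with the 2.7 code: an io.TextIOWrapper without write_through keeps small writes in its own buffer, so a caller that never calls endDocument() (and therefore never flushes) sees nothing in the underlying stream. visible_bytes is a hypothetical helper:

```python
import io


def visible_bytes(write_through):
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", write_through=write_through)
    wrapper.write('<foo aaa="1.0">Hello</foo>')
    # No flush() here: a fragment producer never calls endDocument().
    seen = raw.getvalue()
    # Detach so the wrapper does not close `raw` when it is discarded.
    wrapper.detach()
    return seen
```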
msg182861 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-24 09:08
Thank you for the report. Here is a patch that fixes this bug.
msg182892 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2013-02-24 20:52
This patch works for me.
msg182930 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-02-25 11:32
New changeset d707e3345a74 by Serhiy Storchaka in branch '2.7':
Issue #1470548: Do not buffer XMLGenerator output.
http://hg.python.org/cpython/rev/d707e3345a74 
msg182931 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-02-25 11:49
New changeset 1c03e499cdc2 by Serhiy Storchaka in branch '3.2':
Issue #1470548: Add test for fragment producing with XMLGenerator.
http://hg.python.org/cpython/rev/1c03e499cdc2
New changeset 5a4b3094903f by Serhiy Storchaka in branch '3.3':
Issue #1470548: Add test for fragment producing with XMLGenerator.
http://hg.python.org/cpython/rev/5a4b3094903f
New changeset 810d70fb17a2 by Serhiy Storchaka in branch 'default':
Issue #1470548: Add test for fragment producing with XMLGenerator.
http://hg.python.org/cpython/rev/810d70fb17a2 
msg185644 - Author: Sebastian Ortiz Vasquez (neoecos) Date: 2013-03-31 19:33
I have been using this to generate an RSS feed with web2py.
I found that the XMLGenerator.characters method does not check whether its content is a unicode or a str object, and it does not convert str content according to the encoding parameter of the XMLGenerator.
I changed the method to check whether the content is a unicode object and, if not, to convert it to one using the desired encoding.
Recall that the _write method of UnbufferedTextIOWrapper receives a unicode object as its parameter.
    def characters(self, content):
        if isinstance(content, unicode):
            self._write(escape(content))
        else:
            self._write(escape(unicode(content, self._encoding)))
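A Python 3 rendering of the same idea, as a sketch: to_escaped_text is a hypothetical helper that decodes bytes content with the generator's encoding before escaping it with xml.sax.saxutils.escape:

```python
from xml.sax.saxutils import escape


def to_escaped_text(content, encoding):
    """Hypothetical helper: decode bytes with `encoding`, then escape
    &, < and > so the result is safe as XML character data."""
    if not isinstance(content, str):
        content = str(content, encoding)
    return escape(content)
```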
msg185682 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2013-03-31 21:51
Sebastian Ortiz Vasquez: Please file a new issue and attach a patch (in unified format) instead of a whole Python module.
History
Date User Action Args
2022-04-11 14:56:16 admin set github: 43215
2013-03-31 22:03:43 Arfrever set versions: + Python 3.2, Python 3.3, Python 3.4
2013-03-31 21:51:15 Arfrever set messages: + msg185682
    title: Bugfix for #1470540 (XMLGenerator cannot output UTF-16 or UTF-8) -> xml.sax.saxutils.XMLGenerator cannot output UTF-16
2013-03-31 19:33:14 neoecos set files: + saxutils.py
    nosy: + neoecos
    versions: - Python 3.2, Python 3.3, Python 3.4
    messages: + msg185644
    title: Bugfix for #1470540 (XMLGenerator cannot output UTF-16) -> Bugfix for #1470540 (XMLGenerator cannot output UTF-16 or UTF-8)
2013-02-25 11:50:36 serhiy.storchaka set status: open -> closed
    resolution: fixed
    stage: resolved
2013-02-25 11:49:19 python-dev set messages: + msg182931
2013-02-25 11:32:14 python-dev set messages: + msg182930
2013-02-24 20:52:51 Arfrever set messages: + msg182892
2013-02-24 09:08:15 serhiy.storchaka set files: + XMLGenerator_fragment-2.7.patch
    messages: + msg182861
2013-02-23 20:50:30 Arfrever set status: closed -> open
    priority: normal -> release blocker
    nosy: + Arfrever, benjamin.peterson, larry
    messages: + msg182819
    resolution: fixed -> (no value)
    stage: resolved -> (no value)
2013-02-10 15:23:06 serhiy.storchaka set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2013-02-10 12:38:00 python-dev set nosy: + python-dev
    messages: + msg181797
2013-01-20 15:32:51 serhiy.storchaka set files: + XMLGenerator-5.patch
    messages: + msg180297
2013-01-14 13:36:14 serhiy.storchaka set stage: needs patch -> patch review
2013-01-14 13:35:33 serhiy.storchaka set keywords: - easy
    files: + XMLGenerator-4.patch
    messages: + msg179942
2012-12-30 18:40:40 serhiy.storchaka set stage: patch review -> needs patch
2012-12-28 07:26:14 georg.brandl set nosy: + pitrou
    messages: + msg178369
2012-12-27 20:47:56 serhiy.storchaka set assignee: serhiy.storchaka
2012-12-27 20:45:56 serhiy.storchaka set messages: + msg178326
2012-11-12 20:44:06 serhiy.storchaka set messages: + msg175472
2012-10-24 09:02:24 serhiy.storchaka set stage: patch review
2012-10-20 20:09:40 serhiy.storchaka set keywords: + needs review
    stage: test needed -> (no value)
    versions: + Python 3.4, - Python 3.1
2012-10-06 15:10:51 serhiy.storchaka set messages: + msg172205
2012-08-05 11:14:07 serhiy.storchaka link issue4997 superseder
2012-07-20 06:58:46 eli.bendersky set nosy: - eli.bendersky
2012-07-15 07:08:12 serhiy.storchaka set files: + XMLGenerator-3.patch
    nosy: + eli.bendersky
    messages: + msg165509
2012-06-24 07:20:37 serhiy.storchaka set messages: + msg163740
2012-06-15 07:20:50 serhiy.storchaka set files: + XMLGenerator-2.patch
    messages: + msg162851
2012-05-30 07:58:37 serhiy.storchaka set nosy: + loewis
2012-05-30 07:57:37 serhiy.storchaka set files: + XMLGenerator.patch
    messages: + msg161933
2012-05-28 11:07:58 doerwalter set nosy: + doerwalter
    messages: + msg161767
2012-05-28 10:43:25 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg161764
    versions: + Python 3.3
2010-08-22 09:30:57 BreamoreBoy set nosy: + BreamoreBoy
    messages: + msg114654
    versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2009-04-05 13:45:12 georg.brandl link issue1470540 superseder
2009-04-05 13:45:12 georg.brandl unlink issue1470540 dependencies
2009-03-21 02:02:41 ajaksu2 set stage: test needed
    type: behavior
    versions: + Python 2.6, - Python 2.5
2009-03-21 02:02:11 ajaksu2 link issue1470540 dependencies
2008-05-11 22:03:08 georg.brandl set nosy: + georg.brandl
    messages: + msg66684
2008-01-21 13:57:10 akuchling set keywords: + easy
2006-04-14 20:21:23 ngrig create
