
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Encoding error with sax and codecs
Type: behavior
Stage: patch review
Components: Library (Lib), XML
Versions: Python 3.2, Python 3.3, Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: georg.brandl, larry, pitrou, python-dev, sconseil, serhiy.storchaka, vstinner
Priority: release blocker
Keywords: patch

Created on 2013-05-06 11:14 by sconseil, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                         Uploaded                            Description
report.txt                        sconseil, 2013-05-06 11:14          Minimal example to reproduce the issue
test_codecs.py                    vstinner, 2013-05-06 21:51
XMLGenerator_codecs_stream.patch  serhiy.storchaka, 2013-05-07 13:43
Messages (12)
msg188508 - (view) Author: Simon Conseil (sconseil) * Date: 2013-05-06 11:14
There is an encoding issue between codecs.open and sax (see attached file). The issue is reproducible on Python 3.3.1; it works fine on Python 3.3.0.
msg188587 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-06 20:31
Since this is a regression, setting (temporarily perhaps) as release blocker.
msg188599 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-06 21:48
This looks like a regression introduced by the fix for issue #1470548, changeset 66f92f76b2ce.
msg188600 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-06 21:51
Extracted test from report.txt. Test with Python 3.4:
$ ./python test_codecs.py
Traceback (most recent call last):
  File "test_codecs.py", line 7, in <module>
    xml.startDocument()
  File "/home/haypo/prog/python/default/Lib/xml/sax/saxutils.py", line 148, in startDocument
    self._encoding)
  File "/home/haypo/prog/python/default/Lib/codecs.py", line 699, in write
    return self.writer.write(data)
  File "/home/haypo/prog/python/default/Lib/codecs.py", line 355, in write
    data, consumed = self.encode(object, self.errors)
TypeError: Can't convert 'bytes' object to str implicitly
_gettextwriter() in xml.sax.saxutils does not recognize codecs classes. (See also PEP 400 :-).)
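The last frames of the traceback can be reproduced without xml.sax at all. The sketch below assumes (this is a reading of the traceback, not something stated in the report) that the regression caused already-encoded bytes to be handed to the codecs stream. The helper name write_rejects_bytes is hypothetical.

```python
import codecs
import io


def write_rejects_bytes(encoding="iso-8859-1"):
    """Return True if a codecs StreamWriter refuses bytes input."""
    # codecs.getwriter() returns the StreamWriter class for the encoding;
    # instantiating it wraps an underlying binary stream.
    writer = codecs.getwriter(encoding)(io.BytesIO())
    writer.write("text is accepted")  # str is encoded, then written
    try:
        writer.write(b"bytes are not")  # the encoder expects str
    except TypeError:
        return True
    return False


# A codecs writer behaves like a text stream, yet it is not part of the
# io text hierarchy, so an isinstance(..., io.TextIOBase) check misses it.
codecs_writer = codecs.getwriter("iso-8859-1")(io.BytesIO())
is_io_text = isinstance(codecs_writer, io.TextIOBase)
```

Both observations together are consistent with the diagnosis above: a stream that is not recognized as text gets treated as binary, and the codecs writer then receives bytes it cannot encode.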
msg188640 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-07 10:50
It is not working fine on Python 3.3.0.
>>> with codecs.open('/tmp/test.txt', 'w', encoding='iso-8859-1') as f:
...     xml = XMLGenerator(f, encoding='iso-8859-1')
...     xml.startDocument()
...     xml.startElement('root', {'attr': u'\u20ac'})
...     xml.endElement('root')
...     xml.endDocument()
...
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/home/serhiy/py/cpython-3.Lib/xml/sax/saxutils.py", line 141, in startElement
    self._write(' %s=%s' % (name, quoteattr(value)))
  File "/home/serhiy/py/cpython-3.Lib/xml/sax/saxutils.py", line 96, in _write
    self._out.write(text)
  File "/home/serhiy/py/cpython-3.Lib/codecs.py", line 699, in write
    return self.writer.write(data)
  File "/home/serhiy/py/cpython-3.Lib/codecs.py", line 355, in write
    data, consumed = self.encode(object, self.errors)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 7: ordinal not in range(256)
And it shouldn't. On Python 2, XMLGenerator works only with binary files, and it "works" with text files only because of the implicit str->unicode conversion. On Python 3, working with binary files was broken. Issue1470548 restores support for binary files (the only kind of file with which XMLGenerator can work correctly), but text files are still accepted for backward compatibility. The problem is that there is no trustworthy way to determine whether a file-like object is binary or text.
Accepting text streams in XMLGenerator should be deprecated in future versions.
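For comparison, here is a minimal sketch of the binary-stream usage described above, using an in-memory io.BytesIO instead of a file on disk. It assumes an interpreter that includes the Issue1470548 fix; the function name generate is hypothetical. On such interpreters a character that the target encoding cannot represent is expected to come out as a character reference instead of raising UnicodeEncodeError.

```python
import io
from xml.sax.saxutils import XMLGenerator


def generate(encoding="iso-8859-1"):
    """Write a tiny document to an in-memory binary stream."""
    buf = io.BytesIO()  # binary target; XMLGenerator does the encoding
    gen = XMLGenerator(buf, encoding=encoding)
    gen.startDocument()
    # U+20AC (euro sign) has no representation in ISO-8859-1.
    gen.startElement("root", {"attr": "\u20ac"})
    gen.endElement("root")
    gen.endDocument()
    return buf.getvalue()


document = generate()
```

With a binary target the encoding is named once, in the XMLGenerator call, so the XML declaration and the bytes actually written cannot disagree.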
msg188642 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-07 12:06
> Accepting text streams in XMLGenerator should be deprecated in future versions.
I agree that the following pattern is strange:
with codecs.open('/tmp/test.txt', 'w', encoding='iso-8859-1') as f:
    xml = XMLGenerator(f, encoding='iso-8859-1')
Why would I specify a codec twice? What happens if I specify two different codecs?
with codecs.open('/tmp/test.txt', 'w', encoding='utf-8') as f:
    xml = XMLGenerator(f, encoding='iso-8859-1')
It may be simpler (and safer?) to reject text files. If you cannot detect that f is a text file, just make it explicit in the documentation that f must be a binary file.
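The "two different codecs" question can be explored with a small sketch. It assumes an interpreter in which XMLGenerator accepts codecs stream writers as they are, and it uses codecs.getwriter over an in-memory buffer in place of codecs.open on a real file. The function name mismatched_output is hypothetical.

```python
import codecs
import io
from xml.sax.saxutils import XMLGenerator


def mismatched_output():
    """Declare ISO-8859-1 in the document but write through a UTF-8 stream."""
    raw = io.BytesIO()
    stream = codecs.getwriter("utf-8")(raw)  # the stream encodes as UTF-8
    gen = XMLGenerator(stream, encoding="iso-8859-1")  # declaration says otherwise
    gen.startDocument()
    gen.startElement("root", {})
    gen.characters("\u00e9")  # e with acute accent
    gen.endElement("root")
    gen.endDocument()
    return raw.getvalue()


data = mismatched_output()
```

Under those assumptions the stream's codec decides the bytes while the XMLGenerator argument only decides the text of the declaration, so the result is a document whose declared encoding does not match its content.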
msg188650 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-07 13:43
Here is a patch that adds explicit checks for codecs stream writers, along with tests for these cases. The tests are not entirely honest: they only check that XMLGenerator works with some specially prepared streams. XMLGenerator doesn't work with a stream that has an arbitrary encoding and error handler.
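The idea of an explicit check can be illustrated as follows. This is a sketch of the general approach, not the contents of the attached patch: the function name classify_stream, the WriteOnly class, and the choice of codecs classes to test for are all assumptions made for illustration.

```python
import codecs
import io


def classify_stream(out):
    """Best-effort guess at whether a file-like object is text or binary."""
    if isinstance(out, io.TextIOBase):
        return "io text stream"
    # codecs streams live outside the io hierarchy and need their own check.
    if isinstance(out, (codecs.StreamWriter, codecs.StreamReaderWriter)):
        return "codecs text stream"
    if isinstance(out, (io.RawIOBase, io.BufferedIOBase)):
        return "io binary stream"
    # Anything else (for example, an object that only has write()) cannot
    # be classified reliably.
    return "unknown"


class WriteOnly:
    """Hypothetical duck-typed stream with nothing but a write() method."""

    def write(self, data):
        return len(data)
```

The "unknown" branch reflects the earlier point in this thread that no check of this kind can be fully trustworthy for arbitrary file-like objects.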
msg188654 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-07 13:48
Of course, if this patch is committed, it may be worth applying it to 3.2 as well, which has the same regression.
msg188657 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-07 13:57
Perhaps we should add a deprecation warning for codecs streams right in this patch?
msg189003 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-05-12 10:32
New changeset 1c01571ce0f4 by Georg Brandl in branch '3.2':
Issue #17915: Fix interoperability of xml.sax with file objects returned by
http://hg.python.org/cpython/rev/1c01571ce0f4 
msg189009 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-05-12 10:45
Fixed in 3.2, 3.3 and default.
msg189063 - (view) Author: Simon Conseil (sconseil) * Date: 2013-05-12 21:19
Thanks, everybody!
History
Date                 User              Action  Args
2022-04-11 14:57:45  admin             set     github: 62115
2013-05-12 21:19:48  sconseil          set     messages: + msg189063
2013-05-12 10:45:59  georg.brandl      set     status: open -> closed; resolution: fixed; messages: + msg189009
2013-05-12 10:32:42  python-dev        set     nosy: + python-dev; messages: + msg189003
2013-05-07 13:57:03  serhiy.storchaka  set     messages: + msg188657
2013-05-07 13:48:21  serhiy.storchaka  set     stage: needs patch -> patch review; messages: + msg188654; components: + XML; versions: + Python 3.2
2013-05-07 13:43:48  serhiy.storchaka  set     files: + XMLGenerator_codecs_stream.patch; keywords: + patch; messages: + msg188650
2013-05-07 12:06:06  vstinner          set     messages: + msg188642
2013-05-07 10:50:38  serhiy.storchaka  set     messages: + msg188640
2013-05-06 21:51:08  vstinner          set     files: + test_codecs.py; messages: + msg188600
2013-05-06 21:48:19  vstinner          set     messages: + msg188599
2013-05-06 20:31:35  pitrou            set     priority: normal -> release blocker; nosy: + larry, pitrou, georg.brandl; messages: + msg188587; stage: needs patch
2013-05-06 20:30:39  pitrou            set     nosy: + vstinner, serhiy.storchaka; type: behavior; versions: + Python 3.4
2013-05-06 11:14:06  sconseil          create
