This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2013-05-06 11:14 by sconseil, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| report.txt | sconseil, 2013-05-06 11:14 | Minimal example to reproduce the issue |
| test_codecs.py | vstinner, 2013-05-06 21:51 | |
| XMLGenerator_codecs_stream.patch | serhiy.storchaka, 2013-05-07 13:43 | |

Messages (12)

| msg188508 | Author: Simon Conseil (sconseil) | Date: 2013-05-06 11:14 |
There is an encoding issue between codecs.open() and xml.sax (see the attached file). The issue is reproducible on Python 3.3.1; it works fine on Python 3.3.0.

| msg188587 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-05-06 20:31 |
Since this is a regression, I am setting it (perhaps temporarily) as a release blocker.

| msg188599 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2013-05-06 21:48 |
It looks like a regression introduced by the fix for issue #1470548, changeset 66f92f76b2ce.

| msg188600 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2013-05-06 21:51 |
Extracted test from report.txt. Test with Python 3.4:
$ ./python test_codecs.py
Traceback (most recent call last):
  File "test_codecs.py", line 7, in <module>
    xml.startDocument()
  File "/home/haypo/prog/python/default/Lib/xml/sax/saxutils.py", line 148, in startDocument
    self._encoding)
  File "/home/haypo/prog/python/default/Lib/codecs.py", line 699, in write
    return self.writer.write(data)
  File "/home/haypo/prog/python/default/Lib/codecs.py", line 355, in write
    data, consumed = self.encode(object, self.errors)
TypeError: Can't convert 'bytes' object to str implicitly
_gettextwriter() of xml.sax.saxutils does not recognize codecs classes. (See also the PEP 400 :-)).
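A minimal sketch of why the writer ends up on the wrong path, assuming (as the comment above indicates) that _gettextwriter() at that point passed through only io.TextIOBase instances and wrapped everything else in an io.TextIOWrapper. The classify() helper is hypothetical and simplified, not the stdlib code; a writer from codecs.getwriter() stands in for the object codecs.open() returns, since neither is an io.TextIOBase.

```python
import codecs
import io


def classify(out):
    # Hypothetical, simplified version of the pre-fix check: only
    # io.TextIOBase instances count as text; everything else is assumed
    # to be a binary stream and would be wrapped in an io.TextIOWrapper.
    if isinstance(out, io.TextIOBase):
        return "text"
    return "binary"


# A codecs stream writer, the same kind of object codecs.open() wraps.
writer = codecs.getwriter("iso-8859-1")(io.BytesIO())

print(isinstance(writer, io.TextIOBase))  # False
print(classify(writer))                   # binary

# On the binary path, the TextIOWrapper hands encoded bytes down to the
# writer, but a codecs writer only accepts str.
error = None
try:
    writer.write(b'<?xml version="1.0" encoding="iso-8859-1"?>\n')
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```

The codecs writer rejects the bytes with TypeError, matching the last line of the traceback (the exact message text differs between Python versions).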

| msg188640 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-05-07 10:50 |
It does not work correctly on Python 3.3.0 either:
>>> with codecs.open('/tmp/test.txt', 'w', encoding='iso-8859-1') as f:
... xml = XMLGenerator(f, encoding='iso-8859-1')
... xml.startDocument()
... xml.startElement('root', {'attr': u'\u20ac'})
... xml.endElement('root')
... xml.endDocument()
...
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "/home/serhiy/py/cpython-3.Lib/xml/sax/saxutils.py", line 141, in startElement
self._write(' %s=%s' % (name, quoteattr(value)))
File "/home/serhiy/py/cpython-3.Lib/xml/sax/saxutils.py", line 96, in _write
self._out.write(text)
File "/home/serhiy/py/cpython-3.Lib/codecs.py", line 699, in write
return self.writer.write(data)
File "/home/serhiy/py/cpython-3.Lib/codecs.py", line 355, in write
data, consumed = self.encode(object, self.errors)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 7: ordinal not in range(256)
And it shouldn't. On Python 2, XMLGenerator works only with binary files; it "works" with text files only thanks to implicit str->unicode conversion. On Python 3, writing to binary files was broken. Issue #1470548 restored support for binary files (the only kind XMLGenerator can work with correctly), but text files are still accepted for backward compatibility. The problem is that there is no trustworthy way to determine whether a file-like object is binary or text.
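To make the difficulty concrete, here is a hypothetical heuristic (looks_like_text_stream() and AcceptsAnything are invented for illustration; neither is part of xml.sax). Known io and codecs classes can be classified by type, but for an arbitrary object with a write() method the best one can do is probe it, and a duck-typed writer that accepts both str and bytes turns any answer into a guess.

```python
import codecs
import io


def looks_like_text_stream(out):
    """Hypothetical heuristic: guess whether out.write() expects str."""
    if isinstance(out, io.TextIOBase):
        return True
    if isinstance(out, (io.RawIOBase, io.BufferedIOBase)):
        return False
    if isinstance(out, (codecs.StreamWriter, codecs.StreamReaderWriter)):
        return True
    # Unknown object that merely has a write() method: probe it with an
    # empty str. Binary streams typically raise TypeError, but the probe
    # can have side effects and proves nothing about later writes.
    try:
        out.write("")
    except TypeError:
        return False
    return True


class AcceptsAnything:
    """Hypothetical duck-typed writer that stores whatever it is given."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)
        return len(data)


print(looks_like_text_stream(io.StringIO()))                            # True
print(looks_like_text_stream(io.BytesIO()))                             # False
print(looks_like_text_stream(codecs.getwriter("utf-8")(io.BytesIO())))  # True
sink = AcceptsAnything()
print(looks_like_text_stream(sink))  # True, but only a guess
```

The probe is itself a write: sink.chunks now holds an empty string, which is one more reason a library cannot safely rely on this approach.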
Accepting text streams in XMLGenerator should be deprecated in future versions.
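For comparison, a sketch of the binary-stream usage recommended above, with io.BytesIO standing in for a file opened in 'wb' mode. On current Python 3 releases, XMLGenerator wraps a binary stream itself and (as an observation about the current xml.sax.saxutils implementation) encodes with the xmlcharrefreplace error handler, so the euro sign that raised UnicodeEncodeError in the text-file session above becomes a character reference instead.

```python
import io
from xml.sax.saxutils import XMLGenerator

out = io.BytesIO()  # stands in for open(path, "wb")
gen = XMLGenerator(out, encoding="iso-8859-1")
gen.startDocument()
gen.startElement("root", {"attr": "\u20ac"})  # EURO SIGN, not in Latin-1
gen.endElement("root")
gen.endDocument()

data = out.getvalue()
print(data)  # the euro sign is written as a character reference
```

Here the encoding is named once, in the XMLGenerator call, and the XML declaration and the bytes agree by construction.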

| msg188642 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2013-05-07 12:06 |
> Accepting text streams in XMLGenerator should be deprecated in future versions.
I agree that the following pattern is strange:
with codecs.open('/tmp/test.txt', 'w', encoding='iso-8859-1') as f:
    xml = XMLGenerator(f, encoding='iso-8859-1')
Why would I specify a codec twice? What happens if I specify two different codecs?
with codecs.open('/tmp/test.txt', 'w', encoding='utf-8') as f:
    xml = XMLGenerator(f, encoding='iso-8859-1')
It may be simpler (and safer?) to reject text files. If you cannot detect that f is a text file, just make it explicit in the documentation that f must be a binary file.
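The two-codecs question can be answered with a short experiment. A minimal sketch, assuming a Python 3 release that includes the fix from this issue (so a codecs writer is used as is): the stream encodes UTF-8 while XMLGenerator declares ISO-8859-1, and a parser that trusts the declaration reads back mojibake. io.BytesIO and codecs.getwriter() stand in for codecs.open().

```python
import codecs
import io
from xml.dom import minidom
from xml.sax.saxutils import XMLGenerator

raw = io.BytesIO()
# The stream encodes as UTF-8 ...
writer = codecs.getwriter("utf-8")(raw)
# ... but XMLGenerator is told (and declares) ISO-8859-1.
gen = XMLGenerator(writer, encoding="iso-8859-1")
gen.startDocument()
gen.startElement("root", {})
gen.characters("\u00e9")  # LATIN SMALL LETTER E WITH ACUTE
gen.endElement("root")
gen.endDocument()

data = raw.getvalue()
print(data)  # declaration says iso-8859-1, but the body holds UTF-8 bytes

# A parser trusts the declaration and decodes the UTF-8 bytes as Latin-1.
text = minidom.parseString(data).documentElement.firstChild.data
print(repr(text))
```

Nothing raises while writing; the mismatch only shows up when the document is parsed.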

| msg188650 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-05-07 13:43 |
Here is a patch that adds explicit checks for codecs stream writers, along with tests for these cases. The tests are not entirely honest: they only check that XMLGenerator works with some specially prepared streams. XMLGenerator does not work with a stream that has an arbitrary encoding and error handler.
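What "specially prepared" means can be sketched as follows, again assuming a Python 3 release with this fix, where codecs writers are passed through as is. A writer whose encoding matches XMLGenerator's and whose error handler is xmlcharrefreplace copes with the euro sign; the same writer with the default strict handler does not. The generate() helper is hypothetical.

```python
import codecs
import io
from xml.sax.saxutils import XMLGenerator


def generate(out, encoding):
    # Hypothetical helper: write a one-element document to out.
    gen = XMLGenerator(out, encoding=encoding)
    gen.startDocument()
    gen.startElement("root", {"attr": "\u20ac"})  # EURO SIGN
    gen.endElement("root")
    gen.endDocument()


# "Prepared" writer: same encoding, and an error handler that never fails.
prepared_buf = io.BytesIO()
prepared = codecs.getwriter("iso-8859-1")(prepared_buf, errors="xmlcharrefreplace")
generate(prepared, "iso-8859-1")
print(prepared_buf.getvalue())

# Arbitrary writer: the default "strict" error handler rejects the euro sign.
strict_failed = False
try:
    generate(codecs.getwriter("iso-8859-1")(io.BytesIO()), "iso-8859-1")
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)  # True
```

Because the codecs writer is used as is, the outcome depends on how the caller configured it, not on XMLGenerator.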

| msg188654 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-05-07 13:48 |
Of course, if this patch is committed, it may be worth applying it to 3.2 as well, since 3.2 has the same regression.

| msg188657 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-05-07 13:57 |
Perhaps we should add a deprecation warning for codecs streams right in this patch?
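A hypothetical sketch of what such a warning might look like (check_output_stream() is an invented name; the thread does not show this being committed, and it is not standard-library code): warn with DeprecationWarning when a codecs stream is passed, and leave other streams alone.

```python
import codecs
import io
import warnings


def check_output_stream(out):
    """Hypothetical sketch of the proposed deprecation (not stdlib code)."""
    if isinstance(out, (codecs.StreamWriter, codecs.StreamReaderWriter)):
        warnings.warn(
            "passing a codecs stream to XMLGenerator is deprecated; "
            "pass a binary stream and the encoding instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
    return out


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_output_stream(codecs.getwriter("utf-8")(io.BytesIO()))  # warns
    check_output_stream(io.BytesIO())  # binary stream: no warning

print(len(caught))                   # 1
print(caught[0].category.__name__)   # DeprecationWarning
```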

| msg189003 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-05-12 10:32 |
New changeset 1c01571ce0f4 by Georg Brandl in branch '3.2':
Issue #17915: Fix interoperability of xml.sax with file objects returned by
http://hg.python.org/cpython/rev/1c01571ce0f4

| msg189009 | Author: Georg Brandl (georg.brandl) (Python committer) | Date: 2013-05-12 10:45 |
Fixed in 3.2, 3.3 and default.

| msg189063 | Author: Simon Conseil (sconseil) | Date: 2013-05-12 21:19 |
Thanks, everybody!

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:45 | admin | set | github: 62115 |
| 2013-05-12 21:19:48 | sconseil | set | messages: + msg189063 |
| 2013-05-12 10:45:59 | georg.brandl | set | status: open -> closed; resolution: fixed; messages: + msg189009 |
| 2013-05-12 10:32:42 | python-dev | set | nosy: + python-dev; messages: + msg189003 |
| 2013-05-07 13:57:03 | serhiy.storchaka | set | messages: + msg188657 |
| 2013-05-07 13:48:21 | serhiy.storchaka | set | stage: needs patch -> patch review; messages: + msg188654; components: + XML; versions: + Python 3.2 |
| 2013-05-07 13:43:48 | serhiy.storchaka | set | files: + XMLGenerator_codecs_stream.patch; keywords: + patch; messages: + msg188650 |
| 2013-05-07 12:06:06 | vstinner | set | messages: + msg188642 |
| 2013-05-07 10:50:38 | serhiy.storchaka | set | messages: + msg188640 |
| 2013-05-06 21:51:08 | vstinner | set | files: + test_codecs.py; messages: + msg188600 |
| 2013-05-06 21:48:19 | vstinner | set | messages: + msg188599 |
| 2013-05-06 20:31:35 | pitrou | set | priority: normal -> release blocker; nosy: + larry, pitrou, georg.brandl; messages: + msg188587; stage: needs patch |
| 2013-05-06 20:30:39 | pitrou | set | nosy: + vstinner, serhiy.storchaka; type: behavior; versions: + Python 3.4 |
| 2013-05-06 11:14:06 | sconseil | create | |