
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: sqlite3.Connection.iterdump() dies with encoding exception
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: biny, ekontsevoy, eric.smith, petri.lehtinen, python-dev, r.david.murray
Priority: high Keywords:

Created on 2012-06-19 22:18 by ekontsevoy, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name   Uploaded
death.py    ekontsevoy, 2012-06-20 01:09
Messages (10)
msg163227 - Author: Ev Kontsevoy (ekontsevoy) Date: 2012-06-19 22:18
When calling connection.iterdump() on a database with non-ASCII string values, the following exception is raised:
----------------------------------------------------
File "/python-2.7.3/lib/python2.7/sqlite3/dump.py", line 56, in _iterdump
 yield("{0};".format(row[0]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 48-51: ordinal not in range(128)
----------------------------------------------------
The older versions used the following (safer) version in /python-2.7.3/lib/python2.7/sqlite3/dump.py:56:
yield("%s;" % row[0])
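For comparison, here is a minimal sketch that assumes Python 3, where str is always Unicode and this 2.7.3 regression does not arise. The table t and its single value U+2107 are hypothetical. They exist only to push a non-ASCII string through iterdump(), and this is not the attached death.py.

```python
import sqlite3

# Hypothetical in-memory table holding one non-ASCII text value (U+2107).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT)")
conn.execute("INSERT INTO t VALUES (?)", ("\u2107",))
conn.commit()

# iterdump() yields the SQL statements that would recreate the database;
# on Python 3 the non-ASCII value passes through without an encoding error.
lines = list(conn.iterdump())
conn.close()
```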
msg163230 - Author: Ev Kontsevoy (ekontsevoy) Date: 2012-06-19 22:53
Proposed fix:
maybe 
yield(u"%s;" % row[0]) 
or simply
row[0] + ";"?
msg163235 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-20 00:48
It's not clear to me why the behavior differs. Hopefully Eric will explain.
For 2.7 we should probably just revert the change to the yield statement to restore the previous behavior, unless format can be fixed.
msg163237 - Author: Ev Kontsevoy (ekontsevoy) Date: 2012-06-20 00:57
If the behavior of string.format() can be fixed to act identically to u"%s" % "", that would be simply wonderful!
Currently at work we have a rule in place: never use string.format(), since it cannot be used for anything but constants due to encoding exceptions.
msg163239 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2012-06-20 01:02
Could you reproduce this in a short script that doesn't use sqlite? I'm looking for something like:
str = 'some-string'
"{0}".format(str)
Also: is that the entire traceback? I don't see how format could be invoking a codec. Maybe the error occurs when writing it to stdout, or some other operation that's encoding?
msg163241 - Author: Ev Kontsevoy (ekontsevoy) Date: 2012-06-20 01:09
I am attaching death.py, which dies on string.format().
The stack trace above is complete. Python doesn't print anything from inside format().
msg163243 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-20 01:49
>>> print('{}'.format(u'\u2107'))
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2107' in position 0: ordinal not in range(128)
>>> print('%s' % u'\u2107')
ℇ
(You get the exception without the print as well, just in case that isn't clear.)
Ah, and now I see why this is true. The '%s' gets implicitly coerced to unicode. So, it is not a bug in format, and the yield statement change should be reverted.
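Python 3 has no implicit str/unicode coercion, so the 2.7 failure cannot be reproduced there as-is. The sketch below is a hedged illustration, not the 2.7 implementation. It performs explicitly the ASCII encode that the traceback shows happening implicitly, and contrasts it with '%s' formatting.

```python
# Python 3 sketch: the ASCII encode that Python 2.7 performed implicitly
# inside '{}'.format(u'\u2107') is written out explicitly here.
value = "\u2107"  # the character used in the interactive session above

try:
    value.encode("ascii")  # the step the 2.7 traceback points to
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
# prints: 'ascii' codec can't encode character '\u2107' in position 0: ordinal not in range(128)

# By contrast, on 2.7 '%s' % value promoted the whole result to unicode, so no
# encode was needed; in Python 3 the result is simply a str.
result = "%s" % value
```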
You can use format if you always make your format strings unicode strings (which you should be doing anyway, especially now that Python 3.3 will allow the 'u' prefix; that is, such code will be forward-compatible with Python 3).
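The following sketch shows the suggested workaround. It assumes the statement text arrives as a unicode string, which is what sqlite3 returns for text by default on 2.7. dump_statement is a hypothetical helper and is not part of sqlite3.dump. The explicit u prefix keeps the line valid on both Python 2.7 and Python 3.3+.

```python
def dump_statement(sql):
    """Terminate one SQL statement with ';' using a Unicode template.

    Hypothetical helper for illustration only. Because the template is a
    unicode literal, format() builds a unicode result directly instead of
    converting the argument to an ASCII byte string.
    """
    return u"{0};".format(sql)


# u"\u2107" is a valid escape in unicode literals on both Python 2 and 3.
statement = dump_statement(u"INSERT INTO \"t\" VALUES('\u2107')")
```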
msg163244 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-20 01:50
Or use 'from __future__ import unicode_literals'.
msg163246 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-20 01:58
Note that this is a regression in 2.7.3 relative to 2.7.2, which is why I'm marking it as high priority.
msg179614 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-11 02:12
New changeset 2a417ad8bfbf by R David Murray in branch '2.7':
#15109: revert '%'->'format' changes in 4b105d328fe7 to fix regression.
http://hg.python.org/cpython/rev/2a417ad8bfbf 
History
Date                 User            Action  Args
2022-04-11 14:57:31  admin           set     github: 59314
2013-01-11 02:13:00  r.david.murray  set     status: open -> closed; resolution: fixed; stage: needs patch -> resolved
2013-01-11 02:12:17  python-dev      set     nosy: + python-dev; messages: + msg179614
2012-07-17 18:48:35  biny            set     nosy: + biny
2012-06-20 01:58:43  r.david.murray  set     priority: normal -> high; nosy: + petri.lehtinen; messages: + msg163246; stage: needs patch
2012-06-20 01:50:29  r.david.murray  set     messages: + msg163244
2012-06-20 01:49:14  r.david.murray  set     messages: + msg163243
2012-06-20 01:09:38  ekontsevoy      set     files: + death.py; messages: + msg163241
2012-06-20 01:02:39  eric.smith      set     messages: + msg163239
2012-06-20 00:57:16  ekontsevoy      set     messages: + msg163237
2012-06-20 00:48:59  r.david.murray  set     nosy: + r.david.murray, eric.smith; messages: + msg163235
2012-06-19 22:53:19  ekontsevoy      set     messages: + msg163230
2012-06-19 22:19:08  ekontsevoy      set     type: behavior
2012-06-19 22:18:47  ekontsevoy      create
