
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: maybe doctest doesn't understand unicode_literals?
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.6
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Matthis Thorade, christoph, georg.brandl, mark, r.david.murray, tim.peters
Priority: normal
Keywords:

Created on 2008-09-24 12:37 by mark, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
test.py | christoph, 2009-06-30 15:03 | Test case revealing Unicode literal weakness
Messages (8)
msg73710 - Author: Mark Summerfield (mark) * Date: 2008-09-24 12:37
# This program works fine with Python 2.5 and 2.6:
def f():
    """
    >>> f()
    'xyz'
    """
    return "xyz"
if __name__ == "__main__":
    import doctest
    doctest.testmod()
But if you put the statement "from __future__ import unicode_literals"
at the start then it fails:
File "/tmp/test.py", line 5, in __main__.f
Failed example:
    f()
Expected:
    'xyz'
Got:
    u'xyz'
I don't know if it is a bug or a feature but I didn't see any mention of
it in the bugs or docs so thought I'd mention it.
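One way to sidestep the mismatch, offered here as a sketch rather than something proposed in this thread, is to print the value in the doctest instead of relying on its repr(). The printed text has no u prefix, so the expected output does not depend on whether the string is a byte string or a unicode string. The explicit u"xyz" below stands in for what unicode_literals would produce on Python 2.

```python
import doctest


def f():
    """
    >>> print(f())
    xyz
    """
    # Assumption for illustration: u"xyz" mimics the effect of
    # "from __future__ import unicode_literals" on Python 2.
    return u"xyz"


if __name__ == "__main__":
    # testmod() returns a (failed, attempted) pair.
    print(doctest.testmod())
```

This only helps for values that print cleanly; containers such as lists still show the repr() of their elements.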
msg73728 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-09-24 16:29
It certainly isn't a feature. I don't immediately see how to fix it,
though. unicode_literals doesn't change the repr() of unicode objects
(it obviously can't, since that change would not be module-local).
Let's try to get a comment from Uncle Timmy...
msg89874 - Author: Christoph Burgmer (christoph) Date: 2009-06-29 19:19
OutputChecker.check_output() seems to be responsible for comparing
'example.want' and 'got' literals and this is obviously done literally.
So as "u'1'" is different to "'1'" this is reflected in the result.
This gets more complicated with literals like "[u'1', u'2']" I believe.
So, eval() could be used for testing for equality:
>>> repr(['1', '2']) == repr([u'1', u'2'])
False
but
>>> eval(repr(['1', '2'])) == eval(repr([u'1', u'2']))
True
doctests are already compiled and executed, but evaluating the doctest
code's result is probably a security issue, so a method doing the
inverse of repr() could be used, that only works on variables; something
like Pickle, but without its own protocol.
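The idea above can be sketched with ast.literal_eval(), which is available from Python 2.6 and accepts only literal syntax, so it avoids the security concern of a plain eval(). The class name LiteralEvalChecker is hypothetical and not part of doctest; this is a minimal sketch under that assumption, not a proposed patch.

```python
import ast
import doctest


class LiteralEvalChecker(doctest.OutputChecker):
    """Hypothetical checker: fall back to comparing parsed literals."""

    def check_output(self, want, got, optionflags):
        # First try doctest's normal textual comparison.
        if doctest.OutputChecker.check_output(self, want, got, optionflags):
            return True
        try:
            # literal_eval() only parses literals (strings, numbers, tuples,
            # lists, dicts, ...), so no arbitrary code is executed.
            return ast.literal_eval(want.strip()) == ast.literal_eval(got.strip())
        except (ValueError, SyntaxError, TypeError):
            # Output that is not a literal cannot be compared this way.
            return False


checker = LiteralEvalChecker()
print(checker.check_output("['1', '2']\n", "[u'1', u'2']\n", 0))  # True
print(checker.check_output("['1', '2']\n", "['1', '3']\n", 0))    # False
```

Comparing parsed values is looser than comparing text: for example, 1 and 1.0 would also be treated as equal.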
msg89927 - Author: Christoph Burgmer (christoph) Date: 2009-06-30 15:03
This problem seems more severe as the appended test case shows.
That gives me:
Expected:
    u'ī'
Got:
    u'\u012b'
Both literals are the same.
Unicode literals in doc strings are not treated like other escaped
characters:
>>> repr(r'\n')
"'\\\\n'"
>>> repr('\n')
"'\\n'"
but:
>>> repr(ur'\u012b')
"u'\\u012b'"
>>> repr(u'\u012b')
"u'\\u012b'"
So there is no workaround in the docstring's reference itself.
I file this here, even though the problems are not strictly equal. I do
believe though that there is or should be a common solution to these
issues. Both results need to be interpreted on a more abstract scale.
msg89997 - Author: Christoph Burgmer (christoph) Date: 2009-07-01 20:25
JFTR: To yield the results of my last comment, you need to apply the
patch posted in http://bugs.python.org/issue1293741 
msg162577 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-10 02:23
I fail to see the problem here. If the module has "from __future__ import unicode_literals", then the docstring output clauses would need to be changed to reflect the fact that the input literals are now unicode. What am I missing?
msg162724 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-06-13 19:19
Yeah, I don't really remember now what my point was.
msg287530 - Author: Matthis Thorade (Matthis Thorade) Date: 2017-02-10 13:10
I found this bug when trying to write a doctest that passes on Python 3.5 and Python 2.7.9.
The following adapted example passes on Python 2, but fails on Python 3:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
def f():
    """
    >>> f()
    u'xyz'
    """
    return "xyz"
if __name__ == "__main__":
    import doctest
    doctest.testmod()
I think a nice solution could be to add a new directive so that I can use the following
def myUnic():
    """
    This is a small demo that just returns a string.
    >>> myUnic()
    u'abc' # doctest: +ALLOW_UNICODE
    """
    return 'abc'
I asked the same question here:
http://stackoverflow.com/questions/42158733/unicode-literals-and-doctest-in-python-2-7-and-python-3-5 
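A directive like this can be approximated today with doctest's own extension points, doctest.register_optionflag() and a custom OutputChecker. The sketch below makes some assumptions: the flag is not defined by doctest itself, the names AllowUnicodeChecker and run_docstring are hypothetical, and the directive is placed on the example's source line because that is where doctest looks for directives. pytest's doctest integration provides an option of the same name along these lines.

```python
import doctest
import re

# Assumption: doctest does not define this flag, so it is registered here.
# Registration must happen before any docstring using it is parsed.
ALLOW_UNICODE = doctest.register_optionflag("ALLOW_UNICODE")

# Matches a u/U prefix directly in front of a quote, when the prefix is not
# the tail of a longer word.
_U_PREFIX = re.compile(r"(\W|^)[uU]([rR]?['\"])")


class AllowUnicodeChecker(doctest.OutputChecker):
    """Hypothetical checker that ignores u'' prefixes when the flag is set."""

    def check_output(self, want, got, optionflags):
        if doctest.OutputChecker.check_output(self, want, got, optionflags):
            return True
        if not (optionflags & ALLOW_UNICODE):
            return False
        # Strip the prefixes from both sides and compare again.
        want = _U_PREFIX.sub(r"\1\2", want)
        got = _U_PREFIX.sub(r"\1\2", got)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)


def my_unic():
    """
    >>> my_unic()  # doctest: +ALLOW_UNICODE
    u'abc'
    """
    return "abc"


def run_docstring(func):
    """Run one function's docstring examples with the custom checker."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(checker=AllowUnicodeChecker(), verbose=False)
    return runner.run(test)


print(run_docstring(my_unic).failed)
```

The regular expression is a heuristic over the output text, so a u that happens to precede a quote inside a string value would be stripped as well.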
History
Date | User | Action | Args
2022-04-11 14:56:39 | admin | set | github: 48205
2017-02-10 13:10:47 | Matthis Thorade | set | nosy: + Matthis Thorade; messages: + msg287530
2012-06-13 19:19:46 | georg.brandl | set | status: pending -> closed; messages: + msg162724
2012-06-10 02:23:07 | r.david.murray | set | status: open -> pending; assignee: tim.peters ->; nosy: + r.david.murray; messages: + msg162577; resolution: not a bug; stage: resolved
2009-07-01 20:25:04 | christoph | set | messages: + msg89997
2009-06-30 15:03:19 | christoph | set | files: + test.py; messages: + msg89927
2009-06-29 19:19:40 | christoph | set | nosy: + christoph; messages: + msg89874
2008-09-24 16:29:09 | georg.brandl | set | assignee: tim.peters; messages: + msg73728; nosy: + georg.brandl, tim.peters
2008-09-24 12:37:20 | mark | create |
