

Author Michael.Felt
Recipients Michael.Felt, michael-o, terry.reedy, vstinner
Date 2018-08-27.20:58:46
Message-id <77b3d8a0-5304-cd3e-fab8-fb2af52359af@felt.demon.nl>
In-reply-to <1535376128.65.0.56676864532.issue34403@psf.upfronthosting.co.za>
Content
On 27/08/2018 15:22, Michael Osipov wrote:
> Michael Osipov <1983-01-06@gmx.net> added the comment:
>
> So I changed the test code to:
>
> diff --git a/Lib/test/test_utf8_mode.py b/Lib/test/test_utf8_mode.py
> index 26e2e13ec5..d9f8a3ed8b 100644
> --- a/Lib/test/test_utf8_mode.py
> +++ b/Lib/test/test_utf8_mode.py
> @@ -208,7 +208,7 @@ class UTF8ModeTests(unittest.TestCase):
>      def test_cmd_line(self):
>          arg = 'h\xe9\u20ac'.encode('utf-8')
>          arg_utf8 = arg.decode('utf-8')
> -        arg_ascii = arg.decode('ascii', 'surrogateescape')
> +        arg_ascii = arg.decode('roman8', 'surrogateescape')
>          code = 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:])))'
>
>          def check(utf8_opt, expected, **kw):
>
> and the output is:
> ======================================================================
> FAIL: test_cmd_line (test.test_utf8_mode.UTF8ModeTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 224, in test_cmd_line
>     check('utf8=0', [c_arg], LC_ALL='C')
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 217, in check
>     self.assertEqual(args, ascii(expected), out)
> AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "['h\\xfb\\u02cb\\xe3\\x82\\u02dc']"
> - ['h\xc3\xa9\xe2\x82\xac']
> + ['h\xfb\u02cb\xe3\x82\u02dc']
> : roman8:['h\xc3\xa9\xe2\x82\xac']
>
> I still don't understand that.
Something I found helpful was to change:
    check('utf8=0', [c_arg], LC_ALL='C')
to
    check('utf8=0', [c_arg], LC_ALL='C', failure=True)
This also fails, but it shows what is being executed.
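Outside the test harness, roughly the same child command can be run directly with subprocess, which prints the exact argument list and the child's output. This is a minimal sketch for a POSIX system with Python 3.7 or later; run_child is a hypothetical helper, and the environment settings (empty PYTHONUTF8, PYTHONCOERCECLOCALE=0) are an assumption about what the test passes to the child, not a copy of its helper.

```python
import os
import subprocess
import sys

# Same argument bytes and child code as test_cmd_line in the quoted diff.
arg = 'h\xe9\u20ac'.encode('utf-8')
code = ('import locale, sys; '
        'print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:])))')


def run_child(utf8_opt, **env_overrides):
    """Hypothetical helper: run the child interpreter and return its stdout."""
    # Assumption: disable PEP 538 locale coercion and clear PYTHONUTF8
    # so that LC_ALL=C really leaves the child in the C locale.
    env = dict(os.environ, PYTHONUTF8='', PYTHONCOERCECLOCALE='0')
    env.update(env_overrides)
    # On POSIX, subprocess passes bytes arguments through unchanged.
    cmd = [sys.executable, '-X', utf8_opt, '-c', code, arg]
    print('running:', cmd)
    proc = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True)
    # The child prints only ASCII: the encoding name plus ascii() output.
    return proc.stdout.decode('ascii').rstrip()


result = run_child('utf8=0', LC_ALL='C')
print(result)
```

The part before the first colon is the child's preferred encoding; the part after it is the child's view of sys.argv[1:], rendered with ascii().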
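Further, my 'understanding' is that ascii(whatever) does something quite
different from whatever.decode('ascii', ...): decode() turns bytes into a
str, while ascii() only returns a printable representation of an object
with its non-ASCII characters escaped. Also, ascii() uses the \x
shorthand for characters below U+0100, while decode('ascii',
'surrogateescape') turns each non-ASCII byte into a lone surrogate, which
ascii() then shows with the \udc prefix.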
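A short illustration of that difference, using the same bytes as the test (the UTF-8 encoding of 'h\xe9\u20ac'). It only exercises standard str/bytes methods and the built-in ascii().

```python
# The bytes test_cmd_line passes on the command line.
arg = 'h\xe9\u20ac'.encode('utf-8')
assert arg == b'h\xc3\xa9\xe2\x82\xac'

# decode() turns bytes into str. Every byte >= 0x80 is invalid ASCII, so
# 'surrogateescape' maps each one to a lone surrogate U+DC80..U+DCFF.
escaped = arg.decode('ascii', 'surrogateescape')

# ascii() decodes nothing: it returns a repr() with non-ASCII characters
# escaped as \xNN (below U+0100) or \uNNNN (up to U+FFFF).
print(ascii(escaped))  # 'h\udcc3\udca9\udce2\udc82\udcac'
print(ascii(arg))      # b'h\xc3\xa9\xe2\x82\xac'

# The escaping is reversible with the same error handler.
assert escaped.encode('ascii', 'surrogateescape') == arg
```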
And, while you might still consider it a 'bug', did you try using
c_arg = arg.decode('iso-8859-1')?
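A quick check of that suggestion: ISO-8859-1 maps every byte 0x00-0xFF to the code point with the same value, so no byte is undecodable and 'surrogateescape' never has anything to do. Decoding the test's bytes this way gives exactly the text on the "-" line of the assertion message quoted above, which is what the child process reported.

```python
arg = 'h\xe9\u20ac'.encode('utf-8')

# Latin-1 decodes every possible byte, one byte per character.
c_arg = arg.decode('iso-8859-1')
print(ascii([c_arg]))  # ['h\xc3\xa9\xe2\x82\xac']

# So adding 'surrogateescape' changes nothing: no lone surrogates appear.
assert arg.decode('iso-8859-1', 'surrogateescape') == c_arg
assert not any('\udc80' <= ch <= '\udcff' for ch in c_arg)

# And the round trip restores the original bytes.
assert c_arg.encode('iso-8859-1') == arg
```

This only shows that the strings match for this input; it does not by itself establish how the platform's C locale decodes argv in general.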
Michael (F)
>
> I believe that surrogate escape only works for ASCII and nothing else. If so, this test must be skipped on HP-UX and AIX.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue34403>
> _______________________________________
>
History
Date                 User          Action  Args
2018-08-27 20:58:46  Michael.Felt  set     recipients: + Michael.Felt, terry.reedy, vstinner, michael-o
2018-08-27 20:58:46  Michael.Felt  link    issue34403 messages
2018-08-27 20:58:46  Michael.Felt  create
