Issue2164
Created on 2014-06-10.20:57:28 by zyasoft, last changed 2018-03-16.22:54:51 by jeff.allen.
| Messages | |||
|---|---|---|---|
| msg8624 | Author: Jim Baker (zyasoft) | Date: 2014-06-10.20:57:27 | |
Difference between CPython and Jython seen with this example:

    # -*- coding: utf-8 -*-
    import codecs
    data = memoryview(b"中文")
    text, decoded_bytes = codecs.utf_8_decode(data)
    assert text == u"中文"
    assert type(text) is unicode
    assert decoded_bytes == 6

This works fine on CPython. On Jython, it fails with

    TypeError: utf_8_decode(): 1st arg can't be coerced to String

Current workaround is to use tobytes on the memoryview object:

    text, decoded_bytes = codecs.utf_8_decode(data.tobytes())
|
|||
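The workaround in the report can be wrapped so that calling code behaves the same on both implementations. The following is only a minimal sketch: `decode_utf8_buffer` is a hypothetical helper name, not part of Jython or CPython, and it assumes the failure always surfaces as a `TypeError`, as in the report.

```python
# -*- coding: utf-8 -*-
import codecs


def decode_utf8_buffer(data):
    """Decode a bytes-like object as UTF-8, returning (text, bytes_consumed).

    Hypothetical helper for illustration only.
    """
    try:
        # CPython accepts any object supporting the buffer protocol here.
        return codecs.utf_8_decode(data)
    except TypeError:
        # Fallback for the failure reported above: copy the memoryview
        # into a byte string first, then decode that.
        if isinstance(data, memoryview):
            return codecs.utf_8_decode(data.tobytes())
        raise


raw = u"\u4e2d\u6587".encode("utf-8")  # the two characters from the report
text, consumed = decode_utf8_buffer(memoryview(raw))
```

The fallback copies the data, so it gives up the zero-copy benefit of a memoryview; it is a stopgap until the codec entry points accept buffers directly.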
| msg8625 | Author: Jim Baker (zyasoft) | Date: 2014-06-10.20:57:38 | |
Target beta 4 |
|||
| msg8633 | Author: Jeff Allen (jeff.allen) | Date: 2014-06-12.19:20:08 | |
I'd happily take this on unless someone is itching to get to know the buffer interface better. |
|||
| msg8638 | Author: Santoso Wijaya (santa4nt) | Date: 2014-06-13.18:04:19 | |
Sounds interesting to me. Any tips? |
|||
| msg8639 | Author: Jeff Allen (jeff.allen) | Date: 2014-06-13.20:38:07 | |
I decided step 1 was to make PyBuffer extend AutoCloseable, because this work by Indra Talip would have been neater: http://hg.python.org/jython/rev/355bb70327e0

I've been meaning to since Java 7. So I've done that (testing now, maybe push tonight). You can take over from there if you like.

This article is about the buffer protocol: https://wiki.python.org/jython/BufferProtocol , but it needs to be updated with the change I just made.

If you look into how some choice codecs work, at the bottom they all seem to depend on entry points in modules/_codecs.java, so it's those that need changing. For a start, accept a PyObject obytes argument, then something like:

    if (obytes instanceof BufferProtocol) {
        try (PyBuffer bytes = ((BufferProtocol)obytes).getBuffer(PyBUF.SIMPLE)) {
            ...
        }
    } else {
        throw Py.TypeError("must be string or buffer, not " ... )
    }

You should then find the existing code bytes.charAt() still works, or it might be better to say this stuff really is bytes now. The soft option is to ask for it as a String again, but IMO that's perpetuating a misdemeanor.

My worry was that a lot of helper methods, and maybe some clients of these methods, would have to change signature, so it would end up really quite extensive. Maybe they should anyway.

I couldn't find a test that exposes this problem, so I was going to add to test_codecs_jy.py, something like:

    def round_trip(self, u, name):
        s = u.encode(name)
        dec = codecs.getdecoder(name)
        for B in (buffer, memoryview, bytearray):
            self.assertEqual(u, dec(B(s))[0])

(I think that's correct.) Then call it with a variety of unicode strings and codec names. |
|||
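The round-trip test proposed above could be fleshed out roughly as follows. This is a sketch under stated assumptions, not the test that was eventually committed: the class name `BufferDecodeTest`, the list `BUFFER_TYPES`, and the choice of sample strings and codec names are all illustrative. `buffer` is included only where the interpreter defines it (Python 2 and Jython 2.7).

```python
# -*- coding: utf-8 -*-
import codecs
import unittest

# Wrapper types to feed to the decoders. `buffer` exists only on
# Python 2 / Jython 2.7, so it is added conditionally.
BUFFER_TYPES = [memoryview, bytearray]
try:
    BUFFER_TYPES.append(buffer)  # noqa: F821
except NameError:
    pass


class BufferDecodeTest(unittest.TestCase):
    """Illustrative test case; names and sample data are assumptions."""

    def round_trip(self, u, name):
        # Encode to a byte string, then decode it again through each
        # buffer-like wrapper and compare with the original text.
        s = u.encode(name)
        dec = codecs.getdecoder(name)
        for B in BUFFER_TYPES:
            self.assertEqual(u, dec(B(s))[0])

    def test_round_trip(self):
        # latin-1 can only encode the first sample string.
        for name in ("utf-8", "utf-16", "utf-32", "latin-1"):
            self.round_trip(u"caf\xe9", name)
        for name in ("utf-8", "utf-16", "utf-32"):
            self.round_trip(u"\u4e2d\u6587", name)
```

On an interpreter with the bug described in this issue, `test_round_trip` would be expected to fail with the reported TypeError at the first `memoryview` argument.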
| msg8642 | Author: Jeff Allen (jeff.allen) | Date: 2014-06-14.14:40:54 | |
Ok, I committed the helpful change to PyBuffer and made the Wiki change. |
|||
| msg8688 | Author: Jim Baker (zyasoft) | Date: 2014-06-19.00:34:58 | |
Jeff, thanks, sounds like a reasonable set of changes that we need to propagate through the codecs implementation. |
|||
| msg9006 | Author: Jim Baker (zyasoft) | Date: 2014-09-18.02:33:02 | |
Target beta 4 |
|||
| msg11812 | Author: Jeff Allen (jeff.allen) | Date: 2018-03-16.22:54:50 | |
Guess I'll take it on then. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2019-07-21 07:25:12 | jeff.allen | link | issue2788 dependencies |
| 2018-03-16 22:54:51 | jeff.allen | set | priority: normal assignee: jeff.allen messages: + msg11812 |
| 2014-09-18 02:33:02 | zyasoft | set | resolution: remind messages: + msg9006 |
| 2014-06-19 00:34:58 | zyasoft | set | messages: + msg8688 |
| 2014-06-14 14:40:54 | jeff.allen | set | messages: + msg8642 |
| 2014-06-13 20:38:08 | jeff.allen | set | messages: + msg8639 |
| 2014-06-13 18:04:19 | santa4nt | set | messages: + msg8638 |
| 2014-06-12 19:20:08 | jeff.allen | set | nosy: + jeff.allen messages: + msg8633 |
| 2014-06-11 01:51:44 | santa4nt | set | type: behaviour |
| 2014-06-11 01:51:37 | santa4nt | set | nosy: + santa4nt components: + Core versions: + Jython 2.7 |
| 2014-06-10 20:57:39 | zyasoft | set | messages: + msg8625 |
| 2014-06-10 20:57:28 | zyasoft | create | |