Jython

Issue2164

classification
Title: codecs do not accept memoryview objects for decoding
Type: behaviour Severity: normal
Components: Core Versions: Jython 2.7
Milestone:
process
Status: open Resolution: remind
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: jeff.allen, santa4nt, zyasoft
Priority: normal Keywords:

Created on 2014-06-10.20:57:28 by zyasoft, last changed 2018-03-16.22:54:51 by jeff.allen.

Messages
msg8624 Author: Jim Baker (zyasoft) Date: 2014-06-10.20:57:27
Difference between CPython and Jython seen with this example:
# -*- coding: utf-8 -*-
import codecs
data = memoryview(b"中文")
text, decoded_bytes = codecs.utf_8_decode(data)
assert text == u"中文"
assert type(text) is unicode
assert decoded_bytes == 6
This works fine on CPython. On Jython, it fails with TypeError: utf_8_decode(): 1st arg can't be coerced to String
Current workaround is to use tobytes on the memoryview object:
text, decoded_bytes = codecs.utf_8_decode(data.tobytes())
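A minimal sketch of that workaround as a reusable wrapper, for callers that cannot know in advance whether they hold a memoryview. The names to_bytes and utf8_decode_compat are hypothetical, not part of codecs or Jython, and the copy made by tobytes() is the price of the workaround:

```python
import codecs


def to_bytes(obj):
    """Hypothetical helper: copy a memoryview into a byte string.

    Anything that is not a memoryview is passed through unchanged.
    """
    if isinstance(obj, memoryview):
        return obj.tobytes()
    return obj


def utf8_decode_compat(obj, errors="strict"):
    """Hypothetical wrapper: decode UTF-8 from a byte string or a memoryview."""
    return codecs.utf_8_decode(to_bytes(obj), errors)


# The two characters from the report, written as escapes.
raw = u"\u4e2d\u6587".encode("utf-8")
text, decoded_bytes = utf8_decode_compat(memoryview(raw))
```

Once the codecs accept memoryview directly, the wrapper becomes unnecessary.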
msg8625 Author: Jim Baker (zyasoft) Date: 2014-06-10.20:57:38
Target beta 4
msg8633 Author: Jeff Allen (jeff.allen) Date: 2014-06-12.19:20:08
I'd happily take this on unless someone is itching to get to know the buffer interface better.
msg8638 Author: Santoso Wijaya (santa4nt) Date: 2014-06-13.18:04:19
Sounds interesting to me. Any tips?
msg8639 Author: Jeff Allen (jeff.allen) Date: 2014-06-13.20:38:07
I decided step 1 was to make PyBuffer extend AutoCloseable, because this work by Indra Talip would have been neater with it:
http://hg.python.org/jython/rev/355bb70327e0
Been meaning to since Java 7. So I've done that (testing now, maybe push tonight). You can take over from there if you like.
This article is about the buffer protocol: https://wiki.python.org/jython/BufferProtocol , but it needs to be updated with the change I just made.
If you look into how some choice codecs work, at the bottom they all seem to depend on entry points in modules/_codecs.java, so it's those that need changing. For a start, accept a PyObject obytes argument, then something like:
if (obytes instanceof BufferProtocol) {
    // PyBuffer is AutoCloseable, so the buffer is released on leaving the try block
    try (PyBuffer bytes = ((BufferProtocol)obytes).getBuffer(PyBUF.SIMPLE)) {
        ...
    }
} else {
    throw Py.TypeError("must be string or buffer, not " ... )
}
You should then find the existing code bytes.charAt() still works, or it might be better to say this stuff really is bytes now. The soft option is to ask for it as a String again, but IMO that's perpetuating a misdemeanor.
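For readers more at home in Python, the same control flow can be sketched on CPython 3, where memoryview is a context manager and plays the role of the try-with-resources block. This is only an analogy: decode_any_buffer is a hypothetical name, and nothing here is Jython's _codecs code.

```python
import codecs


def decode_any_buffer(obj, name="utf-8"):
    """Hypothetical analogue of the Java outline, for CPython 3.

    Accept any object that exports the buffer protocol, otherwise raise
    the TypeError the outline describes.
    """
    try:
        view = memoryview(obj)
    except TypeError:
        raise TypeError("must be string or buffer, not %s" % type(obj).__name__)
    # Leaving the with block releases the buffer, as closing a PyBuffer would.
    with view:
        return codecs.getdecoder(name)(view.tobytes())
```

The decoder only sees the bytes obtained through the buffer, so bytes, bytearray and memoryview arguments all follow one path.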
My worry was that a lot of helper methods, and maybe some clients of these methods, would have to change signature, so it would end up really quite extensive. Maybe they should anyway.
I couldn't find a test that exposes this problem, so I was going to add to test_codecs_jy.py, something like:
def round_trip(self, u, name):
    s = u.encode(name)
    dec = codecs.getdecoder(name)
    for B in (buffer, memoryview, bytearray):
        self.assertEqual(u, dec(B(s))[0])
(I think that's correct.) Then call it with a variety of unicode strings and codec names.
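A self-contained sketch of how such a test might look, assuming it lives in a unittest.TestCase. The class name, the codec list and the sample strings are illustrative choices, not the contents of test_codecs_jy.py. The buffer type exists only on Python 2 and Jython 2.7, so the sketch falls back to the remaining types elsewhere:

```python
import codecs
import unittest

# buffer exists only on Python 2 / Jython 2.7; skip it where it is missing.
try:
    BUFFER_TYPES = (buffer, memoryview, bytearray)
except NameError:
    BUFFER_TYPES = (memoryview, bytearray)


class BufferDecodeTest(unittest.TestCase):
    """Illustrative test case: decoders should accept buffer-like input."""

    def round_trip(self, u, name):
        s = u.encode(name)
        dec = codecs.getdecoder(name)
        for B in BUFFER_TYPES:
            self.assertEqual(u, dec(B(s))[0])

    def test_round_trip(self):
        # A small variety of unicode strings and codec names.
        for name in ("utf-8", "utf-16", "latin-1"):
            for u in (u"abc", u"caf\xe9"):
                self.round_trip(u, name)
        for name in ("utf-8", "utf-16"):
            self.round_trip(u"\u4e2d\u6587", name)
```

Run it with python -m unittest (or the Jython equivalent). On a Jython build with this bug, the memoryview case would be expected to fail with the TypeError from the original report.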
msg8642 Author: Jeff Allen (jeff.allen) Date: 2014-06-14.14:40:54
Ok, I committed the helpful change to PyBuffer and made the Wiki change.
msg8688 Author: Jim Baker (zyasoft) Date: 2014-06-19.00:34:58
Jeff, thanks, sounds like a reasonable set of changes that we need to propagate through the codecs implementation.
msg9006 Author: Jim Baker (zyasoft) Date: 2014-09-18.02:33:02
Target beta 4
msg11812 Author: Jeff Allen (jeff.allen) Date: 2018-03-16.22:54:50
Guess I'll take it on then.
History
Date                 User        Action  Args
2019-07-21 07:25:12  jeff.allen  link    issue2788 dependencies
2018-03-16 22:54:51  jeff.allen  set     priority: normal; assignee: jeff.allen; messages: + msg11812
2014-09-18 02:33:02  zyasoft     set     resolution: remind; messages: + msg9006
2014-06-19 00:34:58  zyasoft     set     messages: + msg8688
2014-06-14 14:40:54  jeff.allen  set     messages: + msg8642
2014-06-13 20:38:08  jeff.allen  set     messages: + msg8639
2014-06-13 18:04:19  santa4nt    set     messages: + msg8638
2014-06-12 19:20:08  jeff.allen  set     nosy: + jeff.allen; messages: + msg8633
2014-06-11 01:51:44  santa4nt    set     type: behaviour
2014-06-11 01:51:37  santa4nt    set     nosy: + santa4nt; components: + Core; versions: + Jython 2.7
2014-06-10 20:57:39  zyasoft     set     messages: + msg8625
2014-06-10 20:57:28  zyasoft     create

