Issue 4874: decoding functions in _codecs module accept str arguments



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/49124

classification

Title:	decoding functions in _codecs module accept str arguments
Type:	behavior	Stage:
Components:	Extension Modules	Versions:	Python 3.0, Python 3.1

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	pitrou	Nosy List:	amaury.forgeotdarc, benjamin.peterson, lemburg, pitrou, vstinner
Priority:	release blocker	Keywords:	patch

Created on 2009-01-07 23:48 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
_codecs_bytes-2.patch	vstinner, 2009-01-17 01:31
mbdecode-unicode.patch	pitrou, 2009-01-22 11:04

Messages (12)
msg79384	Author: Antoine Pitrou (pitrou) (Python committer)	Date: 2009-01-07 23:48
The following function calls should raise a TypeError instead. Encoding functions are fine (they only accept str).

    >>> import codecs
    >>> codecs.utf_8_decode('aa')
    ('aa', 2)
    >>> codecs.utf_8_decode('éé')
    ('éé', 4)
    >>> codecs.latin_1_decode('éé')
    ('Ã©Ã©', 4)
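For reference, a minimal sketch of the behaviour requested above, as it can be checked on a current Python 3 interpreter. The helper name `rejects_str` is made up for this illustration and is not part of the `codecs` module; exact error messages are left out because they differ between versions.

```python
import codecs


def rejects_str(decoder, text):
    """Return True if `decoder` raises TypeError when given a str."""
    try:
        decoder(text)
    except TypeError:
        return True
    return False


# Decoding functions are expected to refuse str input.
str_rejected = {
    "utf_8_decode": rejects_str(codecs.utf_8_decode, "aa"),
    "latin_1_decode": rejects_str(codecs.latin_1_decode, "\u00e9\u00e9"),
}

# The same functions accept bytes and report the number of bytes consumed.
bytes_result = codecs.utf_8_decode("\u00e9\u00e9".encode("utf-8"))

# Encoding functions take str, as noted in the report.
encode_result = codecs.utf_8_encode("aa")

print(str_rejected)
```

Passing bytes to `utf_8_decode` gives the same `('éé', 4)` result as in the report, since each `é` occupies two bytes in UTF-8.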
msg79387	Author: STINNER Victor (vstinner) (Python committer)	Date: 2009-01-08 00:59
Patch replacing the "s" parsing format with "y" for:
- utf_7_decode()
- utf_8_decode()
- utf_16_decode()
- utf_16_le_decode()
- utf_16_be_decode()
- utf_16_ex_decode()
- utf_32_decode()
- utf_32_le_decode()
- utf_32_be_decode()
- utf_32_ex_decode()
- unicode_escape_decode()
- raw_unicode_escape_decode()
- latin_1_decode()
- ascii_decode()
- charmap_decode()
- mbcs_decode()

Using run_tests.sh, all tests are ok (with 19 skipped tests). I guess there are no tests for all these functions :-/

Note: codecs documentation was already correct:

    .. method:: Codec.decode(input[, errors])
       (...)
       input must be a bytes object or one which provides the read-only character
       buffer interface -- for example, buffer objects and memory mapped files.
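To illustrate the documentation excerpt quoted above, here is a small sketch showing that objects exposing the buffer interface are accepted alongside plain bytes. It reflects how a current Python 3 interpreter behaves, which is an assumption relative to the 3.0/3.1 versions discussed here; the variable names are illustrative only.

```python
import codecs

# "café" encoded as UTF-8 is 5 bytes: b'caf\xc3\xa9'
payload = "caf\u00e9".encode("utf-8")

# bytes, bytearray and memoryview all provide the buffer interface.
results = [
    codecs.utf_8_decode(obj)
    for obj in (payload, bytearray(payload), memoryview(payload))
]

# Each entry is a (decoded_text, bytes_consumed) tuple.
print(len(results))
```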
msg79402	Author: Marc-Andre Lemburg (lemburg) (Python committer)	Date: 2009-01-08 09:29
On 2009-01-08 01:59, STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> Patch replacing the "s" parsing format with "y" for:
> - utf_7_decode()
> - utf_8_decode()
> - utf_16_decode()
> - utf_16_le_decode()
> - utf_16_be_decode()
> - utf_16_ex_decode()
> - utf_32_decode()
> - utf_32_le_decode()
> - utf_32_be_decode()
> - utf_32_ex_decode()
> - latin_1_decode()
> - ascii_decode()
> - charmap_decode()
> - mbcs_decode()

These are fine.

> - unicode_escape_decode()
> - raw_unicode_escape_decode()

These changes are in line with their C API codec interfaces as well, but those particular codecs could also be made to work on Unicode input, since unescaping can be applied to Unicode just as well. I'll probably open a new item for this.

> Using run_tests.sh, all tests are ok (with 19 skipped tests). I guess
> there are no tests for all these functions :-/

The mbcs codec is only available on Windows. All others are tested by test_codecs.py. Which ones are skipped in your case?

> Note: codecs documentation was already correct:
>
> .. method:: Codec.decode(input[, errors])
> (...)
> input must be a bytes object or one which provides the read-only character
> buffer interface -- for example, buffer objects and memory mapped files.

That's not entirely correct: codecs are allowed to accept any object type and can also return any object type. It is up to them to decide, e.g. a codec may accept both bytes and Unicode input and always generate Unicode output when decoding.

I guess I have to review those doc changes...
msg79741	Author: Antoine Pitrou (pitrou) (Python committer)	Date: 2009-01-13 13:14
The patch is probably fine, but it would be nice to add some unit tests for the new behaviour.
msg79843	Author: Marc-Andre Lemburg (lemburg) (Python committer)	Date: 2009-01-14 08:40
On 2009-01-13 14:14, Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> The patch is probably fine, but it would be nice to add some unit tests
> for the new behaviour.

+1 from my side.

Thanks for the patch, Victor.
msg79991	Author: STINNER Victor (vstinner) (Python committer)	Date: 2009-01-17 01:31
New patch:
- Leave unicode_escape_decode() and raw_unicode_escape_decode() unchanged (still accept unicode string)
- Test changes (reject unicode for most codecs decode functions)
- Write tests for unicode_escape_decode() and raw_unicode_escape_decode() (there was no test for these functions)
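As a rough illustration of the first point, the two escape decoders can be called with either str or bytes. This sketch is written against a current Python 3 interpreter (an assumption, since the thread concerns 3.0/3.1), and the variable names are illustrative only.

```python
import codecs

# Thirteen ASCII characters: two backslash-u escapes around a literal "t".
escaped = "\\u00e9t\\u00e9"

# Both escape decoders take str input as well as bytes.
from_str = codecs.unicode_escape_decode(escaped)
from_bytes = codecs.unicode_escape_decode(escaped.encode("ascii"))
raw_from_str = codecs.raw_unicode_escape_decode(escaped)

# The second tuple item is the number of input bytes consumed.
print(from_str[1], from_bytes[1], raw_from_str[1])
```

For pure-ASCII input like this, the str and bytes forms give the same result, which is why keeping str support for these two functions is harmless.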
msg80362	Author: Antoine Pitrou (pitrou) (Python committer)	Date: 2009-01-22 10:32
Committed in r68855, r68856. Thanks!
msg80363	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer)	Date: 2009-01-22 10:46
IMO, Modules/cjkcodecs/multibytecodec.c should be changed as well.
msg80364	Author: Antoine Pitrou (pitrou) (Python committer)	Date: 2009-01-22 10:55
Right, I hadn't thought of that.
msg80365	Author: Antoine Pitrou (pitrou) (Python committer)	Date: 2009-01-22 11:04
"Fixing" multibytecodec.c produces a TypeError in the following test:

    def test_errorcallback_longindex(self):
        dec = codecs.getdecoder('euc-kr')
        myreplace = lambda exc: ('', sys.maxsize+1)
        codecs.register_error('test.cjktest', myreplace)
        self.assertRaises(IndexError, dec,
                          'apple\x92ham\x93spam', 'test.cjktest')

    TypeError: decode() argument 1 must be bytes or buffer, not str

Since the test is meant to test recovery from a misbehaving callback, I guess the type of the input string is not really important and can be changed to a bytes string instead. What do you think?
(in any case, here is a patch)
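The proposed change to the test can be sketched outside of unittest as follows. This is an illustration rather than the committed patch: the error handler name `example.cjk_longindex` and the helper `outcome` are invented for this example, and the behaviour shown is that of a current Python 3 interpreter with the CJK codecs available.

```python
import codecs
import sys


def bad_callback(exc):
    # Deliberately misbehaving handler: the resume position is far
    # beyond the end of any possible input.
    return ("", sys.maxsize + 1)


# Hypothetical handler name, chosen to avoid clashing with real ones.
codecs.register_error("example.cjk_longindex", bad_callback)
decode = codecs.getdecoder("euc-kr")

# 0x92 and 0x93 are not valid EUC-KR lead bytes, so the handler is invoked.
data = b"apple\x92ham\x93spam"


def outcome(arg):
    """Return the name of the exception raised while decoding `arg`."""
    try:
        decode(arg, "example.cjk_longindex")
    except (IndexError, TypeError) as exc:
        return type(exc).__name__
    return "no error"


bytes_outcome = outcome(data)                    # callback misbehaviour is detected
str_outcome = outcome(data.decode("latin-1"))    # str input is refused up front
print(bytes_outcome, str_outcome)
```

With bytes input the decoder reaches the callback and reports the out-of-range position, which is what the test is meant to exercise; with str input the call fails earlier on the argument type.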
msg80366	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer)	Date: 2009-01-22 11:56
The patch looks good. I think the missing b in test_errorcallback_longindex is an oversight from when the tests were updated for py3k. You are right to change the test.
msg80367	Author: Antoine Pitrou (pitrou) (Python committer)	Date: 2009-01-22 12:25
Committed in r68857, r68858.

History
Date	User	Action	Args
2022-04-11 14:56:43	admin	set	nosy: + benjamin.peterson github: 49124
2009-01-22 12:25:56	pitrou	set	status: open -> closed resolution: fixed messages: + msg80367
2009-01-22 11:56:18	amaury.forgeotdarc	set	messages: + msg80366
2009-01-22 11:04:50	pitrou	set	files: + mbdecode-unicode.patch messages: + msg80365
2009-01-22 10:55:57	pitrou	set	status: closed -> open resolution: fixed -> (no value) messages: + msg80364
2009-01-22 10:46:11	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg80363
2009-01-22 10:32:07	pitrou	set	status: open -> closed resolution: accepted -> fixed messages: + msg80362
2009-01-22 10:10:20	pitrou	set	assignee: pitrou resolution: accepted
2009-01-17 01:31:48	vstinner	set	files: - _codecs_bytes.patch
2009-01-17 01:31:42	vstinner	set	files: + _codecs_bytes-2.patch messages: + msg79991
2009-01-14 08:40:04	lemburg	set	messages: + msg79843
2009-01-13 13:14:19	pitrou	set	messages: + msg79741
2009-01-08 09:29:20	lemburg	set	nosy: + lemburg messages: + msg79402
2009-01-08 00:59:08	vstinner	set	files: + _codecs_bytes.patch keywords: + patch messages: + msg79387 nosy: + vstinner
2009-01-07 23:48:15	pitrou	create
