Issue 13538: Improve doc for str(bytesobject)

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57747

classification

Title:	Improve doc for str(bytesobject)
Type:	enhancement	Stage:	resolved
Components:	Documentation	Versions:	Python 3.2, Python 3.3, Python 3.4

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	docs@python	Nosy List:	Guillaume.Bouchard, chris.jerdonek, docs@python, eric.araujo, ezio.melotti, pitrou, python-dev, r.david.murray, terry.reedy
Priority:	normal	Keywords:	easy, patch

Created on 2011-12-06 12:56 by Guillaume.Bouchard, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
issue-13538-1-default.patch	chris.jerdonek, 2012-10-13 21:14
issue-13538-2-default.patch	chris.jerdonek, 2012-10-16 07:11
issue-13538-3-default.patch	chris.jerdonek, 2012-10-16 07:22
issue-13538-5-default.patch	chris.jerdonek, 2012-11-10 18:06
issue-13538-6-default.patch	chris.jerdonek, 2012-11-19 08:26
issue-13538-7-default.patch	chris.jerdonek, 2012-11-20 02:16

Messages (24)
msg148914	Author: Guillaume Bouchard (Guillaume.Bouchard)	Date: 2011-12-06 12:56
The docstring associated with str() says:

    str(string[, encoding[, errors]]) -> str

    Create a new string object from the given encoded string.
    encoding defaults to the current default string encoding.
    errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.

whereas the on-line documentation states:

    When only object is given, this returns its nicely printable representation.

My issue came up when I tried to convert bytes to str.

As stated in the documentation, and to avoid implicit behavior, converting str to bytes cannot be done without giving an encoding (using bytes(my_str, encoding=...) or my_str.encode(...); bytes(my_str) will raise a TypeError). But if you try to convert bytes to str using str(my_bytes), Python will return the so-called nicely printable representation of the bytes object, i.e.:

    >>> bytes("foo")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: string argument without an encoding
    >>> str(b"foo")
    "b'foo'"

As a matter of coherency and to avoid silent errors, I suggest that str() of a bytes object without an encoding raise an exception. I think that is usually what people want. If one wants a nicely printable representation of a bytes object, one can call the repr() function explicitly and will quickly see that what was just printed is wrong. But someone who wants to convert a bytes object to its unicode representation will prefer an exception to a silently failing conversion that yields a unicode string starting with "b'" and ending with "'".
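
For readers who want to reproduce the asymmetry described in this report, here is a minimal sketch, assuming a Python 3 interpreter started without the -b option. The variable names are illustrative only and do not come from the report.

```python
# str() on bytes without an encoding returns the repr, not decoded text.
raw = b"foo"

as_repr = str(raw)             # falls back to repr(raw)
as_text = raw.decode("ascii")  # explicit decoding
via_str = str(raw, "ascii")    # str() with an encoding also decodes

# The opposite direction refuses to guess an encoding.
try:
    bytes("foo")
    bytes_error = None
except TypeError as exc:
    bytes_error = str(exc)

print(repr(as_repr))  # "b'foo'"
print(repr(as_text))  # 'foo'
print(bytes_error)
```

The silent case is the first one: the result is a perfectly valid str, so nothing fails until the stray b'...' wrapper shows up in output.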
msg148916	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2011-12-06 13:12
I agree with you that this is inconsistent. However, having str raise an error is pretty much a non-starter as a suggestion. str always falls back to the repr; in general str(obj) should always return some value, otherwise the assumptions of a lot of Python code would be broken. Personally I'm not at all sure why str takes encoding and errors arguments (I never use them). I'd rather there be only one way to do that, decode. In other words, why do we have special case support for byte strings in the str conversion function? But I don't think that can be changed either, so I think we are stuck with documenting the existing situation better. Do you want to propose a doc patch?
msg148917	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2011-12-06 13:14
> Personally I'm not at all sure why str takes encoding and errors
> arguments (I never use them).

Probably because the unicode type also did in 2.x. And also because it makes it compatible with arbitrary buffer objects:

    >>> str(memoryview(b"foo"), "ascii")
    'foo'
msg148918	Author: Guillaume Bouchard (Guillaume.Bouchard)	Date: 2011-12-06 13:56
> str always falls back to the repr; in general str(obj) should always return some value, otherwise the assumptions of a lot of Python code would be broken.

Perhaps it could raise a warning? I.e., the only reason encoding exists is for the conversion of bytes (or something which looks like bytes) to str. Do you think it would be possible to special-case the use of str for bytes (and bytearray) with something like this:

    import warnings

    def str(object, encoding=None, errors=None):
        if encoding is not None:
            ...  # usual work
        else:
            if isinstance(object, (bytes, bytearray)):
                warnings.warn('Converting bytes/bytearray to str without encoding, it may not be what you expect')
            return object.__str__()

But by the way, adding warnings and special cases everywhere does not seem too pythonic.

> Do you want to propose a doc patch?

The docstring for str() should look something like this, in my frenglish way of writing English:

    Create a new string object from the given encoded string.
    If object is bytes, bytearray or a buffer-like object, encoding and errors can be set.
    errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.

    WARNING, if encoding is not set, the object is converted to a nicely printable representation, which is totally different from what you may expect.

Perhaps a warning could be added in the on-line documentation, such as:

    .. warning::
       When str() converts a bytes/bytearray or a buffer-like object and encoding is not specified, the result will be a unicode nicely printable representation, which is totally different from the unicode representation of your object using a specified encoding.

Would you like a .diff on top of the current mercurial repository?
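
The special-casing idea in this message can be tried out without touching the built-in. The following is only a sketch under stated assumptions: `checked_str` is a hypothetical helper name, not part of Python and not anything that was adopted, and it emits a plain UserWarning rather than a dedicated category.

```python
import warnings


def checked_str(obj, encoding=None, errors=None):
    """Hypothetical helper (not part of Python): like str(), but warn on undecoded bytes."""
    if encoding is not None or errors is not None:
        # Decoding path: same defaults as the built-in str().
        return str(obj, encoding or "utf-8", errors or "strict")
    if isinstance(obj, (bytes, bytearray)):
        warnings.warn(
            "str() on bytes/bytearray without an encoding returns its repr",
            stacklevel=2,
        )
    return str(obj)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    as_repr = checked_str(b"foo")           # warns, still returns the repr
    as_text = checked_str(b"foo", "ascii")  # decodes, no warning from the helper

print(repr(as_repr), repr(as_text))
print([str(w.message) for w in caught])
```

Keeping the return value unchanged and only adding a warning preserves the expectation, raised earlier in the thread, that str(obj) always returns some value.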
msg148919	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2011-12-06 14:30
A diff would be great. We try to use warnings sparingly, and I don't think this is a case that warrants it. Possibly a .. note is worthwhile, perhaps with an example for the bytes case, but even that may be too much. I also wouldn't use the wording "is totally different from what you would expect", since by now I do expect it :). How about something like "the result will not be the decoded version of the bytes, but instead will be the repr of the object", with a cross link to repr.
msg148922	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2011-12-06 15:00
Well, I forgot to mention it in my previous message, but there is already a warning that you can activate with the -b option:

    $ ./python -b
    Python 3.3.0a0 (default:6b6c79eba944, Dec 6 2011, 11:11:32)
    [GCC 4.5.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> str(b"")
    __main__:1: BytesWarning: str() on a bytes instance
    "b''"

And you can even turn it into an error with -bb:

    $ ./python -bb
    Python 3.3.0a0 (default:6b6c79eba944, Dec 6 2011, 11:11:32)
    [GCC 4.5.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> str(b"")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    BytesWarning: str() on a bytes instance

However, -b is highly unlikely to become the default, for the reasons already explained. It was mainly meant to ease porting from Python 2.
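
The effect of -b and -bb can also be checked from a script rather than an interactive session. This is a rough sketch, assuming a CPython version (3.7 or later for `capture_output`) that still accepts these options and an environment where PYTHONWARNINGS is not set; `SNIPPET` and `run_with_flags` are names made up for the illustration.

```python
import subprocess
import sys

# Code executed in each child interpreter.
SNIPPET = "print(str(b'abc'))"


def run_with_flags(*flags):
    """Run SNIPPET in a fresh interpreter with the given command-line flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


plain = run_with_flags()        # no flag: prints the repr silently
warned = run_with_flags("-b")   # -b: BytesWarning reported on stderr
strict = run_with_flags("-bb")  # -bb: BytesWarning raised as an exception

print(plain.returncode, plain.stdout.strip())
print(warned.returncode, "BytesWarning" in warned.stderr)
print(strict.returncode, "BytesWarning" in strict.stderr)
```

Running a test suite under -bb is one way to surface accidental str(bytes) calls without changing the default behavior for everyone.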
msg149163	Author: Éric Araujo (eric.araujo) * (Python committer)	Date: 2011-12-10 16:02
A note in the docs (without note/warning directives, just a note) and maybe the docstring would be good. It should better explain that str has two uses: converting anything to a str (using __str__ or __repr__), and decoding a buffer to str (with encoding and errors arguments). str(b'') is a case of the first use, not the second (and likewise %s formatting).
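
The two uses of str() distinguished in this message can be shown side by side. A small sketch follows; the `Point` class is an invented example that defines only __repr__, so str() has to fall back to it.

```python
class Point:
    """Invented example class: defines only __repr__, so str() falls back to it."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


# Use 1: convert any object to text via __str__ (or __repr__ as a fallback).
p_text = str(Point(1, 2))
b_text = str(b"caf\xc3\xa9")  # bytes are just "any object" here
fmt_text = "%s" % b"abc"      # %s formatting behaves the same way

# Use 2: decode a bytes-like object, selected by passing encoding and/or errors.
decoded = str(b"caf\xc3\xa9", encoding="utf-8")

print(p_text)
print(b_text)
print(fmt_text)
print(decoded)
```

Which use applies depends only on whether encoding or errors is passed, not on the type of the first argument.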
msg149362	Author: Terry J. Reedy (terry.reedy) * (Python committer)	Date: 2011-12-12 22:41
I think Éric's suggestion is the proper approach.
msg172716	Author: Chris Jerdonek (chris.jerdonek) * (Python committer)	Date: 2012-10-12 03:17
This may have been addressed to some extent by issue 14783: http://hg.python.org/cpython/rev/3773c98d9da8
msg172832	Author: Chris Jerdonek (chris.jerdonek) * (Python committer)	Date: 2012-10-13 21:14
Attaching a proposed patch along the lines suggested by Éric.
msg172960	Author: Ezio Melotti (ezio.melotti) * (Python committer)	Date: 2012-10-15 11:05
Instead of documenting what encoding and errors do, I would just say that str(bytesobj, encoding, errors) is equivalent to bytesobj.decode(encoding, errors) (assuming it really is). I don't like encodings/decodings done via the str/bytes constructors, and I think the docs should encourage the use of bytes.decode/str.encode.
msg172989	Author: Chris Jerdonek (chris.jerdonek) * (Python committer)	Date: 2012-10-15 17:27
> I would just say that str(bytesobj, encoding, errors) is equivalent to bytesobj.decode(encoding, errors) (assuming it really is).

Good suggestion. And yes, code is shared in the following way:

http://hg.python.org/cpython/file/d3c7ebdc71bb/Objects/bytesobject.c#l2306

One thing that would need to be addressed in the str() version is the case where bytesobj is a PEP 3118 character buffer, after which it falls back to bytesobj.decode(encoding, errors). I will update the patch so people can see how it looks.
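
The equivalence discussed here is easy to spot-check for a bytes object, including the non-default error handlers. This is a sketch with made-up sample data, not a proof, and it does not cover the buffer-object case mentioned above.

```python
data = b"caf\xc3\xa9 \xff"  # valid UTF-8 followed by one invalid byte

# For the lenient handlers, both spellings give the same text.
results = {
    errors: (str(data, "utf-8", errors), data.decode("utf-8", errors))
    for errors in ("replace", "ignore")
}


def outcome(func):
    """Return the result of func(), or the name of the decoding error it raised."""
    try:
        return func()
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# With errors='strict' (the default), both spellings fail the same way.
strict_str = outcome(lambda: str(data, "utf-8"))
strict_decode = outcome(lambda: data.decode("utf-8"))

for errors, (via_str, via_decode) in results.items():
    print(errors, via_str == via_decode)
print(strict_str, strict_decode)
```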
msg172990	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-10-15 17:29
Indeed:

    >>> m = memoryview(b"")
    >>> str(m, "utf-8")
    ''
    >>> m.decode("utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'memoryview' object has no attribute 'decode'
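
Since a memoryview has no decode() method, there are a few ways to get text out of one. The sketch below compares them on illustrative data; all calls are standard built-ins.

```python
view = memoryview(b"hello world")[:5]  # a slice of the buffer, no copy made yet

via_str = str(view, "utf-8")                  # str() accepts any buffer object
via_bytes = bytes(view).decode("utf-8")       # copy to bytes, then decode
via_tobytes = view.tobytes().decode("utf-8")  # same idea via the memoryview method

has_decode = hasattr(view, "decode")  # memoryview itself cannot decode
no_encoding = str(view)               # without an encoding: just the repr

print(via_str, via_bytes, via_tobytes)
print(has_decode)
print(no_encoding)
```

This is the one place where str(obj, encoding) is not simply another spelling of obj.decode(encoding).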
msg172991	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-10-15 17:30
Note: "character buffer" isn't a term we use anymore (in Python 3, that is).
msg173018	Author: Chris Jerdonek (chris.jerdonek) * (Python committer)	Date: 2012-10-16 07:11
Attaching updated patch based on Ezio and Antoine's comments. Let me know if I'm not using the correct or preferred terminology around buffer objects and the buffer protocol. It doesn't seem like the section on the buffer protocol actually says what objects implementing the buffer protocol should be called. I gather indirectly from the docs that such objects are called "buffer objects" (as opposed to just "buffers"): http://docs.python.org/dev/c-api/buffer.html#bufferobjects
msg173019	Author: Chris Jerdonek (chris.jerdonek) * (Python committer)	Date: 2012-10-16 07:22
Reattaching patch (a line was missing).
msg173023	Author: Ezio Melotti (ezio.melotti) * (Python committer)	Date: 2012-10-16 09:58
+ str(bytes, encoding[, errors='strict'])
+ str(bytes, errors[, encoding='utf-8'])

Why not simply str(bytes, encoding='utf-8', errors='strict')? (Your signature suggests that str(b'abc', 'strict') should work.)

+ the string itself. This behavior differs from :func:`repr` in that the

I'm not sure this is the right place to explain the differences between __str__ and __repr__ (or maybe it is?). Also, doesn't str() fall back on __repr__ if __str__ is missing? Does :meth:`__str__` link to object.__str__?

+ If encoding or errors is given, and/or
+ (or :class:`bytearray`) object, then :func:`str` calls

I would use 'is equivalent to', rather than 'calls'.

+ :meth:`bytes.decode(encoding, errors) <bytes.decode>` on the object
+ and returns the value. Otherwise, the bytes object underlying the buffer
+ object is obtained before calling :meth:`bytes.decode() <bytes.decode>`.

:meth:`bytes.decode` should be enough.

+ Passing a :func:`bytes <bytes>`

:func:`bytes` should be enough (if it isn't, maybe you want :func:`.bytes`).
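
The point about the signature can be checked directly: errors can be given on its own only as a keyword, and a second positional argument is always treated as the encoding. A minimal sketch, with illustrative variable names:

```python
# errors alone, passed by keyword: encoding falls back to UTF-8.
ok_keyword = str(b"abc", errors="strict")

# A second positional argument is the encoding, so 'strict' is looked up as a codec name.
try:
    str(b"abc", "strict")
    positional_error = None
except LookupError as exc:
    positional_error = str(exc)

# Both arguments by keyword, in either order.
both = str(b"abc", errors="strict", encoding="ascii")

print(repr(ok_keyword))
print(positional_error)
print(repr(both))
```

This matches the single-signature spelling str(bytes, encoding='utf-8', errors='strict') suggested in the review.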
msg175262	Author: Chris Jerdonek (chris.jerdonek) * (Python committer)	Date: 2012-11-10 03:35
New patch incorporating Ezio's suggestions, along with some other changes.
msg175950	Author: Chris Jerdonek (chris.jerdonek) * (Python committer)	Date: 2012-11-19 08:26
Updating patch after Ezio's review on Rietveld.
msg175976	Author: Chris Jerdonek (chris.jerdonek) * (Python committer)	Date: 2012-11-20 02:16
Attaching new patch to address Ezio's further comments (for the convenience of comparing in Rietveld). I will be committing this.
msg175977	Author: Éric Araujo (eric.araujo) * (Python committer)	Date: 2012-11-20 04:13
I left a few remarks. The patch is very nice, thanks!
msg175978	Author: Chris Jerdonek (chris.jerdonek) * (Python committer)	Date: 2012-11-20 04:44
Thanks, Éric! (And thanks also to Ezio who helped quite a bit with the improvements.) I replied to your comments on Rietveld.
msg176039	Author: Roundup Robot (python-dev) (Python triager)	Date: 2012-11-21 01:56
New changeset f32f1cb508ad by Chris Jerdonek in branch '3.2':
Improve str() and object.__str__() documentation (issue #13538).
http://hg.python.org/cpython/rev/f32f1cb508ad

New changeset 6630a1c42204 by Chris Jerdonek in branch '3.3':
Null merge from 3.2 (issue #13538).
http://hg.python.org/cpython/rev/6630a1c42204

New changeset 325f80d792b9 by Chris Jerdonek in branch '3.3':
Improve str() and object.__str__() documentation (issue #13538).
http://hg.python.org/cpython/rev/325f80d792b9

New changeset 59acd5cac8b5 by Chris Jerdonek in branch 'default':
Merge from 3.3: Improve str() and object.__str__() docs (issue #13538).
http://hg.python.org/cpython/rev/59acd5cac8b5
msg176058	Author: Roundup Robot (python-dev) (Python triager)	Date: 2012-11-21 13:38
New changeset 5c39e3906ce9 by Chris Jerdonek in branch '3.2':
Fix label in docs (from issue #13538).
http://hg.python.org/cpython/rev/5c39e3906ce9

History
Date	User	Action	Args
2022-04-11 14:57:24	admin	set	github: 57747
2012-11-21 13:38:32	python-dev	set	messages: + msg176058
2012-11-21 02:17:59	chris.jerdonek	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2012-11-21 01:56:13	python-dev	set	nosy: + python-dev messages: + msg176039
2012-11-20 04:44:36	chris.jerdonek	set	messages: + msg175978
2012-11-20 04:13:17	eric.araujo	set	messages: + msg175977
2012-11-20 02:16:14	chris.jerdonek	set	files: + issue-13538-7-default.patch messages: + msg175976
2012-11-19 08:27:03	chris.jerdonek	set	files: + issue-13538-6-default.patch messages: + msg175950
2012-11-10 18:13:28	chris.jerdonek	set	files: - issue-13538-4-default.patch
2012-11-10 18:06:41	chris.jerdonek	set	files: + issue-13538-5-default.patch
2012-11-10 03:35:19	chris.jerdonek	set	files: + issue-13538-4-default.patch messages: + msg175262
2012-10-16 09:58:25	ezio.melotti	set	messages: + msg173023
2012-10-16 07:22:42	chris.jerdonek	set	files: + issue-13538-3-default.patch messages: + msg173019
2012-10-16 07:11:07	chris.jerdonek	set	files: + issue-13538-2-default.patch messages: + msg173018 stage: needs patch -> patch review
2012-10-15 17:30:35	pitrou	set	messages: + msg172991
2012-10-15 17:29:56	pitrou	set	messages: + msg172990
2012-10-15 17:27:03	chris.jerdonek	set	messages: + msg172989
2012-10-15 11:05:10	ezio.melotti	set	messages: + msg172960
2012-10-13 21:14:30	chris.jerdonek	set	files: + issue-13538-1-default.patch keywords: + patch messages: + msg172832 versions: + Python 3.4
2012-10-12 03:17:48	chris.jerdonek	set	nosy: + chris.jerdonek messages: + msg172716
2012-07-25 22:58:30	ezio.melotti	set	keywords: + easy type: enhancement stage: needs patch
2011-12-12 22:42:07	ezio.melotti	set	nosy: + ezio.melotti
2011-12-12 22:41:13	terry.reedy	set	nosy: + terry.reedy messages: + msg149362
2011-12-10 16:03:53	eric.araujo	set	title: Docstring of str() and/or behavior -> Improve doc for str(bytesobject)
2011-12-10 16:02:26	eric.araujo	set	nosy: + eric.araujo messages: + msg149163
2011-12-06 15:00:18	pitrou	set	messages: + msg148922
2011-12-06 14:30:26	r.david.murray	set	messages: + msg148919
2011-12-06 13:56:40	Guillaume.Bouchard	set	messages: + msg148918
2011-12-06 13:14:31	pitrou	set	nosy: + pitrou messages: + msg148917
2011-12-06 13:12:16	r.david.murray	set	versions: + Python 3.3 nosy: + docs@python, r.david.murray messages: + msg148916 assignee: docs@python components: + Documentation, - Interpreter Core
2011-12-06 12:56:42	Guillaume.Bouchard	create