This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: http.client request and send method have some datatype issues
Type: enhancement Stage: needs patch
Components: Versions: Python 3.5
process
Status: open Resolution:
Dependencies: 27340 Superseder:
Assigned To: Nosy List: Alex.Willmer, Mariatta, axitkhurana, demian.brecht, martin.panter, r.david.murray, serhiy.storchaka
Priority: normal Keywords: easy

Created on 2015-03-22 19:28 by r.david.murray, last changed 2022-04-11 14:58 by admin.

Messages (12)
msg238928 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-03-22 19:28
While committing the patch for issue 23539 I decided to rewrite the 'request' docs for clarity. In doing so I found that http.client isn't as consistent as it could be about how it handles bytes and strings. Two points specifically: (1) it will only take the length of a bytes-like object (to supply a default Content-Length header) if isinstance(x, bytes) is true (that is, it doesn't take the length of e.g. array or memoryview objects), and (2) if an iterable is passed in, it must be an iterable of bytes-like objects. Since it already automatically encodes string objects and text files, for consistency it should probably also encode strings if they are passed in via an iterator.
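A minimal sketch of the consistency fix suggested here, assuming a hypothetical helper named encode_chunks (it is not part of http.client): str items in an iterable body are encoded with Latin-1, the codec the thread says is already applied to str bodies, and bytes-like items pass through unchanged.

```python
def encode_chunks(chunks, encoding="iso-8859-1"):
    """Yield each chunk of an iterable body, encoding str items first.

    Hypothetical helper for illustration only; http.client has no
    function with this name.
    """
    for chunk in chunks:
        if isinstance(chunk, str):
            chunk = chunk.encode(encoding)
        yield chunk


# A body that mixes str and bytes items.
body = ["name=", b"caf", "\xe9"]
encoded = list(encode_chunks(body))
```

With a wrapper like this, a caller could pass encode_chunks(body) as the request body instead of the raw iterable.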
msg238935 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-22 20:18
See also issue23350 and issue23360.
msg238952 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-22 23:24
Summary of the main supported types as I see them, whether documented, undocumented, or only working by accident:
* None
* Bytes-like sequences, e.g. bytes(), bytearray. I believe Content-Length is actually automatically set for all these types.
* Arbitrary bytes-like objects, including array.array("I") and ctypes structures, if custom Content-Length provided (HTTP over TCP only)
* str() objects, to be automatically encoded with Latin-1
* File reader objects not supporting stat(), e.g. BytesIO
* File with valid stat().st_size, i.e. not named pipes
* Binary file reader
* Text file reader with mode attribute without a "b", automatically encoded with Latin-1
* Any iterable of bytes-like sequences
* Any iterable of arbitrary bytes-like objects (HTTP over TCP only)
Arbitrary bytes-like objects are not properly supported by SSLSocket.sendall(). After all, the non-SSL socket.sendall() documentation does not explicitly mention supporting arbitrary bytes-like objects either, though it does seem to support them in practice.
Suggested documentation fixes and additions:
* Clarify Content-Length is added for all sequence objects, not just "string or bytes objects"
* Warn that a custom Content-Length should be provided with non-bytes sequences, and with special files like named pipes, since the len() or stat() call will give the wrong value
* end_headers() should mention byte string rather than just "string", since it does not (yet) accept Latin-1 text strings
* end_headers() should not mention sending in the same packet either, since separate sendall() calls are made for all cases; see Issue 23302
* send() should clarify what it accepts
* Either omit mentioning arbitrary bytes-like objects, or add support to SSLSocket.sendall() and document support by non-SSL socket.sendall()
I would support encoding iterables of str() objects for consistency. The patch for Issue 23350 already does this, although I am starting to question the wisdom of special-casing lists and tuples in that patch.
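The warning about non-bytes sequences can be illustrated with a short sketch: for an array of multi-byte items, len() counts items, while the number of bytes that would actually go over the wire is the buffer's nbytes. The variable names are illustrative only.

```python
import array

# An array of unsigned ints: each item occupies several bytes.
values = array.array("I", [1, 2, 3])

# len() counts items, not bytes.
item_count = len(values)

# The buffer size in bytes is what a correct Content-Length would need.
with memoryview(values) as view:
    byte_count = view.nbytes
```

A Content-Length derived from len(values) would therefore understate the body size, which is why a custom header is needed in this case.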
msg238960 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-23 00:06
As well as encoding iterables of str(), text files could also be handled more consistently by checking the read() return type. That would eliminate the complication of checking for a "b" mode.
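A rough sketch of the read()-return-type idea, assuming a hypothetical helper named read_block (not an http.client function): rather than inspecting the file's mode attribute for a "b", it checks what read() actually returned and encodes only when that is a str.

```python
import io


def read_block(fileobj, blocksize=8192, encoding="iso-8859-1"):
    """Read one block and return it as bytes.

    Hypothetical helper for illustration; it decides whether to encode
    from the type read() returns, not from a mode attribute.
    """
    data = fileobj.read(blocksize)
    if isinstance(data, str):
        data = data.encode(encoding)
    return data


# Neither StringIO nor BytesIO has a mode attribute, yet both work.
text_block = read_block(io.StringIO("hello"))
binary_block = read_block(io.BytesIO(b"hello"))
```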
msg238964 - Author: Demian Brecht (demian.brecht) * (Python triager) Date: 2015-03-23 00:50
FWIW, I've done some additional work to request/send in issue #12319 where I've added support for chunked request encoding.
@David
> if an iterable is passed in, it must be an iterable of bytes-like objects
This specific issue is addressed in the patch in #23350.
@Martin
> text files could also be handled more consistently by checking the read() return type
I wouldn't be opposed to this at all. In fact, I was going to initially make that change in #12319, but wanted to keep the change surface minimal and realistically, peeking at the data type rather than checking for 'b' in mode doesn't /really/ make that much of a difference.
msg238969 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-03-23 01:50
Yeah, if we're going to check the type for iterables to convert strings, we might as well check the type for read() as well.
The bit about the len not being set except for str and bytes was me mis-remembering what I read in the code. (The isinstance check is about whether _send_output sends the body in the same packet.)
msg239023 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-23 14:02
Note that for file-like objects we also have the same issue as issue22468. Content-Length is not determined correctly for GzipFile and the like.
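The GzipFile problem can be reproduced with a small sketch: fstat() on the file descriptor reports the compressed size on disk, while read() yields the decompressed payload. The file name and payload are arbitrary choices for the demonstration.

```python
import gzip
import os
import tempfile

payload = b"a" * 10000  # highly compressible test data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "body.gz")
    with gzip.open(path, "wb") as gz:
        gz.write(payload)

    with gzip.open(path, "rb") as gz:
        # Size of the compressed file on disk, via the underlying descriptor.
        stat_size = os.fstat(gz.fileno()).st_size
        # Number of bytes a reader actually gets back.
        read_size = len(gz.read())
```

Any Content-Length taken from st_size would be too small for the data that read() goes on to produce.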
msg241190 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-04-16 02:04
The priority of file-like objects versus bytes-like and iterable objects should be well defined. See Issue 5038: mmap objects seem to satisfy all three interfaces, but the result may be different depending on the file position.
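A short sketch of why the priority matters for mmap, using an anonymous mapping: the same object can be consumed through read(), which honours the current file position, or through the buffer protocol, which always exposes the whole mapping. The variable names are illustrative.

```python
import mmap

# Anonymous 16-byte mapping (no backing file needed for the demo).
with mmap.mmap(-1, 16) as m:
    m.write(b"0123456789abcdef")
    m.seek(4)  # move the file position away from the start

    has_read = hasattr(m, "read")

    # Buffer protocol view: the entire mapping, position ignored.
    with memoryview(m) as view:
        as_buffer = view.tobytes()

    # File interface: only the bytes from the current position onward.
    from_position = m.read()
```

Depending on which interface is tried first, the body sent would be either 16 or 12 bytes long.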
msg272118 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-08-07 12:37
I’ve decided I would prefer deprecating the Latin-1 encoding, rather than adding more encoding support for iterables and text files. If not deprecating it altogether, at least prefer just ASCII encoding (like Python 2). The urlopen() function already rejects text str, and in Issue 26045 people often want or expect UTF-8.
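To make the encoding trade-off concrete, here is a small sketch of how one non-ASCII string fares under the three codecs mentioned. The sample text is an arbitrary choice.

```python
text = "caf\xe9"  # "café"; the last character is outside ASCII

# Latin-1: one byte per character.
latin1 = text.encode("iso-8859-1")

# UTF-8: the accented character becomes two bytes.
utf8 = text.encode("utf-8")

# ASCII: the accented character cannot be encoded at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

A server expecting UTF-8 would misread the Latin-1 bytes, whereas an ASCII-only policy would surface the problem as an error on the client.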
Here is a summary I made of the different data types handled for the body object, from highest priority to lowest priority.
             HTTPConnection   HTTPConn. default
body         _send_output()   Content-Length     urlopen()
===========  ===============  =================  ==================
None         No body          0 or unset #23539  C-L unset
Has read()   read() #1065257  * #5038
- Text file  encode() #5314
str          encode() r56128  len()              TypeError #11082
Byte seq.    sendall()        len()              C-L: nbytes
Bytes-like   * #27340         *                  C-L: nbytes #3243
Iterable     iter() #3243                        C-L required #3243
Sized                         len()† #23350
fstat()                       st_size #1065257
Otherwise    TypeError        Unset #1065257     C-L optional†
* Possible bugs
† Degenerate cases not worth supporting IMO
C-L = Content-Length header field, set by default or required to be specified in various cases
Byte seq. = Sequence of bytes, including "bytes" type, bytearray, array("B"), etc
Bytes-like = Any C-contiguous buffer, including zero- and multi-dimensional arrays, and with items other than bytes
Iterable = Iterable of bytes or bytes-like objects (depending on Issue 27340 about SSL)
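The priority order in the summary can be sketched as a dispatch function. This is only an illustration under simplifying assumptions, not the actual http.client logic: classify_body is a hypothetical name, byte sequences and other bytes-like objects are merged into one category, and the Content-Length rules are left out.

```python
import io


def classify_body(body):
    """Return a label for the body type, checking in priority order.

    Hypothetical sketch; http.client does not expose such a function.
    """
    if body is None:
        return "none"
    if hasattr(body, "read"):  # file-like objects win over everything else
        return "file"
    if isinstance(body, str):
        return "str"
    try:
        # Anything supporting the buffer protocol counts as bytes-like.
        with memoryview(body):
            return "bytes-like"
    except TypeError:
        pass
    try:
        iter(body)
    except TypeError:
        raise TypeError("unsupported body type: %r" % type(body)) from None
    return "iterable"


labels = [
    classify_body(None),
    classify_body(io.BytesIO(b"data")),
    classify_body("text"),
    classify_body(bytearray(b"data")),
    classify_body([b"chunk1", b"chunk2"]),
]
```

Because the read() check comes first, an object such as mmap that also supports the buffer protocol would be labelled "file" under this ordering.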
msg387967 - Author: Alex Willmer (Alex.Willmer) * Date: 2021-03-02 20:45
A data point found while I researched this
MyPy typeshed [1] currently declares
_DataType = Union[bytes, IO[Any], Iterable[bytes], str]
class HTTPConnection:
 def send(self, data: _DataType) -> None: ...
[1] https://github.com/python/typeshed/blob/e2967a8beee9e079963ea91a67087ba8fded1d0b/stdlib/http/client.pyi#L26 
msg387975 - Author: Alex Willmer (Alex.Willmer) * Date: 2021-03-02 21:36
First stab at characterising http.client.HTTPConnection.send().
https://github.com/moreati/bpo-23740
This uses a webserver that returns request details in the body of the response. Raw (TCP level) content is included. It shares a similar purpose to the HTTP TRACE command. In principle, the bytes that HTTPConnection.send() writes will match the bytes returned (after they're decapsulated from the JSON). I've not tested that aspect yet.
TODO
- further testing (verify round trip, bytes in = bytes out) 
- cover multiple Python versions
- cover cases such as the client manually setting Content-Length
msg388165 - Author: Alex Willmer (Alex.Willmer) * Date: 2021-03-05 21:15
http_dump.py now covers CPython 3.6-3.10 (via Tox), and HTTPSConnection https://github.com/moreati/bpo-23740 
History
Date                 User              Action  Args
2022-04-11 14:58:14  admin             set     github: 67928
2021-03-05 21:15:27  Alex.Willmer      set     messages: + msg388165
2021-03-02 21:36:39  Alex.Willmer      set     messages: + msg387975
2021-03-02 20:45:20  Alex.Willmer      set     nosy: + Alex.Willmer; messages: + msg387967
2016-09-29 22:22:44  Mariatta          set     nosy: + Mariatta
2016-08-07 12:37:51  martin.panter     set     messages: + msg272118
2016-06-17 04:25:59  martin.panter     set     dependencies: + bytes-like objects with socket.sendall(), SSL, and http.client
2015-09-11 02:13:11  martin.panter     link    issue13559 dependencies
2015-04-18 23:45:53  axitkhurana       set     nosy: + axitkhurana
2015-04-16 02:04:15  martin.panter     set     messages: + msg241190
2015-03-23 14:02:59  serhiy.storchaka  set     messages: + msg239023
2015-03-23 01:50:15  r.david.murray    set     messages: + msg238969
2015-03-23 00:50:30  demian.brecht     set     nosy: + demian.brecht; messages: + msg238964
2015-03-23 00:06:39  martin.panter     set     messages: + msg238960
2015-03-22 23:24:20  martin.panter     set     nosy: + martin.panter; messages: + msg238952
2015-03-22 20:18:44  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg238935
2015-03-22 19:28:44  r.david.murray    create
