This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011年01月31日 19:18 by William.Wu, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| test_urllib_request.patch | William.Wu, 2011年01月31日 19:34 | unit test file | ||
| urllib_request.patch | William.Wu, 2011年01月31日 19:34 | the patch file of request.py | ||
| issue11082.diff | orsenthil, 2011年02月10日 16:11 | review | ||
| issue11082-2.diff | orsenthil, 2011年02月11日 04:59 | review | ||
| issue-11082-docs.patch | orsenthil, 2011年11月09日 00:10 | review | ||
| Messages (25) | |||
|---|---|---|---|
| msg127646 - (view) | Author: William Wu (William.Wu) | Date: 2011年01月31日 19:18 | |
I found this bug when I started to trying Python 3.2 release candidate 1. When using urllib.request.urlopen to handle HTTP POST, I got the error message: ValueError: Content-Length should be specified for iterable data of type <class 'str'> 'foo=bar' I'll attach the patch and test case. |
|||
| msg127725 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2011年02月02日 08:36 | |
Senthil, could this be a regression of the recent urllib transfer-encoding changes? |
|||
| msg127728 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年02月02日 09:13 | |
The POST data should be bytes. So in the attached test case, instead of
request = urllib.request.Request('http://does.not.matter', 'foo=bar')
it should be:
request = urllib.request.Request('http://does.not.matter', b'foo=bar')
And the Content-Length will be calculated using this logic.
mv = memoryview(data)
Content-length = len(mv) * mv.itemsize
Should we emphasize further that data should be bytes? I think error
message can be improved here. This issue is not a blocker.
|
|||
| msg127731 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011年02月02日 10:42 | |
Since r70638, the http client encodes unicode to ISO-8859-1: << RFC 2616 says that iso-8859-1 is the default charset for HTTP entity bodies, but we encoded strings using ascii. See http://bugs.python.org/issue5314. Changed docs and code to use iso-8859-1. >> |
|||
| msg127744 - (view) | Author: William Wu (William.Wu) | Date: 2011年02月02日 17:08 | |
If the POST data should be bytes which I also think reasonable,
should urllib.parse.urlencode return bytes instead of str?
>>> urllib.parse.urlencode({'foo': 'bar'})
'foo=bar'
>>> urllib.parse.urlencode({b'foo': b'bar'})
'foo=bar'
|
|||
| msg127752 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2011年02月02日 19:21 | |
That would seem correct to me. |
|||
| msg127753 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2011年02月02日 19:25 | |
It is also a question whether to disallow str explicitly, instead of letting it go through the Iterable check. |
|||
| msg127896 - (view) | Author: William Wu (William.Wu) | Date: 2011年02月04日 14:47 | |
So, what's the decision to be taken? I'm willing to provide patches (if I need to), but I need to know *the reasonable behaviors*. :) |
|||
| msg127897 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年02月04日 15:07 | |
For this particular issue, I think, it is good idea to disallow str explicitly, instead of letting it go through the Iterable and raise a ValueError with a different message. |
|||
| msg127904 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2011年02月04日 16:33 | |
Then let us do that. Senthil, what about urlencode of bytes values returning a str? |
|||
| msg128295 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年02月10日 15:23 | |
On Sat, Feb 5, 2011 at 12:33 AM, Georg Brandl <report@bugs.python.org> wrote: > > Then let us do that. > > Senthil, what about urlencode of bytes values returning a str? > Sorry, the late response. I needed some time to look at this one and I could only do it today. We have been having str return value from urlencode, due to quote, quote_plus methods which can take bytes/str but return str. (urlencode is nothing but quote_plus which acts on two-element sequences like name,value pairs. The resultant str is certainly useful when we want to construct a URL query string. But, when we want to do a POST via urlopen, we need bytes data. So, I think best to advice through the documentation that, urlencoded value must be converted to bytes if it is to be used for POST'ing using urlopen. (Well, I also think that urllencode could be modified to the effect that if the key,value pairs are bytes, return the bytes output. Technically, this would involve overriding some behavior of quote/quote_plus and the return value will suitable only for POST data and not for url query construction. I would rather choose to discuss this before settling on the behavior and I remember that quote's behavior did take a significant amount discussion). So, for this issue. A patch would a prevent str data from being posted with ValueError explaining the reason that bytes and required and a documentation update in the urlencode method which would inform the users that the output be converted to bytes before it is POSTed, Shall we go ahead with this, Georg? I shall post a patch immediately. |
|||
| msg128304 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年02月10日 16:11 | |
Here is a patch to resolve this issue. Let me know if it okay to commit.(Or feel free to commit too). |
|||
| msg128308 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011年02月10日 16:59 | |
Georg Brandl: can this fix go into Python 3.2? It changes the API. I like any patch rejecting unicode where unicode is irrevelant (where we don't know how to choose the right encoding). About the patch: you should maybe add a test to ensure that str is rejected with a TypeError. |
|||
| msg128314 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年02月10日 17:06 | |
Victor - It does not change the API. Only that the ValueError message which had a confusing message that "for iterable data of type <class 'str'> " is made clear that POST should not be a str. Yes, a TypeError instead of ValueError would be more appropriate here. |
|||
| msg128321 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2011年02月10日 17:53 | |
Victor: I don't see an API change here. A ValueError is raised in both cases, and error messages are not part of the API. (And BTW, you may call me Georg.) Senthil: I'd like the documentation change to be a bit more explicit that a string is always returned. Since the string must be encoded, it would also be helpful to mention which encoding to use. The new exception message looks slightly incorrect to me: the argument can also be an iterable of bytes. |
|||
| msg128361 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年02月11日 04:05 | |
Here is the patch with addressing the comments. 1. It should be TypeError instead of ValueError when str is rejected. We introduced ValueError for this release only, so there is no breakage here. 2. Informed the user that string should be encoded to bytes using user-specified encoding before it is sent as POST data. Also pointed to Examples in the urllib.request section. ( I could not hyperlink directly to #examples section. If that is possible, it would be better). |
|||
| msg128362 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年02月11日 04:59 | |
patch with Docs :ref: to proper section. |
|||
| msg128367 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2011年02月11日 07:58 | |
I still find "user-specified encoding" unclear. This can be addressed at a different time though. Your exception message is missing a space between "bytes" and "or"; otherwise this is ok to commit. |
|||
| msg128380 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年02月11日 12:44 | |
Thanks. Committed in r88394. Regarding the encoding information. An explanation of this sort might be helpful. The string can be encoded using ISO-8859-1 which is the default encoding for POST data or the user can also encode using a custom encoding , in which case, Content-Type: header should specify the encoding value in addition to 'application/x-www-form-urlencoded' which is specified for POST data. For Example:, if one has to use utf-8 >>> sdata = urllib.parse.urlencode({"ሀ":"ሢ"}) >>> bytesdata = sdata.encode('utf-8') >>>reqheaders=dict([("Content-Type"," application/x-www-form-urlencoded ; charset=utf-8")]) >>>urllib.request.Request('http://www.example.com',bytesdata,reqheaders) |
|||
| msg128440 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2011年02月12日 07:33 | |
Lowering priority and making a doc issue now that the code change has been made. |
|||
| msg129547 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2011年02月26日 13:59 | |
BTW, there was no entry in Misc/NEWS, so this was not in whatsnew/3.2.rst. |
|||
| msg147324 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年11月09日 00:10 | |
Here is the docs patch which will help us close the issue. Addressing Eric's last comment - I believe the what's new and News for this issue was added with the feature, this one was Exception msg change. |
|||
| msg147330 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2011年11月09日 02:17 | |
I think the information in the patch should go in the urlopen doc, not in urlencode. Adding a note to urlencode that says that the result must be encoded is fine, but all the details about the default encoding and the fact that an extra Content-Type header is necessary when the encoding is not iso-8859-1 belong to the urlopen doc IMHO. (Is the header actually necessary? I think I always sent utf-8 without header, and the example you added in r88394 also uses utf-8 with no extra headers.) There are also a few minor issues with the patch -- I'll leave a note in rietveld for that. |
|||
| msg155984 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012年03月16日 01:15 | |
New changeset 057cf78ed576 by Senthil Kumaran in branch '3.2': Explain the use of charset parameter with Content-Type header. Issue11082 http://hg.python.org/cpython/rev/057cf78ed576 New changeset 90e35b91756d by Senthil Kumaran in branch 'default': Explain the use of charset parameter with Content-Type header: issue11082 http://hg.python.org/cpython/rev/90e35b91756d |
|||
| msg155985 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2012年03月16日 01:18 | |
I have rewritten some parts of the documentation, explaining the use of charset parameter with Content-Type header. Used the suggestions from the previous patch's review comments too. I see that other parts of the documentation can be improved further, I shall do that separately. I am closing this issue. Thanks! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:12 | admin | set | github: 55291 |
| 2012年03月16日 01:18:26 | orsenthil | set | status: open -> closed messages: + msg155985 |
| 2012年03月16日 01:15:52 | python-dev | set | nosy:
+ python-dev messages: + msg155984 |
| 2011年11月09日 02:17:30 | ezio.melotti | set | messages: + msg147330 |
| 2011年11月09日 00:10:44 | orsenthil | set | files:
+ issue-11082-docs.patch messages: + msg147324 |
| 2011年11月08日 22:44:38 | ezio.melotti | set | nosy:
+ ezio.melotti |
| 2011年03月21日 13:38:03 | vstinner | set | nosy:
- vstinner |
| 2011年02月26日 13:59:40 | eric.araujo | set | nosy:
+ eric.araujo messages: + msg129547 |
| 2011年02月12日 07:33:55 | georg.brandl | set | priority: deferred blocker -> normal messages: + msg128440 components: + Documentation, - Library (Lib) nosy: georg.brandl, orsenthil, vstinner, William.Wu |
| 2011年02月11日 12:44:10 | orsenthil | set | resolution: fixed messages: + msg128380 nosy: georg.brandl, orsenthil, vstinner, William.Wu |
| 2011年02月11日 07:58:20 | georg.brandl | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg128367 |
| 2011年02月11日 04:59:45 | orsenthil | set | files:
+ issue11082-2.diff nosy: georg.brandl, orsenthil, vstinner, William.Wu messages: + msg128362 |
| 2011年02月11日 04:59:09 | orsenthil | set | files:
- Issue11082-2.patch nosy: georg.brandl, orsenthil, vstinner, William.Wu |
| 2011年02月11日 04:05:48 | orsenthil | set | files:
+ Issue11082-2.patch nosy: georg.brandl, orsenthil, vstinner, William.Wu messages: + msg128361 |
| 2011年02月10日 17:53:33 | georg.brandl | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg128321 |
| 2011年02月10日 17:06:12 | orsenthil | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg128314 |
| 2011年02月10日 16:59:24 | vstinner | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg128308 |
| 2011年02月10日 16:11:49 | orsenthil | set | files:
+ issue11082.diff messages: + msg128304 nosy: georg.brandl, orsenthil, vstinner, William.Wu stage: patch review |
| 2011年02月10日 15:23:39 | orsenthil | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg128295 |
| 2011年02月04日 16:33:03 | georg.brandl | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg127904 |
| 2011年02月04日 15:07:38 | orsenthil | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg127897 |
| 2011年02月04日 14:47:52 | William.Wu | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg127896 |
| 2011年02月02日 19:25:40 | georg.brandl | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg127753 |
| 2011年02月02日 19:21:05 | georg.brandl | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg127752 |
| 2011年02月02日 17:08:18 | William.Wu | set | nosy:
georg.brandl, orsenthil, vstinner, William.Wu messages: + msg127744 |
| 2011年02月02日 10:42:30 | vstinner | set | nosy:
+ vstinner messages: + msg127731 |
| 2011年02月02日 09:13:34 | orsenthil | set | messages: + msg127728 |
| 2011年02月02日 08:36:25 | georg.brandl | set | priority: normal -> deferred blocker nosy: + georg.brandl, orsenthil messages: + msg127725 assignee: orsenthil |
| 2011年01月31日 19:34:51 | William.Wu | set | files: + urllib_request.patch |
| 2011年01月31日 19:34:15 | William.Wu | set | files:
+ test_urllib_request.patch keywords: + patch |
| 2011年01月31日 19:18:12 | William.Wu | create | |