Message225814

| Field | Value |
|---|---|
| Author | ncoghlan |
| Recipients | ncoghlan |
| Date | 2014-08-24 12:45:41 |
| SpamBayes Score | -1.0 |
| Marked as misclassified | Yes |
| Message-id | <1408884341.89.0.273491669506.issue22264@psf.upfronthosting.co.za> |
| In-reply-to | |

The WSGI 1.0.1 standard (PEP 3333) mandates that servers decode binary data, such as request headers, as latin-1 text: http://www.python.org/dev/peps/pep-3333/#unicode-issues
This means that many WSGI headers will in fact contain *improperly encoded data*. Developers working directly with WSGI (rather than using a WSGI framework like Django, Flask or Pyramid) need to convert those strings back to bytes and decode them properly before passing them on to user applications.
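To see the problem concretely, the sketch below mimics what a PEP 3333 server does to a UTF-8 request path and then reverses it by re-encoding as latin-1. The helper `server_side_decode` and the sample path are hypothetical, chosen only for illustration.

```python
# "/café" as a client would send it in a URL: UTF-8 bytes.
raw_path = "/café".encode("utf-8")  # b"/caf\xc3\xa9"

def server_side_decode(raw: bytes) -> str:
    # Hypothetical stand-in for a PEP 3333 server: every byte is mapped
    # to the code point with the same value, i.e. decoded as latin-1.
    return raw.decode("latin-1")

environ_path = server_side_decode(raw_path)
# environ_path == "/cafÃ©": the two UTF-8 bytes became two characters.

# Undoing it: latin-1 maps code points 0-255 one-to-one back onto bytes,
# so encoding as latin-1 restores the original bytes exactly, and those
# can then be decoded with the encoding the client actually used.
recovered = environ_path.encode("latin-1").decode("utf-8")
# recovered == "/café"
```

The round trip is lossless because latin-1 is the one codec where every byte value 0-255 corresponds to exactly one code point.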
I suggest adding a simple "fix_encoding" function to wsgiref that covers this:
    def fix_encoding(data, encoding, errors="surrogateescape"):
        # Undo the latin-1 decoding mandated by PEP 3333, then decode properly
        return data.encode("latin-1").decode(encoding, errors)
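A quick check of the proposed function, copied here so the snippet runs on its own (it is only a proposal, not an existing `wsgiref` API, and the sample values are made up). It illustrates what the default `errors="surrogateescape"` buys: bytes that are invalid in the target encoding come through as lone surrogates instead of raising, and the original bytes can still be recovered afterwards.

```python
def fix_encoding(data, encoding, errors="surrogateescape"):
    # Proposed helper (not part of wsgiref): undo the latin-1 decoding
    # required by PEP 3333, then decode with the real encoding.
    return data.encode("latin-1").decode(encoding, errors)

# Valid UTF-8 that the server decoded as latin-1.
good = "/caf\u00e9".encode("utf-8").decode("latin-1")
fixed = fix_encoding(good, "utf-8")  # "/café"

# Invalid UTF-8 (a stray 0xff byte) does not raise with surrogateescape;
# the bad byte becomes the lone surrogate U+DCFF.
bad = b"/caf\xff".decode("latin-1")
fixed_bad = fix_encoding(bad, "utf-8")  # "/caf\udcff"

# The original bytes can still be recovered losslessly.
original = fixed_bad.encode("utf-8", "surrogateescape")  # b"/caf\xff"

# With errors="strict", the same input raises UnicodeDecodeError instead.
try:
    fix_encoding(bad, "utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```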
The primary intended benefit is to make WSGI-related code more self-documenting. Compare the proposal with the status quo:
    data = wsgiref.fix_encoding(data, "utf-8")                        # proposal
    data = data.encode("latin-1").decode("utf-8", "surrogateescape")  # status quo
The proposal hides the mechanical details of what is going on in order to emphasise *why* the change is needed, and provides you with a name to go look up if you want to learn more.
The latter just looks nonsensical unless you're already familiar with this particular corner of the WSGI specification.
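To show where such a helper would sit in practice, here is a minimal sketch of a bare WSGI application that repairs `PATH_INFO`, assuming clients send UTF-8 URLs. The `app` function and the `start_response` stub are hypothetical, and `fix_encoding` is redefined locally because it is only proposed; `wsgiref.util.setup_testing_defaults` is the existing stdlib function that fills in the remaining required environ keys.

```python
from wsgiref.util import setup_testing_defaults

def fix_encoding(data, encoding, errors="surrogateescape"):
    # Proposed helper from this issue; not an existing wsgiref API.
    return data.encode("latin-1").decode(encoding, errors)

def app(environ, start_response):
    # PATH_INFO arrives as a latin-1-decoded native string (PEP 3333);
    # assume the client actually sent UTF-8 and fix it up.
    path = fix_encoding(environ["PATH_INFO"], "utf-8")
    # backslashreplace keeps any surrogate-escaped bytes visible as \udcXX.
    body = "You asked for {}\n".format(path).encode("utf-8", "backslashreplace")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]

# Exercise the app directly, without a real server.
environ = {"PATH_INFO": "/caf\u00e9".encode("utf-8").decode("latin-1")}
setup_testing_defaults(environ)  # fills in the other required WSGI keys

captured = {}

def start_response(status, headers, exc_info=None):
    # Minimal stand-in for the server-provided callable.
    captured["status"] = status
    captured["headers"] = headers

result = b"".join(app(environ, start_response))
```

Applications built on a framework get this conversion done for them; the sketch is only meant for code that talks to WSGI directly.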

History

| Date | User | Action | Args |
|---|---|---|---|
| 2014-08-24 12:45:41 | ncoghlan | set | recipients: + ncoghlan |
| 2014-08-24 12:45:41 | ncoghlan | set | messageid: <1408884341.89.0.273491669506.issue22264@psf.upfronthosting.co.za> |
| 2014-08-24 12:45:41 | ncoghlan | link | issue22264 messages |
| 2014-08-24 12:45:41 | ncoghlan | create | |