Created on 2010-12-12 18:01 by r.david.murray, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| email_unknown-8bit.patch | r.david.murray, 2011-01-07 03:31 | |
| Messages (8) | |||
|---|---|---|---|
| msg123842 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010-12-12 18:01 | |
This is a follow-on to Issue 4661. The fix for that issue introduced a way to parse messages containing 8bit bytes. When Generator is called on a model containing 8bit bytes, it converts it to 7bit clean. There is, however, a bug in this conversion process: currently, when encountering 8bit bytes in headers, it simply replaces them with ?. According to the RFCs[*], what it should do instead is replace them with encoded words using the 'charset' "unknown-8bit". [*] I'm specifically referring to RFC 1428...email is effectively acting as a translating gateway when requested to do the 8bit to 7bit conversion. Although that RFC does not explicitly say that the unknown-8bit charset should be used in encoded words, it does imply it strongly in its section 3 prescription. |
|||
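A minimal sketch of the behavior requested in the message above, assuming a Python version that includes the fix (3.2 or later) and the default policy. The sample message is made up for illustration: its Subject header carries a raw 0xE9 byte with no declared charset.

```python
import email

# Hypothetical sample: a header containing one 8bit byte of unknown encoding.
raw = b"Subject: caf\xe9\n\nbody\n"

msg = email.message_from_bytes(raw)

# as_string() goes through Generator, which must emit 7bit-clean text.
text = msg.as_string()
subject_line = text.splitlines()[0]

# The unknown byte is carried in an RFC 2047 encoded word labelled with the
# unknown-8bit charset instead of being replaced with "?".
print(subject_line)
```

The printed header line should contain an encoded word naming `unknown-8bit`; whether the word uses base64 or quoted-printable depends on which form the library judges shorter.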
| msg125615 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-01-07 03:31 | |
Here is a patch. Three of the tests currently fail due to what appears to be a bug in the Header formatting routines. I'll have to look into that before finishing this issue. Note that calling str on a message with binary headers can produce overlong lines, since str does not limit line widths. generator.flatten does, though, so in that case the lengthened lines are correctly rewrapped. (Well, as correctly as Header rewraps any headers, at least, which is not all that well in certain cases.) |
|||
| msg125616 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-01-07 03:38 | |
I am a little concerned about whether 'unknown-8bit' is the correct charset to use. It seems to be the one in the RFCs, but I have a feeling it may not be what is used "in the wild" in headers, so I am looking for opinions. |
|||
| msg125619 | Author: Stephen J. Turnbull (sjt) * (Python triager) | Date: 2011-01-07 04:25 | |
I agree with you that according to RFC 1428, use of unknown-8bit is implicitly recommended. However, note that the RFC itself is not standards-track. I agree with your interpretation that in this context the email module should be considered a gateway. I think it is certainly best to convert to MIME words, as you say. However, if there isn't one already, maybe there should be an option to bounce such headers back to the user? That is, in an interactive application this should be an error. Of course we should help the user by allowing and documenting (perhaps even defaulting to) whatever we choose for the unknown encoding. I don't recall ever seeing unknown-8bit in the wild. What I do see in the wild a lot, and specifically in Mailman moderation traffic, is simply "unknown". A quick Google search for "unknown-8bit" pulled up some old (2002) discussion of unknown-8bit causing problems for some MTAs. I didn't follow up to see what those were. I don't have time to do it myself today (but would be willing to help out if you can wait up to two weeks -- I have travel coming up), but I suggest checking for IANA registration of "unknown" and "unknown-8bit". |
|||
| msg125642 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-01-07 12:37 | |
Well, unknown-8bit is registered as a charset with IANA. It is registered specifically for use in message bodies, but as a registered charset it "should" be acceptable in headers as well. There is no similar registration for just 'unknown', but it sounds like mailers may be more likely to accept it if it exists in the wild. I'm hoping to fix this before the RC (which is tomorrow, which means fixing it today), so your suggestion of making the 'unknown charset' token configurable is a good one. I'm not so worried about providing a way to reject such headers, since this incarnation of email makes a point of not throwing errors on parsing, and if you read binary messages with unknown bytes the best thing to do is generate the outgoing message with BytesGenerator, in which case you get the unknown bytes back without the rfc2047 munging. |
|||
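A minimal sketch of the BytesGenerator route mentioned in the message above, assuming Python 3.2 or later with the default policy. The sample message is made up for illustration. It shows the unknown byte passing through unchanged, with no RFC 2047 encoding applied.

```python
import email
from email.generator import BytesGenerator
from io import BytesIO

# Hypothetical sample: a header containing one 8bit byte of unknown encoding.
raw = b"Subject: caf\xe9\n\nbody\n"

msg = email.message_from_bytes(raw)

# Flatten back to bytes; BytesGenerator writes the original header bytes
# instead of wrapping them in an encoded word.
buf = BytesIO()
BytesGenerator(buf).flatten(msg)
out = buf.getvalue()

print(out)
```

This is the preferable path when the message is only being relayed, since nothing about the original header is guessed or rewritten.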
| msg125648 | Author: Barry A. Warsaw (barry) * (Python committer) | Date: 2011-01-07 15:41 | |
I'm a little uncomfortable with relying on a non-standards track RFC for this interpretation, and I'm also not sure I'd say that the email package is a "transport agent", but in cases where it's acting on the user's behalf (i.e. headers created programmatically rather than parsed), I can get on board with that. Your interpretation and approach to the fix seem reasonable, and I don't have any better ideas. |
|||
| msg125657 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-01-07 16:26 | |
Well, since unknown-8bit is a registered charset, it should be RFC-valid in an encoded word. Whether or not any other mailer out there is going to be able to handle it is a different question. |
|||
| msg125728 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-01-07 23:28 | |
Committed a revised version of the patch, including doc updates, in r87840. While I haven't documented the way to alter what encoding name is used for the unknown bytes, I did make it possible to do so (set charset.UNKNOWN8BIT to the desired string). |
|||
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:10 | admin | set | github: 54895 |
| 2011-01-07 23:28:37 | r.david.murray | set | status: open -> closed; nosy: barry, r.david.murray, sjt; messages: + msg125728; resolution: fixed; stage: needs patch -> resolved |
| 2011-01-07 16:26:35 | r.david.murray | set | nosy: barry, r.david.murray, sjt; messages: + msg125657 |
| 2011-01-07 15:41:35 | barry | set | nosy: barry, r.david.murray, sjt; messages: + msg125648 |
| 2011-01-07 12:37:16 | r.david.murray | set | nosy: barry, r.david.murray, sjt; messages: + msg125642 |
| 2011-01-07 04:25:23 | sjt | set | nosy: barry, r.david.murray, sjt; messages: + msg125619 |
| 2011-01-07 03:38:05 | r.david.murray | set | nosy: + sjt; messages: + msg125616 |
| 2011-01-07 03:32:25 | r.david.murray | set | nosy: + barry |
| 2011-01-07 03:31:53 | r.david.murray | set | files: + email_unknown-8bit.patch; messages: + msg125615; keywords: + patch |
| 2010-12-12 18:01:47 | r.david.murray | create | |