
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: email.header.Header doesn't fold headers correctly
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: r.david.murray
Nosy List: barry, kitterma, python-dev, r.david.murray, srikanths, vstinner
Priority: normal
Keywords: patch

Created on 2011-03-14 07:32 by kitterma, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded
header_folding_tests.patch | r.david.murray, 2011-04-07 22:09
better_header_spliter.patch | r.david.murray, 2011-04-10 20:12
Pull Requests
URL | Status | Linked
PR 10378 | closed | vstinner, 2018-11-07 16:50
Messages (14)
msg130793 - Author: Scott Kitterman (kitterma) Date: 2011-03-14 07:32
Header folding is very different (non-existent as far as I've found so far) in Python3. Here's a short example:
#!/usr/bin/python
# -*- coding: ISO-8859-1
from email.header import Header
hdrin = 'Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; 13 Mar 2011 23:46:05 -0400 (EDT)'
print(Header(hdrin))
With python2.6 the output is:
Received: from mailout00.controlledmail.com (mailout00.controlledmail.com
 [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for
 <bcc@kitterman.com>; 13 Mar 2011 23:46:05 -0400 (EDT)
With python3.1 or 3.2 the output is one line:
Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; 13 Mar 2011 23:46:05 -0400 (EDT)
This makes it very difficult to write header processing code that works for both Python2 and Python3, even if one can fold headers at all in Python3.
msg130920 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-14 22:47
It exists, but clearly it is broken. I'll look into it.
msg133101 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-06 00:35
Ah, it isn't broken, it's just that the default changed. In 2.x, the default was maxlinelen=78; in 3.x, the default is maxlinelen=None (unlimited), but the generator passes in an override of 78 when formatting output. So you can specify an explicit maxlinelen=78 and that will wrap the headers in both 2.x and 3.x. (There are differences in the wrapping algorithms, though!)
msg133169 - Author: Scott Kitterman (kitterma) Date: 2011-04-06 21:21
Not so fast ... I may have done this wrong, but I get:
print(Header(hdrin,maxlinelen=78))
Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; 13 Mar 2011 23:46:05 -0400 (EDT)
all in one line with python3.2, so maxlinelen doesn't appear to do anything. With python2.7 it seems to when invoked that way:
Python 2.7.1+ (r271:86832, Mar 24 2011, 00:39:14)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.header import Header
>>> hdrin = 'Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; 13 Mar 2011 23:46:05 -0400 (EDT)'
>>> print(Header(hdrin))
Received: from mailout00.controlledmail.com (mailout00.controlledmail.com
 [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for
 <bcc@kitterman.com>; 13 Mar 2011 23:46:05 -0400 (EDT)
>>> print(Header(hdrin, maxlinelen=30))
Received: from
 mailout00.controlledmail.com
 (mailout00.controlledmail.com
 [72.81.252.19]) by
 mailwash7.pair.com (Postfix)
 with ESMTP id 16BB5BAD5 for
 <bcc@kitterman.com>;
 13 Mar 2011 23:46:05
 -0400 (EDT)
msg133170 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-06 21:43
You have to do an 'encode' to get the wrapped header. __str__ uses maxlinelen=None.
However, there does seem to be a problem with the line wrapping algorithm revealed by your example: it is only doing a line break at the ';', not at any spaces. I will look into this.
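Editorial sketch of the str()/encode() distinction described in this message, assuming a Python 3 release that already contains the fix from this issue. The header value is adapted from the report and its date text is illustrative only.

```python
from email.header import Header

# Sample value adapted from the report; the date text is illustrative.
hdrin = ('Received: from mailout00.controlledmail.com '
         '(mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com '
         '(Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; '
         '13 Mar 2011 23:46:05 -0400 (EDT)')

header = Header(hdrin, maxlinelen=78)

unfolded = str(header)    # __str__ ignores maxlinelen and returns one line
folded = header.encode()  # encode() folds using the maxlinelen given above

print(unfolded)
print(folded)
```

Printing a Header object goes through __str__, which is why print(Header(hdrin, maxlinelen=78)) in the previous message produced a single line.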
msg133233 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-07 16:56
OK, it looks like the wrapping problem arises when the line contains runs of blank-delimited tokens longer than maxlinelen *and* the line also contains ';'s. The line is then split at the ';' and the remaining overlong pieces are not split.
I'll work on a fix.
msg133242 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-07 17:51
Here is a patch containing three test cases that demonstrate three different failings of the header folding algorithm. I'm working on the fix, but it is non-trivial.
msg133243 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-07 17:57
Note that 2.7 fails two of these tests as well, but for different reasons. I'm not currently planning to fix 2.7, as its behavior at least (a) doesn't lose non-whitespace information and (b) doesn't exceed the maxheaderlen.
msg133266 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-07 22:09
Here is an updated test patch that brings the test coverage of the relevant code much closer to 100%. There are still three lines and one branch uncovered, but it appears as though one of the bugs is preventing the test case that would produce full coverage from getting to the relevant code path. This gives me enough coverage to feel safer mucking about with the algorithm.
msg133283 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-08 01:01
New changeset 10725fc76e11 by R David Murray in branch '3.1':
#11492: fix header truncation on folding when there are runs of split chars.
http://hg.python.org/cpython/rev/10725fc76e11
New changeset 74ec64dc3538 by R David Murray in branch '3.2':
Merge #11492: fix header truncation on folding when there are runs of split chars.
http://hg.python.org/cpython/rev/74ec64dc3538
New changeset 5ec2695c9c15 by R David Murray in branch 'default':
Merge #11492: fix header truncation on folding when there are runs of split chars.
http://hg.python.org/cpython/rev/5ec2695c9c15 
msg133477 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-10 20:12
This was quite the adventure. The more I worked on fixing the tests, the more if/else cases the existing splitting algorithm grew. When I reached the point where fixing one test broke two others, I thought maybe it was time to try a different approach.
Based on the knowledge gathered by banging my head on the old algorithm, I developed a new one. This one is more RFC 2822/RFC 5322 compliant, I believe. It breaks only at FWS (folding whitespace), but still gives preference to breaking after commas or semicolons by default.
I had to adjust several tests that tested broken behavior: the "folded" lines were longer than maxlen even though there were suitable fold points.
I'm very happy with this patch because there are 70 fewer lines of code but the module passes more tests.
Even though the code changes are extensive, I plan to apply this to 3.2. It fixes bugs, and the new code is at least somewhat easier to understand than the old code (if only because there is less of it!). I don't plan to apply it to 3.1 because one older test fails if the patch is applied and I don't understand why (it appears to have nothing to do with line wrapping, and the same test works fine in 3.2).
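Editorial sketch of the "break only at FWS, prefer ';' or ','" behavior described in this message, assuming a Python 3 release that includes the rewritten algorithm. The header value and the line length of 26 are made up for illustration; splitchars and maxlinelen are existing parameters of Header.encode().

```python
from email.header import Header

# Hypothetical value: one semicolon followed by a run of short tokens.
value = 'alpha beta; gamma delta epsilon zeta'
header = Header(value)

# Default splitchars are ';, \t', so whitespace after a ';' or ',' is
# preferred as a fold point over other whitespace.
default_fold = header.encode(maxlinelen=26)

# With only a space in splitchars there is no preference for ';', so the
# fold should land at the last space that fits.
space_fold = header.encode(splitchars=' ', maxlinelen=26)

print(default_fold)
print(space_fold)

# Folding only inserts line breaks before existing whitespace, so removing
# the breaks should give back the original value.
roundtrip = default_fold.replace('\n', '')
print(roundtrip == value)
```

With the default split characters, the first line is expected to end at the semicolon even though more tokens would fit; with splitchars=' ' the first line is filled as far as the limit allows.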
msg133480 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-10 20:30
Note that this fix solves issue 11772, so I've closed that one as a duplicate.
msg133969 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-18 14:12
New changeset 51a551acb60d by R David Murray in branch '3.2':
#11492: rewrite header folding algorithm. Less code, more passing tests.
http://hg.python.org/cpython/rev/51a551acb60d
New changeset fcd20a565b95 by R David Murray in branch 'default':
Merge: #11492: rewrite header folding algorithm. Less code, more passing tests.
http://hg.python.org/cpython/rev/fcd20a565b95 
msg329426 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-07 16:56
I wrote PR 10378 to show why I don't think this bug should be fixed in Python 2: the fix would break any application relying on the current folding algorithm.
History
Date | User | Action | Args
2022-04-11 14:57:14 | admin | set | github: 55701
2018-11-07 16:56:13 | vstinner | set | nosy: + vstinner
    messages: + msg329426
2018-11-07 16:50:52 | vstinner | set | pull_requests: + pull_request9680
2012-01-03 07:37:54 | srikanths | set | nosy: + srikanths
2011-04-18 15:11:16 | r.david.murray | link | issue5612 superseder
2011-04-18 15:05:41 | r.david.murray | link | issue1372770 superseder
2011-04-18 14:45:33 | r.david.murray | link | issue8769 superseder
2011-04-18 14:27:32 | r.david.murray | set | status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2011-04-18 14:12:29 | python-dev | set | messages: + msg133969
2011-04-10 20:30:53 | r.david.murray | set | messages: + msg133480
2011-04-10 20:12:32 | r.david.murray | set | files: + better_header_spliter.patch
    stage: needs patch -> patch review
    messages: + msg133477
    versions: - Python 3.1
2011-04-10 16:59:34 | r.david.murray | link | issue11772 superseder
2011-04-08 01:01:28 | python-dev | set | nosy: + python-dev
    messages: + msg133283
2011-04-07 22:09:38 | r.david.murray | set | files: + header_folding_tests.patch
    messages: + msg133266
2011-04-07 22:09:02 | r.david.murray | set | files: - header_folding_tests.patch
2011-04-07 17:57:03 | r.david.murray | set | files: + header_folding_tests.patch
    messages: + msg133243
2011-04-07 17:53:40 | r.david.murray | set | files: - header_folding_tests.patch
2011-04-07 17:51:17 | r.david.murray | set | files: + header_folding_tests.patch
    title: email.header.Header doesn't fold headers at spaces if value contains ';'s -> email.header.Header doesn't fold headers correctly
    messages: + msg133242
    components: + Library (Lib), - None
    keywords: + patch
2011-04-07 16:56:23 | r.david.murray | set | title: email.header.Header doesn't fold headers at spaces -> email.header.Header doesn't fold headers at spaces if value contains ';'s
    messages: + msg133233
    stage: resolved -> needs patch
2011-04-06 21:43:11 | r.david.murray | set | messages: + msg133170
    title: email.header.Header doesn't fold headers -> email.header.Header doesn't fold headers at spaces
2011-04-06 21:21:46 | kitterma | set | status: closed -> open
    resolution: not a bug -> (no value)
    messages: + msg133169
2011-04-06 00:35:12 | r.david.murray | set | status: open -> closed
    resolution: not a bug
    messages: + msg133101
    stage: needs patch -> resolved
2011-03-14 22:47:39 | r.david.murray | set | versions: + Python 3.1, Python 3.2, Python 3.3
    nosy: barry, r.david.murray, kitterma
    messages: + msg130920
    assignee: r.david.murray
    type: behavior
    stage: needs patch
2011-03-14 21:57:19 | barry | set | nosy: + barry, r.david.murray
2011-03-14 07:32:57 | kitterma | create |
