Message 283651 - Python tracker


In-reply-to
Author	Eric Lafontaine
Recipients	Eric Lafontaine, barry, bpoaugust, r.david.murray
Date	2016-12-19.21:38:33
Message-id	<1482183513.75.0.899214043444.issue28945@psf.upfronthosting.co.za>

Content

Hi all,
I believe this is the right behavior, and whatever generated the boundary "<<>>" is the problem.
RFC 2046 page 22:
_____________________
The only mandatory global parameter for the "multipart" media type is the boundary parameter, which consists of 1 to 70 characters from a set of characters known to be very robust through mail gateways, and NOT ending with white space. (If a boundary delimiter line appears to end with white space, the white space must be presumed to have been added by a gateway, and must be deleted.) It is formally specified by the following BNF:
 boundary := 0*69<bchars> bcharsnospace
 bchars := bcharsnospace / " "
 bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                  "+" / "_" / "," / "-" / "." /
                  "/" / ":" / "=" / "?"
_____________________
In other words, the only valid boundary characters are:
0123456789 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'()+_,-./:=?
Any other character should be removed to get the boundary right. I believe the issue is that these characters weren't removed in the first place. It is a bug in my opinion, but the other way around :).
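The grammar quoted above can also be used to check a boundary rather than repair it. This is only a minimal sketch: is_valid_boundary is a hypothetical helper name, not something that exists in the email package, and it assumes the boundary has already been unquoted.

```python
import re
import string

# bcharsnospace from RFC 2046; '-' is escaped so it is not read as a range.
_BCHARS_NOSPACE = string.digits + string.ascii_letters + r"'()+_,\-./:=?"

# boundary := 0*69<bchars> bcharsnospace
# i.e. up to 69 bchars (space allowed) followed by one non-space bchar.
_BOUNDARY_RE = re.compile(
    "[" + _BCHARS_NOSPACE + " ]{0,69}[" + _BCHARS_NOSPACE + "]"
)


def is_valid_boundary(value):
    """Return True if value matches the RFC 2046 boundary grammar.

    Hypothetical helper for illustration only.
    """
    return _BOUNDARY_RE.fullmatch(value) is not None
```

With this sketch, "<<>>" is rejected because '<' and '>' are not in bchars, and a value ending in a space is rejected because the last character must come from bcharsnospace.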
Funny thing is that the unquote function only removes the first and last character it sees: either the '"' pair or the '<' and '>' pair.
def unquote(str):
    """Remove quotes from a string."""
    if len(str) > 1:
        if str.startswith('"') and str.endswith('"'):
            return str[1:-1].replace('\\\\', '\\').replace('\\"', '"')
        if str.startswith('<') and str.endswith('>'):
            return str[1:-1]
    return str
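To see that behavior on the boundary in question, the standard-library email.utils.unquote can be called directly. A small sketch; the variable names raw, once and twice are only for illustration, and whether unquote really gets applied twice on the way to get_boundary is an assumption here, not something this snippet demonstrates.

```python
from email.utils import unquote

raw = '"<<>>"'          # the boundary value as written, with its double quotes
once = unquote(raw)     # strips the surrounding double quotes
twice = unquote(once)   # strips one '<' and one '>' from the ends

print(repr(once), repr(twice))
```

Each call peels off exactly one outer pair, so the first call leaves '<<>>' and a second call leaves '<>'.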
Now, if I modify unquote to only keep the list of characters above, would I break something? Probably.
I haven't found any other RFC defining boundaries that tells me which format is supported. Can someone help me with that?
This is what the function should look like:
import re
import string
import unittest


def get_boundary(str):
    """Return the valid boundary parameter as per RFC 2046 page 22."""
    if len(str) > 1:
        # Drop every character outside bchars, then strip trailing spaces.
        # '-' is escaped so it is a literal rather than a range.
        return re.sub('[^' +
                      string.ascii_letters +
                      string.digits +
                      r""" '()+_,\-./:=?]""",
                      '', str
                      ).rstrip(' ')
    return str


class boundary_tester(unittest.TestCase):
    def test_get_boundary(self):
        boundary1 = """ abc def gh< 123 >!@ %!%' """
        # This is the valid boundary; the spaces on both sides of the
        # removed ">!@" and "%!%" runs are kept.
        ref_boundary1 = """ abc def gh 123  '"""
        ret_value = get_boundary(boundary1)
        self.assertEqual(ret_value, ref_boundary1)

    def test_get_boundary2(self):
        boundary1 = ''.join((' ', string.printable))
        # This is the valid boundary, in string.printable order.
        ref_boundary1 = ' 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\'()+,-./:=?_'
        ret_value = get_boundary(boundary1)
        self.assertEqual(ret_value, ref_boundary1)


if __name__ == '__main__':
    unittest.main()
I believe this should be added to the email.message.Message get_boundary function. 
Regards,
Eric Lafontaine

History
Date	User	Action	Args
2016-12-19 21:38:33	Eric Lafontaine	set	recipients: + Eric Lafontaine, barry, r.david.murray, bpoaugust
2016-12-19 21:38:33	Eric Lafontaine	set	messageid: <1482183513.75.0.899214043444.issue28945@psf.upfronthosting.co.za>
2016-12-19 21:38:33	Eric Lafontaine	link	issue28945 messages
2016-12-19 21:38:33	Eric Lafontaine	create
