homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: smtpd crashes when a multi-byte UTF-8 sequence is split between consecutive data packets
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, Illirgway, barry, doerwalter, giampaolo.rodola, jesstess, petri.lehtinen, r.david.murray, zvyn
Priority: normal Keywords: needs review, patch

Created on 2013年11月27日 06:10 by petri.lehtinen, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
smtpd.patch petri.lehtinen, 2013年11月27日 06:10 Patch from GitHub
smtpd_undecodable_data_does_not_raise.patch zvyn, 2014年07月09日 15:33 review
smtpd_undecodable_data_does_not_raiseV2.patch zvyn, 2014年07月16日 19:56 review
Messages (14)
msg204553 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2013年11月27日 06:10
From Illirgway, https://github.com/akheron/cpython/pull/2:
Fix utf-8 bug, when symbol byte chain splits on 2 tcp-packets:
error: uncaptured python exception, closing channel (:'utf-8' codec can't decode byte 0xd0 in position 1474: unexpected end of data [/usr/lib/python3.3/asyncore.py|read|83] [/usr/lib/python3.3/asyncore.py|handle_read_event|441] [/usr/lib/python3.3/asynchat.py|handle_read|178] [/usr/lib/python3.3/smtpd.py|collect_incoming_data|289])
msg204577 - (view) Author: (Illirgway) Date: 2013年11月27日 12:34
"base64", "quoted-printable" and "7bit" messages only use ASCII range of characters so raw_message.decode('ascii') == raw_message.decode('latin1') == etc. == raw_message.decode('utf-8') due to internal representation of utf-8 characters
P.S. Python 3.4.0 beta 1 Lib/smtpd has the same bug and needs the same patch to fix this issue
msg204578 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年11月27日 13:14
It occurs to me to wonder why smtpd is receiving 8bit data at all. It isn't advertising the 8BITMIME capability, so the client should be sending only 7bit data. See also issue 19662. I'm not sure that making utf-8 decoding work better is a good idea, but I suppose that since it is a data-dependent exception in a situation that would work with different data, we should go ahead and fix it.
msg204580 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2013年11月27日 13:45
Is there a reason why this patch changes the decoding error handler to 'ignore' (from the default 'strict')?
msg204585 - (view) Author: (Illirgway) Date: 2013年11月27日 14:17
because "strict" throws an exception (for example, on raw win1251) which brings down production smtp daemon and "replace" embeds "ugly" characters into received (and passed) email messages
msg221208 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014年06月21日 23:11
@Petri/Illirgway could one of you produce a new patch including the code change and a test for this?
msg222622 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014年07月09日 15:33
I do not think that the proposed patch solves the bug because it silently changes binary input. With the patch for issue 19662 a proper solution to avoid this bug has been applied. The only thing left is to prevent the server to raise the exception when in legacy mode. Instead of deleting single bytes from the input (which is what .decode('utf-8', 'ignore') does) I would reply with an error to the client. The attached patch implements and tests this behaviour.
msg222735 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2014年07月11日 10:31
I don't know anything about SMTP, but would it make sense to use an incremental decoder for decoding UTF-8?
msg223023 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014年07月14日 13:09
As Milan said, the problem doesn't arise in 3.5 with decode_data=False, since there's no decoding. His patch doesn't actually fix the bug for the decode_data=True case, though, since the bug is a *valid* utf-8 sequence getting split across tcp buffers.
To fix it, we would need to change the implementation of decode_data. Instead of conditionally decoding in collect_data, we'd need to postpone decoding to found_terminator. This would have the undesirable affect of changing what is in the received_lines attribute, which is why we didn't do it in the decode_data patch. Using an incremental decoder won't solve that problem, since it too would change what gets stored in received_lines.
Since decode_data=True is really not a legitimate mode for smtpd (it is an historical accident/bug) and we are planning on removing it eventually, I think we should go ahead and apply Milan's patch as is, since it does improve the error reporting. The message would need to be adjusted though, since it can trigger on valid utf-8 data. It should say that smtpd should be run with decode_data=False in order to fix the decode problem.
That would leave the bug as-is in 3.4, but a similar patch with an error message suggesting an upgrade to 3.5/decode_data=True could be applied. That feels a little weird, though :).
msg223263 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014年07月16日 19:56
Note that in this (and my previous) patch the message is sent to the client (the idea was not to raise an exception). Maybe it would be better to raise an exception with the information you mentioned?
msg223267 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014年07月16日 20:30
No, I think the error return is better. Crashing the server on bad input data (which, worse, could be perfectly valid if the server were standards compliant) is a bad thing, I think. That is, if someone has an application that actually works with decode_data=True, I think it is better if it doesn't crash just because of different input data or TCP packet boundaries, as long as the error is reported. I'm willing to be convinced otherwise, but that's how I'm leaning.
msg223274 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014年07月16日 21:04
Agreed. It just feels a bit weird to send programming instructions for the server to the client (but it's probably the best we can do here).
msg223278 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014年07月16日 21:17
Oh, that's a good point, I hadn't thought of that. Maybe I can come up with a better wording.
msg309754 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018年01月10日 00:56
I'm closing this as won't fix since smtpd.py is deprecated and will likely not get any future development. Please see aiosmtpd as a much better third party replacement.
History
Date User Action Args
2022年04月11日 14:57:54adminsetgithub: 64005
2018年01月10日 00:56:05barrysetstatus: open -> closed

nosy: + barry
messages: + msg309754

resolution: wont fix
stage: patch review -> resolved
2014年07月16日 21:17:09r.david.murraysetmessages: + msg223278
2014年07月16日 21:04:20zvynsetmessages: + msg223274
2014年07月16日 20:30:12r.david.murraysetmessages: + msg223267
2014年07月16日 19:56:13zvynsetfiles: + smtpd_undecodable_data_does_not_raiseV2.patch

messages: + msg223263
2014年07月14日 13:09:49r.david.murraysetmessages: + msg223023
2014年07月11日 10:31:13doerwaltersetnosy: + doerwalter
messages: + msg222735
2014年07月09日 15:34:39zvynsetnosy: + jesstess
2014年07月09日 15:33:02zvynsetfiles: + smtpd_undecodable_data_does_not_raise.patch
nosy: + zvyn
messages: + msg222622

2014年06月21日 23:11:47BreamoreBoysetnosy: + BreamoreBoy

messages: + msg221208
versions: + Python 3.5, - Python 3.3
2013年11月27日 14:17:07Illirgwaysetmessages: + msg204585
2013年11月27日 13:45:19petri.lehtinensetmessages: + msg204580
2013年11月27日 13:14:24r.david.murraysetnosy: + r.david.murray
messages: + msg204578
2013年11月27日 12:34:08Illirgwaysetnosy: + Illirgway

messages: + msg204577
versions: + Python 3.4
2013年11月27日 06:10:48petri.lehtinencreate

AltStyle によって変換されたページ (->オリジナル) /