[Python-Dev] python hangs when parsing a bad-formed email

Tue Apr 22 09:43:02 CEST 2008

Hi all,
First of all, sorry if this isn't the right list for this post, and
sorry for my English.
As the subject says, I'm having problems with the attached email: when
I try to build an email object by reading the attached file, the Python
process hangs and uses all of the CPU.
I have debugged my code to find where this happens, and found that it is
in the _parsegen method of the FeedParser class. I know that the email
format is wrong, but I don't know why Python hangs.
The code below shows where it hangs.
def _parsegen(self):
    # Create a new message and start by parsing headers.
    self._new_message()
    headers = []
    # Collect the headers, searching for a line that doesn't match the RFC
    # 2822 header or continuation pattern (including an empty line).
    for line in self._input:
        if line is NeedMoreData:
            yield NeedMoreData
            continue
        if not headerRE.match(line):
            # If we saw the RFC defined header/body separator
            # (i.e. newline), just throw it away. Otherwise the line is
            # part of the body so push it back.
            if not NLCRE.match(line):
                self._input.unreadline(line)
            break
        headers.append(line)
    # Done with the headers, so parse them and figure out what we're
    # supposed to see in the body of the message.
    self._parse_headers(headers)
    # Headers-only parsing is a backwards compatibility hack, which was
    # necessary in the older parser, which could throw errors. All
    # remaining lines in the input are thrown into the message body.
    if self._headersonly:
        lines = []
        while True:
            line = self._input.readline()
            if line is NeedMoreData:
                yield NeedMoreData
                continue
            if line == '':
                break
            lines.append(line)
        self._cur.set_payload(EMPTYSTRING.join(lines))
        return
    if self._cur.get_content_type() == 'message/delivery-status':
        # !!! THE PROCESS HANGS AT THIS POINT AND STARTS TO USE ALL OF
        # THE CPU.
        # message/delivery-status contains blocks of headers separated by
        # a blank line. We'll represent each header block as a separate
        # nested message object, but the processing is a bit different
        # than standard message/* types because there is no body for the
        # nested messages. A blank line separates the subparts.
        ...
        ...
        ...
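For readers unfamiliar with the message/delivery-status layout that the
comments above describe, here is a minimal sketch using a well-formed
message. It assumes a current Python 3 interpreter rather than the 2.x
versions discussed in this post, and the header values (host name,
recipient) are invented for illustration only.

```python
import email

# A well-formed message/delivery-status body: blocks of headers
# separated by blank lines, with no body after each block.
raw = (
    "Content-Type: message/delivery-status\n"
    "\n"
    "Reporting-MTA: dns; mail.example.org\n"
    "\n"
    "Final-Recipient: rfc822; user@example.org\n"
    "Action: failed\n"
    "Status: 5.1.1\n"
)

msg = email.message_from_string(raw)

# Each header block is exposed as a nested message object.
blocks = msg.get_payload()
for block in blocks:
    print(block.items())
```

With input like this the parser returns immediately; the hang reported
here only shows up with the malformed attachment.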
I have worked around the problem by adding one line (marked below) to
the _parse_headers method:
def _parse_headers(self, lines):
    # Passed a list of lines that make up the headers for the current msg
    lastheader = ''
    lastvalue = []
    for lineno, line in enumerate(lines):
        # Check for continuation
        if line[0] in ' \t':
            if not lastheader:
                # The first line of the headers was a continuation. This
                # is illegal, so let's note the defect, store the illegal
                # line, and ignore it for purposes of headers.
                defect = errors.FirstHeaderLineIsContinuationDefect(line)
                self._cur.defects.append(defect)
                continue
            # !!! ADDED LINE: IF THE CONTINUATION LINE IS NOT EMPTY,
            # ADD THE LINE TO THE HEADER.
            if line.strip() != '':
                lastvalue.append(line)
            continue
        if lastheader:
            ...
            ...
            ...
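A small sketch of what the added check changes, using only the two
tests visible in the listing above. The helper names is_continuation
and kept_by_workaround are hypothetical and exist only for this
illustration; they are not part of the email package.

```python
def is_continuation(line):
    # Same test as in _parse_headers: the line starts with space or tab.
    return line[0] in ' \t'


def kept_by_workaround(line):
    # With the added check, a continuation line is appended to the
    # header value only if it contains something besides whitespace.
    return is_continuation(line) and line.strip() != ''


samples = ['\tcharset="us-ascii"\n', ' \n', '\t\t\n']
for line in samples:
    print(repr(line), is_continuation(line), kept_by_workaround(line))
```

All three sample lines count as continuations, but only the first one
is kept once the check is in place; the whitespace-only lines are
dropped instead of being appended to the header value.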
I don't know why it hangs, and I'm not sure why it works with this line.
I have tried to parse this email with Python 2.3.3 on SunOS, Python 2.3.3
(gcc), and Python 2.5.1 on SunOS (gcc), Windows XP, and SUSE Linux 10,
and I always get the same result.
bash-3.00$ python
Python 2.5.1 (r251:54863, Feb 28 2008, 07:48:25)
[GCC 3.4.6] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> fp = open('raro.txt')
>>> mail = email.message_from_file(fp)
(the call never returns)
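To reproduce a hang like this without losing the interactive session,
the parse can be run in a child process with a time limit. This is only
a sketch: it assumes Python 3 (subprocess.run with a timeout is not
available in the 2.x versions above), and parses_within is a
hypothetical helper name.

```python
import subprocess
import sys

# Code run by the child interpreter: parse the file named in argv[1].
CHILD_CODE = (
    "import email, sys\n"
    "with open(sys.argv[1]) as fp:\n"
    "    email.message_from_file(fp)\n"
)


def parses_within(path, seconds=10.0):
    """Return True if parsing `path` finishes within `seconds`."""
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD_CODE, path],
            timeout=seconds,  # the child is killed when this expires
            check=True,       # raise if the child exits with an error
        )
    except subprocess.TimeoutExpired:
        return False
    return True
```

Calling parses_within('raro.txt') would then return False instead of
blocking forever if the parser hangs on the file.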
Could someone tell me what is happening?
Best Regards.
Alberto Casado.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: raro.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20080422/46c9e000/attachment-0001.txt>