
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: HTMLParser ParseError in start tag
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: akuchling Nosy List: akuchling, bernd_zedv, nnseva
Priority: normal Keywords:

Created on 2004-03-23 10:17 by bernd_zedv, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg20293 - Author: Bernd Zimmermann (bernd_zedv) Date: 2004-03-23 10:17
When this (obviously correct) HTML is parsed:
<a href=mailto:xyz@domain.com>xyz</a>
this exception is raised:
HTMLParseError: junk characters in start tag: '@domain.com>', at line 1, column 1
I work around this by adding '@' to the allowed character class:
import re
import HTMLParser

# Replace the module-level attribute regex so that unquoted
# attribute values may also contain '@'.
HTMLParser.attrfind = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')
myparser = HTMLParser.HTMLParser()
myparser.feed('<a ... ')
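To see where the "junk characters" come from, the sketch below runs the attribute regex from the report against the body of the failing tag, once without '@' in the unquoted-value class and once with it. It assumes the stock pattern differs from the reporter's only by the added '@'; the names OLD_ATTRFIND, PATCHED_ATTRFIND and split_attr are hypothetical, and only the standard re module is used, so it runs on a modern Python.

```python
import re

# Assumed stock pattern: the reporter's regex with the added '@' removed.
OLD_ATTRFIND = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~]*))?')

# The reporter's workaround: same pattern with '@' appended to the class.
PATCHED_ATTRFIND = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')


def split_attr(pattern, text):
    """Return (name, value, leftover) for the first attribute in text."""
    m = pattern.match(text)
    return m.group(1), m.group(3), text[m.end():]


tag_body = 'href=mailto:xyz@domain.com>'
print(split_attr(OLD_ATTRFIND, tag_body))
# → ('href', 'mailto:xyz', '@domain.com>')
print(split_attr(PATCHED_ATTRFIND, tag_body))
# → ('href', 'mailto:xyz@domain.com', '>')
```

Without '@', the value match stops at 'mailto:xyz', and the leftover '@domain.com>' is exactly the text quoted in the error message.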
msg20294 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2004-04-19 13:01
I don't believe this HTML is obviously correct. 
The section on attributes in the HTML 4.01 Recommendation
(http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2) says:
In certain cases, authors may specify the value of an
attribute without any quotation marks. The attribute value
may only contain letters (a-z and A-Z), digits (0-9),
hyphens (ASCII decimal 45), periods (ASCII decimal 46),
underscores (ASCII decimal 95), and colons (ASCII decimal
58). We recommend using quotation marks even when it is
possible to eliminate them. 
The regex is already more liberal than this, allowing slashes
and various other symbols, so we might as well add '@', but
you should also consider adding quotation marks to the
original attribute.
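The HTML 4.01 rule quoted above amounts to a simple character-class test. The following is a minimal sketch of that rule only (not of HTMLParser's own, more liberal regex); may_omit_quotes is a hypothetical helper name. It shows that the value in the report needs quotes, e.g. href="mailto:xyz@domain.com", because of the '@'.

```python
import re

# Hypothetical helper: HTML 4.01's rule for unquoted attribute values, as
# quoted above: letters, digits, hyphens, periods, underscores and colons.
UNQUOTED_OK = re.compile(r'[A-Za-z0-9._:-]+')


def may_omit_quotes(value):
    """Return True if HTML 4.01 permits writing value without quotes."""
    return UNQUOTED_OK.fullmatch(value) is not None


print(may_omit_quotes('mailto:xyz@domain.com'))  # → False ('@' is not allowed)
print(may_omit_quotes('_top'))                   # → True
```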
msg20295 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2004-06-05 15:32
Committed to the CVS HEAD; thanks!
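As a rough check against a modern interpreter (Python 3's html.parser, the successor of the 2.3 HTMLParser module discussed here, not the CVS HEAD code itself), the tag from the original report parses without error. AttrCollector is a hypothetical subclass used only to record what handle_starttag receives.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record every start tag and its attributes as (tag, attrs) pairs."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append((tag, attrs))


parser = AttrCollector()
parser.feed('<a href=mailto:xyz@domain.com>xyz</a>')
parser.close()
print(parser.starttags)
# → [('a', [('href', 'mailto:xyz@domain.com')])]
```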
msg20296 - Author: Vsevolod Novikov (nnseva) Date: 2004-10-13 10:16
See request #1046092 to fix it.
History
Date                 User        Action  Args
2022-04-11 14:56:03  admin       set     github: 40065
2004-03-23 10:17:42  bernd_zedv  create
