Created on 2004-03-23 10:17 by bernd_zedv, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Messages (4) | |||
|---|---|---|---|
| msg20293 | Author: Bernd Zimmermann (bernd_zedv) | Date: 2004-03-23 10:17 | |
When this (obviously correct) HTML is parsed: `<a href=mailto:xyz@domain.com>xyz</a>`, this exception is raised: `HTMLParseError: junk characters in start tag: '@domain.com>', at line 1, column 1`. I work around this by adding '@' to the allowed character class: `import re; import HTMLParser; HTMLParser.attrfind = re.compile( r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*' r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\) _#=~@]*))?'); myparser = HTMLParser.HTMLParser(); myparser.feed('<a ... ')` |
|||
| msg20294 | Author: A.M. Kuchling (akuchling) (Python committer) | Date: 2004-04-19 13:01 | |
I don't believe this HTML is obviously correct. The section on attributes in the HTML 4.01 Recommendation (http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2) says: "In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them." The regex is already more liberal than this, allowing slashes and various other symbols, so we might as well add '@', but you should also consider adding quotation marks to the original attribute. |
|||
| msg20295 | Author: A.M. Kuchling (akuchling) (Python committer) | Date: 2004-06-05 15:32 | |
Committed to the CVS HEAD; thanks! |
|||
| msg20296 | Author: Vsevolod Novikov (nnseva) | Date: 2004-10-13 10:16 | |
See request #1046092 to fix it. |
|||
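
The effect of adding '@' to the unquoted-value character class, and the stricter HTML 4.01 rule quoted in msg20294, can be illustrated with a minimal sketch. It uses only the `re` module rather than the `HTMLParser` module itself. The first pattern is copied as it appears in the report; the second is the same pattern with '@' removed, which is an assumption standing in for the pre-fix pattern. The names `ATTRFIND_WITH_AT`, `ATTRFIND_WITHOUT_AT`, `HTML401_UNQUOTED`, `attr_value`, and `may_be_unquoted` are hypothetical and chosen for illustration only.

```python
import re

# Pattern copied from the report (msg20293); '@' is in the unquoted-value class.
ATTRFIND_WITH_AT = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\) _#=~@]*))?')

# Assumption: the same pattern with '@' removed, standing in for the
# pattern in use before the fix.
ATTRFIND_WITHOUT_AT = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\) _#=~]*))?')

# Characters that HTML 4.01 section 3.2.2 permits in an unquoted value:
# letters, digits, hyphens, periods, underscores, and colons.
HTML401_UNQUOTED = re.compile(r'[A-Za-z0-9._:-]+')


def attr_value(pattern, attrs_text):
    """Return the attribute value captured at the start of attrs_text."""
    match = pattern.match(attrs_text)
    return match.group(3) if match else None


def may_be_unquoted(value):
    """Return True if value uses only characters HTML 4.01 allows unquoted."""
    return HTML401_UNQUOTED.fullmatch(value) is not None


# The attribute portion of the start tag from the report.
attrs = ' href=mailto:xyz@domain.com>'

print(attr_value(ATTRFIND_WITH_AT, attrs))      # whole address is captured
print(attr_value(ATTRFIND_WITHOUT_AT, attrs))   # capture stops before '@'
print(may_be_unquoted('mailto:xyz@domain.com')) # '@' is outside the 4.01 set
```

With '@' missing from the class, the match stops before `@domain.com>`, which lines up with the "junk characters" text in the reported exception. Quoting the attribute value, as msg20294 suggests, sidesteps the character class entirely because quoted values are matched by the other two alternatives.
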
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:03 | admin | set | github: 40065 |
| 2004-03-23 10:17:42 | bernd_zedv | create | |