Created on 2012-04-05 11:51 by ritave, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| minimal.py | ritave, 2012-04-05 11:51 | Minimal example where the non-ideal behavior can be spotted |
| Messages (5) | |||
|---|---|---|---|
| msg157570 | Author: Olaf Tomalka (ritave) | Date: 2012-04-05 11:51 | |
While this is badly formatted HTML, I spotted such an example on a real website, and all browsers handle the bad tag gracefully, whereas the Python HTML parser throws an exception with "bad end tag". I think additional info in the end tag should be ignored, no exception thrown, and the rest of the page parsed. I'm including a minimal example. |
|||
| msg157582 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012-04-05 13:02 | |
Which version of Python did you test with? There have been several improvements to HTML parsing recently. |
|||
| msg157583 | Author: Olaf Tomalka (ritave) | Date: 2012-04-05 13:04 | |
Python 3.2.2, which is the latest on Arch Linux. |
|||
| msg157585 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012-04-05 13:08 | |
I just tested your script on 3.2.3a2+, and it raises an error. Ezio made the other parsing changes, so I'll leave it to him to evaluate what, if anything, should be done here. |
|||
| msg157601 | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2012-04-05 16:11 | |
This is already fixed, but only in non-strict mode (and in 3.2.3, IIRC). You should always use HTMLParser(strict=False). Non-strict mode will probably become the default, and strict=True will be deprecated. Thanks anyway for the report, and please report any failures you find with strict=False. |
|||
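
The tolerant behavior described in the last message can be sketched roughly as follows. This is only an illustration, not the reporter's minimal.py: the `EventCollector` class, the `parse_events` helper, and the sample markup are hypothetical. It also assumes Python 3.5 or later, where the `strict` argument no longer exists and tolerant parsing is the only mode; on 3.2.3 through 3.4 one would pass `strict=False` to the constructor as recommended above.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser events in the order they occur."""

    def __init__(self):
        # On Python 3.2.3-3.4 this would be super().__init__(strict=False).
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(markup):
    """Feed the markup to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# Hypothetical malformed input: the end tag carries extra "attribute" text.
MALFORMED = '<p>Hello</p class="extra"><b>world</b>'

events = parse_events(MALFORMED)
for event in events:
    print(event)
```

In this sketch the extra text inside `</p class="extra">` is skipped, the end tag is reported as a plain `p`, and the rest of the markup is still parsed.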
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:28 | admin | set | github: 58711 |
| 2012-04-05 16:11:49 | ezio.melotti | set | status: open -> closed resolution: not a bug messages: + msg157601 stage: resolved |
| 2012-04-05 13:08:32 | r.david.murray | set | messages: + msg157585 versions: + Python 3.3 |
| 2012-04-05 13:04:59 | ritave | set | messages: + msg157583 |
| 2012-04-05 13:02:04 | r.david.murray | set | nosy: + ezio.melotti, r.david.murray messages: + msg157582 |
| 2012-04-05 12:28:18 | ritave | set | title: HTMLParser can't handle erronous end tags with additional tags in it -> HTMLParser can't handle erronous end tags with additional info in them |
| 2012-04-05 11:51:41 | ritave | create | |