Created on 2012-01-04 13:26 by turion, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| htmlparserbug.py | turion, 2012-01-04 13:26 | Script demonstrating the bug |
| Messages (8) | |||
|---|---|---|---|
| msg150603 | Author: Manuel Bärenz (turion) | Date: 2012-01-04 13:26 | |
I've attached a script that demonstrates the bug. When the parser is fed a <script> element whose content is wrapped in a comment and itself contains tags (e.g. a 'document.write(<td></td>)'), it doesn't call handle_comment or handle_starttag. |
|||
| msg150604 | Author: Manuel Bärenz (turion) | Date: 2012-01-04 13:38 | |
I forgot to say, I'm using Python 3.2.2. |
|||
| msg150605 | Author: R. David Murray (r.david.murray) (Python committer) | Date: 2012-01-04 13:55 | |
The content of a script tag is CDATA. Why would you expect it to be parsed? |
|||
| msg150606 | Author: Manuel Bärenz (turion) | Date: 2012-01-04 14:25 | |
Oh, I wasn't aware of that. Then the actual bug is that handle_endtag is called for a tag inside the script. |
|||
| msg150607 | Author: Manuel Bärenz (turion) | Date: 2012-01-04 14:28 | |
To clarify this even further: Consider
parser_instance.feed("<script><td></td></script>")
It should call:
parser_instance.handle_starttag("script", [])
parser_instance.handle_data("<td></td>")
parser_instance.handle_endtag("script")
Instead, it calls:
parser_instance.handle_starttag("script", [])
parser_instance.handle_data("<td>")
parser_instance.handle_endtag("td")
parser_instance.handle_endtag("script")
|
|||
| msg150608 | Author: R. David Murray (r.david.murray) (Python committer) | Date: 2012-01-04 14:42 | |
I believe this was fixed recently as part of issue 670664. Ezio will know for sure. |
|||
| msg150611 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2012-01-04 15:02 | |
Yep, this was fixed in #670664. With the development version of Python (AFAIK the fix has not been released yet) and the example parser found in the doc[0] I get this:
>>> parser = MyHTMLParser()
>>> parser.feed('<script><td></td></script>')
Encountered a start tag: script
Encountered some data: <td></td>
Encountered an end tag: script
[0]: http://docs.python.org/dev/library/html.parser.html#example-html-parser-application
|
|||
| msg150614 | Author: Manuel Bärenz (turion) | Date: 2012-01-04 16:19 | |
Great! Thank you! |
|||
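
The behaviour discussed in this thread can be checked with a small script. What follows is only a sketch, not the attached htmlparserbug.py: the `EventRecorder` class and the `record_events` helper are names made up for illustration, and the script assumes a Python release that already contains the fix from #670664. It records which handlers `html.parser.HTMLParser` calls, so you can see whether the content of a `<script>` element is passed through as plain data.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: stores each handler call as a (kind, value) tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_endtag(self, tag):
        # handle_endtag receives only the tag name, no attribute list.
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))


def record_events(markup):
    """Feed the markup to a fresh parser and return the recorded events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    samples = [
        "<script><td></td></script>",
        "<script><!-- document.write('<td></td>') --></script>",
    ]
    for markup in samples:
        print(markup)
        for kind, value in record_events(markup):
            print("   ", kind, repr(value))
```

On a fixed interpreter, neither sample should produce a `td` start or end tag event or a comment event, because everything between `<script>` and `</script>` is treated as CDATA and handed to `handle_data`.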
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:25 | admin | set | github: 57920 |
| 2012-01-04 16:19:16 | turion | set | messages: + msg150614 |
| 2012-01-04 15:02:17 | ezio.melotti | set | status: open -> closed; superseder: HTMLParser.py - more robust SCRIPT tag parsing; messages: + msg150611; assignee: ezio.melotti; resolution: duplicate; stage: resolved |
| 2012-01-04 14:42:30 | r.david.murray | set | messages: + msg150608 |
| 2012-01-04 14:28:47 | turion | set | messages: + msg150607 |
| 2012-01-04 14:25:35 | turion | set | messages: + msg150606 |
| 2012-01-04 13:55:44 | r.david.murray | set | nosy: + ezio.melotti, r.david.murray; messages: + msg150605 |
| 2012-01-04 13:38:27 | turion | set | messages: + msg150604 |
| 2012-01-04 13:26:46 | turion | create | |