I want to use xml.etree.ElementTree to parse an XHTML document in Python 3. The document contains &nbsp; entities, so I cannot use the default parser settings. I'd like to do something similar to:

with urllib.request.urlopen(BASE_URL) as url:
 body = url.read()
 parser = ET.XMLParser()
 parser.parser.UseForeignDTD(True)
 parser.entity.update(entitydefs)
 etree = ET.ElementTree()
 root = etree.fromstring(body)

But fromstring is a free function in the ElementTree module, not a method. How can I achieve something similar with an ElementTree instance?

asked Mar 2, 2013 at 19:04

2 Answers

2

I ran into the same problem. The sample code in the question and in the accepted answer may have worked before, but it does not work in my Python 3.3 and Python 3.4 environments.

I finally got it working. Quoted from this Q&A.

Inspired by this post, we can simply prepend a DOCTYPE that declares the needed entities to the incoming raw HTML content, and then ElementTree works out of the box.

This works on Python 2.6, 2.7, 3.3, and 3.4.

import xml.etree.ElementTree as ET

html = '''<html>
    <div>Some reasonably well-formed HTML content.</div>
    <form action="login">
    <input name="foo" value="bar"/>
    <input name="username"/><input name="password"/>
    <div>It is not unusual to see &nbsp; in an HTML page.</div>
    </form></html>'''
# DOCTYPE whose internal subset declares the entities the page uses.
# You can define more entities here, if needed.
magic = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
    <!ENTITY nbsp ' '>
    ]>'''
et = ET.fromstring(magic + html)  # returns the root Element
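
If the page uses more entities than &nbsp;, the declarations could be generated rather than typed by hand. The following is only a sketch for Python 3 (the html.entities module is called htmlentitydefs in Python 2). The helper names build_doctype and parse_with_html_entities are made up for illustration, and each entity is declared as a numeric character reference instead of a literal character:

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# XML already predefines these five, so they are skipped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def build_doctype():
    """Return a DOCTYPE whose internal subset declares the HTML named entities."""
    decls = "".join(
        '<!ENTITY %s "&#%d;">' % (name, codepoint)
        for name, codepoint in sorted(name2codepoint.items())
        if name not in XML_PREDEFINED
    )
    return "<!DOCTYPE html [%s]>" % decls

def parse_with_html_entities(markup):
    """Hypothetical helper: parse well-formed markup that uses HTML entities."""
    return ET.fromstring(build_doctype() + markup)

root = parse_with_html_entities("<html><p>a&nbsp;b &copy; 2013</p></html>")
print(repr(root.find("p").text))
```

As with the hand-written version, this assumes the markup does not already start with its own XML declaration or DOCTYPE.
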
answered Feb 24, 2016 at 1:18

1

Feed the parser:

with urllib.request.urlopen(BASE_URL) as url:
 body = url.read()
 parser = ET.XMLParser()
 parser.parser.UseForeignDTD(True)
 parser.entity.update(entitydefs)
 parser.feed(body)
 root = parser.close() # close() returns the root element of the parsed tree
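
For Python 3, the same feed/close pattern might be combined with a prepended DOCTYPE instead of UseForeignDTD. This is a sketch, not a drop-in fix: it swaps in the entity-declaring DOCTYPE technique, the function name parse_to_tree is hypothetical, and it assumes the page is UTF-8 and does not already begin with an XML declaration or DOCTYPE. Wrapping the root in ET.ElementTree gives the ElementTree instance the question asks about:

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the entity the page uses.
DOCTYPE = '<!DOCTYPE html [<!ENTITY nbsp "&#160;">]>'

def parse_to_tree(body):
    """Parse bytes or str and return an ElementTree instance."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")  # assumption: the page is UTF-8
    parser = ET.XMLParser()
    parser.feed(DOCTYPE)             # entity declarations first
    parser.feed(body)                # then the document itself
    root = parser.close()            # close() returns the root Element
    return ET.ElementTree(root)      # wrap it in an ElementTree instance

tree = parse_to_tree(b"<html><p>a&nbsp;b</p></html>")
print(repr(tree.getroot().find("p").text))
```
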
answered Mar 2, 2013 at 19:07

1 Comment

The original poster asked for a Python 3 solution, but parser.parser.UseForeignDTD(True) doesn't work in Python 3. Why was this answer accepted as the correct one?
