I'm using ElementTree to handle some HTML. Strictly speaking only XHTML is an XML language, but the HTML I have is well-formed XML, so that should be OK.
In HTML, you can have tags inside text:
<p>
This paragraph <em>has some</em> emphasised words.
</p>
So the "p" element has some text ("This paragraph "), a child element ("em"), and some more text (" emphasised words.").
But ElementTree elements have a text attribute, which is a string. The child elements are in a list, but the text is all together in one string.
How do I represent this html in ElementTree? Is it possible?
Dan-Dev
1 Answer
Are you trying to parse it?
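import xml.etree.ElementTree as ET

def processElem(elem):
    # Text before the element's first child is stored in elem.text
    if elem.text is not None:
        print(elem.text.strip())
    for child in elem:
        processElem(child)
        # Text after a child's closing tag is stored in that child's tail
        if child.tail is not None:
            print(child.tail.strip())

source = '''<p>
This paragraph <em>has some</em> emphasised words.
</p>'''
root = ET.fromstring(source)
processElem(root)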
gives:
This paragraph
has some
emphasised words.
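To see exactly where each piece of text ends up, you can also inspect the attributes directly. This is a small sketch using a one-line version of your markup (the variable names are just for illustration); itertext() is the standard ElementTree method that yields text and tails in document order:

```python
import xml.etree.ElementTree as ET

source = '<p>This paragraph <em>has some</em> emphasised words.</p>'
root = ET.fromstring(source)
em = root.find('em')

# Text before the first child belongs to the parent
print(repr(root.text))  # 'This paragraph '
# Text inside <em> belongs to <em>
print(repr(em.text))    # 'has some'
# Text after </em>, up to the next child or the parent's end tag, is em's tail
print(repr(em.tail))    # ' emphasised words.'
# itertext() walks every text and tail in document order
print(''.join(root.itertext()))  # This paragraph has some emphasised words.
```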
Or are you trying to build or modify the HTML?
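from xml.etree.ElementTree import Element, SubElement, tostring

top = Element('p')
top.text = 'This paragraph '                  # text before the first child
child_with_tail = SubElement(top, 'em')
child_with_tail.text = 'has some'             # text inside <em>
child_with_tail.tail = ' emphasised words.'   # text after </em>
# encoding='unicode' makes tostring return a str rather than bytes (Python 3)
print(tostring(top, encoding='unicode'))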
gives:
<p>This paragraph <em>has some</em> emphasised words.</p>
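If you are modifying a parsed tree, the tails matter when you remove elements: calling remove() on a child also drops its tail text. Here is a sketch of a hypothetical unwrap() helper (not part of ElementTree) that removes a child but keeps its text, assuming the child has no sub-elements of its own:

```python
import xml.etree.ElementTree as ET

def unwrap(parent, child):
    """Hypothetical helper: remove child but keep its text and tail.

    Assumes child has no sub-elements; any it had would be dropped.
    """
    kept = (child.text or '') + (child.tail or '')
    index = list(parent).index(child)
    if index == 0:
        # No earlier sibling, so the text joins the parent's own text
        parent.text = (parent.text or '') + kept
    else:
        # Otherwise it joins the tail of the previous sibling
        previous = parent[index - 1]
        previous.tail = (previous.tail or '') + kept
    parent.remove(child)

root = ET.fromstring('<p>This paragraph <em>has some</em> emphasised words.</p>')
unwrap(root, root.find('em'))
print(ET.tostring(root, encoding='unicode'))
# <p>This paragraph has some emphasised words.</p>
```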
answered Dec 10, 2016 at 19:11
Dan-Dev
2 Comments
fpeelo
Ahhh, so you are saying that the text after each embedded element, up to the next embedded element, lives in the tails of the embedded elements?
Dan-Dev
Yes, that is correct. See docs.python.org/2/library/…
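To illustrate with more than one embedded element, here is a small sketch on made-up markup: the leading text is on the parent, and each run of text after a child, up to the next child or the closing </p>, is that child's tail.

```python
import xml.etree.ElementTree as ET

source = '<p>One <em>two</em> three <strong>four</strong> five</p>'
root = ET.fromstring(source)

# Leading text lives on the parent; each following run lives on the child before it
print('text:', repr(root.text))
for child in root:
    print(child.tag, 'text:', repr(child.text), 'tail:', repr(child.tail))
# text: 'One '
# em text: 'two' tail: ' three '
# strong text: 'four' tail: ' five'
```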