
Python Programming/XML Tools

From Wikibooks, open books for an open world


Introduction

Python's standard library includes several modules for manipulating XML.

xml.sax.handler

See the Python documentation for the xml.sax.handler module.

import xml.sax.handler as saxhandler
import xml.sax as saxparser

class MyReport:
    def __init__(self):
        self.Y = 1

class MyCH(saxhandler.ContentHandler):
    def __init__(self, report):
        super().__init__()  # let ContentHandler initialise its own state
        self.X = 1
        self.report = report

    # Called once, before any other parsing event
    def startDocument(self):
        print('startDocument')

    # Called for every opening tag; attrs holds the element's attributes
    def startElement(self, name, attrs):
        print('Element:', name)

report = MyReport()  # for future use
ch = MyCH(report)

xml = """\
<collection>
  <comic title=\"Sandman\" number='62'>
    <writer>Neil Gaiman</writer>
    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>
"""

print(xml)
# parseString expects bytes (str is also accepted from Python 3.7 on)
saxparser.parseString(xml.encode('utf-8'), ch)
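The handler above only prints tag names. A SAX handler usually also has to track state between callbacks, because an element's text arrives through separate characters() calls, possibly in several chunks. The following is a minimal sketch of that pattern, assuming we want every penciller's name together with its pages attribute; the names PencillerHandler and find_pencillers are hypothetical and chosen for illustration.

```python
import xml.sax
import xml.sax.handler

class PencillerHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler: collect (name, pages) for every <penciller>."""

    def __init__(self):
        super().__init__()
        self.pencillers = []        # finished (name, pages) tuples
        self._in_penciller = False
        self._pages = None
        self._text = []             # character data may arrive in chunks

    def startElement(self, name, attrs):
        if name == 'penciller':
            self._in_penciller = True
            self._pages = attrs.get('pages')  # attrs acts like a read-only dict
            self._text = []

    def characters(self, content):
        if self._in_penciller:
            self._text.append(content)

    def endElement(self, name):
        if name == 'penciller':
            # Join the buffered chunks only once the element is closed
            self.pencillers.append((''.join(self._text).strip(), self._pages))
            self._in_penciller = False

def find_pencillers(xml_bytes):
    handler = PencillerHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.pencillers

doc = b"""<collection>
  <comic title="Sandman" number='62'>
    <writer>Neil Gaiman</writer>
    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>"""

print(find_pencillers(doc))
# [('Glyn Dillon', '1-9,18-24'), ('Charles Vess', '10-17')]
```

Buffering the text and joining it in endElement() avoids relying on the parser delivering each element's content in a single characters() call, which SAX does not guarantee.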

xml.dom.minidom

An example of parsing an RSS feed with DOM:

from xml.dom import minidom as dom
import urllib.request

def fetchPage(url):
    # Return the raw bytes of the page; minidom decodes them itself
    with urllib.request.urlopen(url) as a:
        return a.read()

def extract(page):
    a = dom.parseString(page)
    item = a.getElementsByTagName('item')
    for i in item:
        if i.hasChildNodes():
            # firstChild is the text node inside each tag
            # (it is None if the tag is empty)
            t = i.getElementsByTagName('title')[0].firstChild.wholeText
            l = i.getElementsByTagName('link')[0].firstChild.wholeText
            d = i.getElementsByTagName('description')[0].firstChild.wholeText
            print(t, l, d)

if __name__ == '__main__':
    page = fetchPage("http://rss.slashdot.org/Slashdot/slashdot")
    extract(page)
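The example above depends on a live feed and fails with an AttributeError when an item has an empty tag, because firstChild is then None. The following is a minimal offline sketch of the same DOM technique, assuming a small hand-written RSS 2.0 document (hypothetical sample data). It collects the text nodes under each tag and falls back to an empty string. The helper names child_text and extract_items are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample feed, so the example runs without network access
SAMPLE_FEED = b"""<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description></description>
    </item>
  </channel>
</rss>"""

def child_text(parent, tag):
    """Return the text of the first <tag> below parent, or '' if missing/empty."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return ''
    return ''.join(n.data for n in nodes[0].childNodes
                   if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE))

def extract_items(page):
    doc = minidom.parseString(page)
    # Searching from each <item> skips the channel-level <title>
    return [(child_text(i, 'title'), child_text(i, 'link'),
             child_text(i, 'description'))
            for i in doc.getElementsByTagName('item')]

for title, link, desc in extract_items(SAMPLE_FEED):
    print(title, link, desc)
```

To use it with a real feed, pass the bytes returned by fetchPage() to extract_items() instead of SAMPLE_FEED.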

The XML document in the SAX example is taken from the PyXML documentation.
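For comparison with the SAX handler, the same comic document can also be loaded with minidom, which builds the whole tree in memory so attributes and text can be read directly. This is a minimal sketch; the function name comic_credits and the shape of its return value are hypothetical choices for illustration.

```python
from xml.dom import minidom

COMIC_XML = """<collection>
  <comic title="Sandman" number='62'>
    <writer>Neil Gaiman</writer>
    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>"""

def comic_credits(xml_text):
    """Hypothetical helper: (title, number, [(penciller, pages), ...]) per comic."""
    doc = minidom.parseString(xml_text)
    result = []
    for comic in doc.getElementsByTagName('comic'):
        # getAttribute returns '' when the attribute is absent
        pencillers = [(p.firstChild.data, p.getAttribute('pages'))
                      for p in comic.getElementsByTagName('penciller')]
        result.append((comic.getAttribute('title'),
                       comic.getAttribute('number'), pencillers))
    return result

print(comic_credits(COMIC_XML))
# [('Sandman', '62', [('Glyn Dillon', '1-9,18-24'), ('Charles Vess', '10-17')])]
```

The DOM version is shorter because no state has to be carried between callbacks. The trade-off is that the whole document must fit in memory, which SAX avoids.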
