

13. Structured Markup Processing Tools

Python provides a variety of modules for working with different forms of structured data markup. These include modules for the Standard Generalized Markup Language (SGML) and the Hypertext Markup Language (HTML), as well as several interfaces for working with the Extensible Markup Language (XML).
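As a sketch of the HTML side of this toolkit, the following parser collects link targets from a page. It assumes a current Python 3 interpreter, where the module is named html.parser; the fallback import covers the HTMLParser name used in this manual. LinkCollector and collect_links are hypothetical names chosen for illustration.

```python
try:
    from html.parser import HTMLParser  # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 module name, as listed in this chapter


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href attribute of every <a> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def collect_links(html_text):
    """Feed a whole HTML string to the parser and return the hrefs found, in order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()  # flush any buffered data
    return parser.links
```

Because the parser is event driven, the same class can be fed a document in several chunks by calling feed() repeatedly before close().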

Modules in the xml package require at least one SAX-compliant XML parser to be available. Python includes an interface to the Expat parser as the xml.parsers.expat module, but this module may not be built by default on all platforms, because the Expat library is not always installed, or is not installed in the default library location. If this is the case on your system, the easiest way to add support for the xml package is to install the PyXML add-on package, which provides an extended set of XML libraries for Python.
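One way to check whether a parser is actually available is sketched below for a current Python interpreter: xml.sax.make_parser() raises SAXReaderNotAvailable when it cannot find any parser driver, and importing xml.parsers.expat raises ImportError when the Expat interface was not built. The helper names sax_parser_available and expat_available are hypothetical.

```python
import xml.sax


def sax_parser_available():
    """Return True if xml.sax can locate at least one SAX2 parser driver."""
    try:
        xml.sax.make_parser()
    except xml.sax.SAXReaderNotAvailable:
        return False
    return True


def expat_available():
    """Return True if the xml.parsers.expat interface module was built."""
    try:
        from xml.parsers import expat  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True
```

On a system where both functions return False, the add-on package described above is the suggested remedy.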

The documentation for the xml.dom and xml.sax packages is the definition of the Python bindings for the DOM and SAX interfaces.
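To contrast the two bindings, this sketch extracts the same information once with a SAX handler (the parser pushes events to your code) and once with xml.dom.minidom (the whole document is built as a tree, then queried). It assumes a current Python interpreter; SAMPLE, ItemIdCollector, ids_via_sax and ids_via_dom are hypothetical names for illustration.

```python
import xml.dom.minidom
import xml.sax
import xml.sax.handler

# Hypothetical sample document used only for this illustration.
SAMPLE = "<catalog><item id='a'>Apple</item><item id='b'>Banana</item></catalog>"


class ItemIdCollector(xml.sax.handler.ContentHandler):
    """SAX handler: records the id attribute of each <item> as it streams past."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.ids = []

    def startElement(self, name, attrs):
        if name == "item":
            self.ids.append(attrs.get("id"))


def ids_via_sax(text):
    """Event-driven: the parser calls the handler; no tree is kept in memory."""
    handler = ItemIdCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.ids


def ids_via_dom(text):
    """Tree-based: build the whole document, then query it."""
    document = xml.dom.minidom.parseString(text)
    try:
        return [node.getAttribute("id")
                for node in document.getElementsByTagName("item")]
    finally:
        document.unlink()  # break reference cycles inside the minidom tree
```

The SAX version suits large inputs that need only one pass; the DOM version is simpler when the code must revisit or rearrange parts of the document.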

HTMLParser: A simple parser that can handle HTML and XHTML.
sgmllib: Only as much of an SGML parser as needed to parse HTML.
htmllib: A parser for HTML documents.
htmlentitydefs: Definitions of HTML general entities.
xml.parsers.expat: An interface to the Expat non-validating XML parser.
xml.dom: Document Object Model API for Python.
xml.dom.minidom: Lightweight Document Object Model (DOM) implementation.
xml.dom.pulldom: Support for building partial DOM trees from SAX events.
xml.sax: Package containing SAX2 base classes and convenience functions.
xml.sax.handler: Base classes for SAX event handlers.
xml.sax.saxutils: Convenience functions and classes for use with SAX.
xml.sax.xmlreader: Interface which SAX-compliant XML parsers must implement.
xmllib: A parser for XML documents.
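Two of the modules listed above can be combined as in the following sketch: xml.sax.saxutils.escape protects the characters &, < and > when generating markup, and xml.dom.pulldom streams parse events while expanding only the elements of interest into small DOM subtrees. It assumes a current Python interpreter; build_catalog and extract_titles are hypothetical helper names.

```python
from xml.dom import pulldom
from xml.sax.saxutils import escape


def build_catalog(titles):
    """Serialize titles as <title> elements, escaping &, < and > with saxutils."""
    body = "".join("<title>%s</title>" % escape(t) for t in titles)
    return "<books>%s</books>" % body


def extract_titles(text):
    """Stream events with pulldom and expand only the <title> elements."""
    events = pulldom.parseString(text)
    found = []
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "title":
            events.expandNode(node)  # pull this element's subtree into a DOM node
            # Character data may arrive in several chunks, so join all text children.
            found.append("".join(child.data for child in node.childNodes
                                 if child.nodeType == child.TEXT_NODE))
    return found
```

Escaping on output and parsing on input round-trip the original strings, so titles containing markup characters survive unchanged.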

See Also:

Python/XML Libraries
Home page for the PyXML package, which contains an extension of the xml package bundled with Python.


Release 2.2.3, documentation updated on 30 May 2003.
