\$\begingroup\$

It seems that lxml/etree is generally imported as from lxml import etree. Why is that? It keeps the code tidier, and while the potential namespace ambiguity might not be a concern, I don't have any incentive to do this, as that style of import is generally frowned upon.

I know for a script of this size it doesn't matter much, but I'm going to be using these modules for a lot more. I'm also curious about what others have to say.

#!/usr/bin/python
# Stuart Powers http://sente.cc/
import sys
import urllib
import lxml.html
from cStringIO import StringIO
""" This script parses HTML and extracts the div with an id of 'search-results':
 ex: <div id='search-results'>...</div>
$ python script.py "http://www.youtube.com/result?search_query=python+stackoverflow&page=1"
The output, if piped to a file would look like: http://c.sente.cc/E4xR/lxml_results.html
"""
parser = lxml.html.HTMLParser()
filecontents = urllib.urlopen(sys.argv[1]).read()
tree = lxml.etree.parse(StringIO(filecontents), parser)
node = tree.xpath("//div[@id='search-results']")[0]
# Serialize only the extracted div, as described in the docstring above
print lxml.etree.tostring(node, pretty_print=True)
alecxe
asked Jan 3, 2012 at 22:15
\$\endgroup\$

2 Answers

\$\begingroup\$

You might be confusing from lxml import etree, which is a legitimate (even preferred) form of absolute import, with relative imports for intra-package imports, which are discouraged: http://www.python.org/dev/peps/pep-0008/ (see the "Imports" section).
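
To make the distinction concrete, here is a minimal sketch. It uses the standard library's xml.etree package as a stand-in for lxml (an assumption, chosen only so the snippet runs without lxml installed); the same reasoning should apply to from lxml import etree versus import lxml.etree. Both statements are absolute imports and differ only in the name they bind locally.

```python
# Stand-in example: xml.etree (standard library) is used in place of lxml
# so that the snippet runs anywhere. This substitution is an assumption
# made for illustration, not part of the original answer.
import xml.etree.ElementTree
from xml.etree import ElementTree

# Both statements are absolute imports and resolve to one module object.
same_module = ElementTree is xml.etree.ElementTree

# The only difference is the name bound in the local namespace, so the
# short and the fully qualified spellings behave identically.
root = ElementTree.fromstring("<div id='search-results'>hit</div>")
same_output = ElementTree.tostring(root) == xml.etree.ElementTree.tostring(root)
```

The from form only shortens the name used at each call site; it does not change which module is loaded.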

answered Jan 4, 2012 at 16:54
\$\endgroup\$
\$\begingroup\$

In your case, and in most cases I have had while working with lxml.etree or lxml.html, the only need was parsing and dumping, which for string input and output can be done with the fromstring() and tostring() functions:

from lxml.html import fromstring, tostring

That would transform your code into:

import sys
import urllib
from lxml.html import fromstring, tostring
data = urllib.urlopen(sys.argv[1]).read()
tree = fromstring(data)
node = tree.xpath("//div[@id='search-results']")[0]
# Serialize only the extracted div rather than the whole document
print(tostring(node, pretty_print=True))
answered Mar 5, 2017 at 4:47
\$\endgroup\$
