I have read the html.parser documentation, but I cannot find the anchorlist attribute of the HTMLParser class. Python 2.x has that attribute.
I googled for it, but cannot find an answer. In Python 3.x, does the class HTMLParser have it?
Where did you see this attribute? Do you have a reference to it? – Burhan Khalid, Aug 3, 2013 at 14:31
@BurhanKhalid: See docs.python.org/2/library/… – Martijn Pieters, Aug 3, 2013 at 14:37
1 Answer
The anchorlist attribute was part of the htmllib.HTMLParser class. The module was deprecated in Python 2.6 and is not present in Python 3.
The html.parser module in Python 3, on the other hand, was called HTMLParser in Python 2. It does not have the anchorlist attribute.
You can emulate the attribute by listening for start tag events: for every a tag, add the href attribute (if present) to a list, which builds the same list:
    from html.parser import HTMLParser

    class MyHTMLParser(HTMLParser):
        def __init__(self, *args, **kw):
            super().__init__(*args, **kw)
            # Collected href values, in document order
            self.anchorlist = []

        def handle_starttag(self, tag, attrs):
            if tag == 'a':
                # attrs is a list of (name, value) tuples
                attributes = dict(attrs)
                if "href" in attributes:
                    self.anchorlist.append(attributes["href"])
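To check that this approach behaves as described, here is a minimal, self-contained sketch that feeds a small document through such a parser. The class name AnchorCollector, the helper collect_anchors, and the sample markup are made up for illustration and are not part of the standard library.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):  # hypothetical name, for illustration only
    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lowercase
        if tag == "a":
            attributes = dict(attrs)
            if "href" in attributes:
                self.anchorlist.append(attributes["href"])


def collect_anchors(html):
    """Hypothetical helper: return the href of every <a> tag in html."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.anchorlist


sample = (
    '<p><a href="https://example.com/">one</a> '
    '<a name="top">no href</a> '
    '<A HREF="/two">two</A></p>'
)
print(collect_anchors(sample))  # ['https://example.com/', '/two']
```

The anchor without an href is skipped, and the uppercase tag is still matched because the parser lowercases tag and attribute names before calling handle_starttag.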
Alternatively, use a friendlier API like BeautifulSoup to gather link anchors instead.
2 Comments
The attrs argument of handle_starttag is actually a list rather than a dictionary, so one has to iterate over the list, which contains (name, value) tuples; see docs.python.org/3/library/…
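To make the shape of attrs concrete, the sketch below records what handle_starttag receives and reads href both ways: by looping over the tuples and by converting with dict(attrs). The AttrRecorder class and the sample tag are hypothetical and only serve the illustration.

```python
from html.parser import HTMLParser


class AttrRecorder(HTMLParser):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # Store the raw arguments so they can be inspected afterwards
        self.seen.append((tag, attrs))


recorder = AttrRecorder()
recorder.feed('<a href="/docs" class="nav">docs</a>')
tag, attrs = recorder.seen[0]

# Option 1: iterate over the (name, value) tuples directly
hrefs_by_loop = [value for name, value in attrs if name == "href"]

# Option 2: build a dict from the tuples and look the name up
href_by_dict = dict(attrs).get("href")

print(attrs)          # [('href', '/docs'), ('class', 'nav')]
print(hrefs_by_loop)  # ['/docs']
print(href_by_dict)   # /docs
```

Both options give the same result here; dict(attrs) is shorter, while the loop keeps every occurrence if a tag repeats an attribute name.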