I want to parse an HTML table into a 2D array (rows and columns) in Python using only HTMLParser. I don't want to use BeautifulSoup or other non-standard libraries.
This is for a personal project, doing this for fun :P
Anyway, here's my code. It's giving me a really messed up error - it says
asked Mar 5, 2012 at 10:02
Aniruddh Chaturvedi
1 Answer
I haven't checked exactly what you want to do, but you assign a string to self.txt and then try to use it as a list.
In the constructor, you initialize self.txt with an empty list:
    def __init__(self):
        ...
        self.txt = []
        ...
and then in the handle_data method:
    def handle_data(self, text):
        if (len(self.txt) > 0 ) :
            self.txt.append(text + " ")  # <-- Here you consider self.txt is a list
        if (self.in_table == 1 and self.in_th == 0):
            self.txt = text.lstrip()  # <-- Here you **assign a string** to self.txt
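One way to avoid the mix-up is to keep self.txt a list for its whole lifetime and only turn it into a string when a cell ends. The following is a minimal sketch, not your original code: the class name TableParser and the attributes rows and row are hypothetical, it is written for Python 3 (where the module is html.parser), and it assumes well-formed tables with explicit closing tags and no nested tables.

```python
from html.parser import HTMLParser  # Python 3 module name


class TableParser(HTMLParser):
    """Collect table cells into a list of rows (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self.row = None   # cells of the row being read, None outside <tr>
        self.txt = None   # text chunks of the current cell: a list or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.txt = []  # start a fresh list for this cell

    def handle_data(self, text):
        if self.txt is not None:
            self.txt.append(text)  # always append, never reassign

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.txt is not None:
            # The only place the chunks become a string.
            self.row.append("".join(self.txt).strip())
            self.txt = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
            self.txt = None  # drop any cell left unclosed


parser = TableParser()
parser.feed(
    "<table><tr><th>a</th><th>b</th></tr>"
    "<tr><td>1</td><td> 2 <b>x</b></td></tr></table>"
)
print(parser.rows)
```

Because the chunks are joined only in handle_endtag, it does not matter how many times handle_data is called for a single cell.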
1 Comment
Aniruddh Chaturvedi
Could you check what I did, though? I'm trying to get this done today. Basically, I'm trying to add the dehtml'ed data to a new list and then join the list elements to create one big blob of dehtml'ed text. That's why self.txt is a list.
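The "collect into a list, then join" approach described in this comment could look roughly like the sketch below. The names TextExtractor, get_text, and dehtml are hypothetical, the code targets Python 3, and stripping each chunk and joining with a single space is an assumption about the desired output.

```python
from html.parser import HTMLParser  # Python 3 module name


class TextExtractor(HTMLParser):
    """Gather all text nodes into a list (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.txt = []  # stays a list; it is never reassigned to a string

    def handle_data(self, text):
        chunk = text.strip()
        if chunk:  # skip whitespace-only chunks
            self.txt.append(chunk)

    def get_text(self):
        # Build the final blob without modifying self.txt.
        return " ".join(self.txt)


def dehtml(html):
    """Return the text content of an HTML string as one blob."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any buffered trailing text
    return extractor.get_text()


print(dehtml("<p>Hello <b>big</b> world</p>"))
```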