I've been working with Python for about two weeks. I've been able to import an Access database and store the values from the SQL SELECT, but one of the fields has HTML tags embedded in it. I'm trying to use the HTML parser, and I defined the class in a separate .py file that I import in my main file. When I try to create the parser from my main file, I get an error. Here are the commands in my main .py file:

from html.parser import HTMLParser
import html_parser
parser = MyHTMLParser()

Here are the commands in the html_parser.py file

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    def handle_data(self, data):
        print("Encountered some data :", data)
        global data_str
        data_str = data_str + "!#@@@@@#!" + data
global data_str
data_str = ""

Here are the errors that appear when I run python from a command prompt with my main py file

C:\Users\Owner\AppData\Local\Programs\Python\Python39>python py_script8.py
Importing MyHTMLParser class
Traceback (most recent call last):
  File "C:\Users\Owner\AppData\Local\Programs\Python\Python39\py_script8.py", line 36, in <module>
    parser = MyHTMLParser()
NameError: name 'MyHTMLParser' is not defined

If anyone has any insight, it would be greatly appreciated. (This has been a blast working with python.)

**** solution ***** MM truly helped me! Thanks so much! Here is what I did -

Close to the top of the main py file, this has been added to run the html_parser.

import html_parser
if __name__ == '__main__':
    # Create the parser instance, as in the answer below.
    parser = html_parser.MyHTMLParser()

This runs inside a function called from a for loop that iterates over the records returned by the SQL statement, which selects all of the rows from the imported Access database:

 global r_str
 parser.data_str = ""
 
 parser.feed(r_str)
 #print(parser.data_str)
 r_str = parser.data_str
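The same per-record step can also be written without the global, by passing the string in and returning the result. This is only a minimal sketch: `strip_tags`, `SEPARATOR`, and the sample `rows` are made-up names and data for illustration, and the class is a copy of `MyHTMLParser` with the debug prints left out so the block runs on its own.

```python
from html.parser import HTMLParser

SEPARATOR = "!#@@@@@#!"  # same marker string as in the post


class MyHTMLParser(HTMLParser):
    """Same idea as the class in html_parser.py, minus the debug prints."""

    def __init__(self):
        super().__init__()
        self.data_str = ""

    def handle_data(self, data):
        self.data_str += SEPARATOR + data


def strip_tags(parser, r_str):
    """Hypothetical helper: return the text of r_str with the tags removed."""
    parser.reset()        # drop any unprocessed input left from the previous record
    parser.data_str = ""  # start each record with an empty accumulator
    parser.feed(r_str)
    parser.close()        # process any text still buffered by the parser
    return parser.data_str


parser = MyHTMLParser()
rows = ["<p>First <i>title</i></p>", "<b>Second</b>"]  # made-up sample records
cleaned = [strip_tags(parser, row) for row in rows]
print(cleaned)
```

One parser instance is reused for every record; clearing `data_str` before each `feed` is what keeps text from one row out of the next.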

The html_parser.py contents are this:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        # Superclass initialization.
        super().__init__()
        # Variables are initialized here.
        self.data_str = ""
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    def handle_data(self, data):
        print("Encountered some data :", data)
        self.data_str += "!#@@@@@#!" + data
 
asked Apr 6, 2021 at 1:31

1 Answer

Corrected the answer.

Please try this.
The main code is written in task.py.

task.py

import html_parser

if __name__ == '__main__':
    # A class defined in another file (module name + .py) is created as "module_name.ClassName()".
    parser = html_parser.MyHTMLParser()
    # Insert the HTML here.
    parser.feed('<html><head><title>Parser Test</title></head>'
                '<body><BLOCKQUOTE>Quoted content</BLOCKQUOTE></body></html>')
    # In this way, you can retrieve the contents stored in the parser class.
    print(parser.data_str)
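The NameError in the question comes from how `import` binds names: `import some_module` makes only the module name available, not the classes inside it. A small sketch of that rule follows. It uses the standard-library module `html.parser` as a stand-in so that it runs without `html_parser.py` on disk; the same applies to `html_parser.MyHTMLParser`.

```python
import html.parser  # binds only the name "html" in this file

try:
    parser = HTMLParser()  # the bare class name was never imported
except NameError as exc:
    error_message = str(exc)

# Option 1: qualify the class with the module it lives in.
parser = html.parser.HTMLParser()

# Option 2: import the class name itself, then use it unqualified.
from html.parser import HTMLParser
parser = HTMLParser()

print(error_message)
```

For the files in the question, that would mean either `parser = html_parser.MyHTMLParser()` or `from html_parser import MyHTMLParser` followed by `parser = MyHTMLParser()`.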

html_parser.py

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        # Superclass initialization.
        super().__init__()
        # Variables are initialized here.
        self.data_str = ""
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    def handle_data(self, data):
        print("Encountered some data :", data)
        self.data_str += "!#@@@@@#!" + data

The expected output is as follows.

Encountered a start tag: html
Encountered a start tag: head
Encountered a start tag: title
Encountered some data : Parser Test
Encountered an end tag : title
Encountered an end tag : head
Encountered a start tag: body
Encountered a start tag: blockquote
Encountered some data : Quoted content
Encountered an end tag : blockquote
Encountered an end tag : body
Encountered an end tag : html
!#@@@@@#!Parser Test!#@@@@@#!Quoted content
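If the goal is only the plain text, without the debug prints or the marker string, one possible variation is to collect the pieces in a list and join them at the end. This is a sketch under assumptions, not part of the code above: `TextExtractor`, `get_text`, and the sample string are invented for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical class that keeps only the text between tags."""

    def __init__(self):
        # convert_charrefs defaults to True, so "&amp;" arrives as "&".
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def get_text(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        return " ".join("".join(self.parts).split())


extractor = TextExtractor()
extractor.feed("<p>Sample <i>Journal</i>, 1(2), 3-4. A &amp; B</p>")  # made-up input
extractor.close()
text = extractor.get_text()
print(text)  # Sample Journal, 1(2), 3-4. A & B
```

Appending to a list and joining once avoids rebuilding the string on every `handle_data` call, which matters little for short fields but keeps the class simple.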
answered Apr 6, 2021 at 2:09
7 Comments

Thanks for the help. I did make a bit of progress, but since I'm such a noob, I'm still having some issues. I couldn't find any examples to help me, so this is a follow-up. If you could just let me know what I need in the sub .py file, that would be great. I'm working on this for a friend who needs a mail-merge type of output for a bibliography listing, but the reference field with RTF has all of the HTML tags, and I'm just trying to strip those out to get the text. Thanks again.
OK. I know what you want, so I'll revise the answer. Please wait.
Please try this. Do you get a similar error?
I'm glad I could help you!
I've been working more on this routine. The data looks good now.
1 2020 Howard, C. M., Hu, R. and Falconer, J. (2020). Sharing Stories: Reflections of Professors’ Literacy Identities and Beliefs. Networks: An Online Journal for Teacher Research,22(3). #doi.org/10.4148/2470-6353.13177#
2 2014 Marriott, C. E. (2014). Just Wondering: The Beginning of Inquiry. Knowledge Quest, 43(2), 74-76.
3 2018 Richards-Bass, S. (2018). Incorporating Critical Thinking: Teaching Strategies in Diagnostic Imaging Education. Dissertations & Theses @ Northcentral University - ProQuest