Using Python - Get a table out of some html and display it?

Question 1

There's a lot of help on here but some of it goes over my head, so hopefully by asking my question and getting a tailored answer I will better understand.

So far I have managed to connect to a website, authenticate as a user, fill in a form and then pull down the HTML. The HTML contains a table I want. I just want to say something like:

read the HTML... when you reach the table start tag, keep going until you reach the table end tag, and then display that, or write it to a new HTML file and open it, keeping the tags so it's formatted for me.

Here is the code I have so far.

import requests

# LOGINURL, APURL, login and findaps are defined earlier (not shown).
# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    s.post(LOGINURL, data=login)
    # print
    r = s.get(LOGINURL)
    print(r.url)
    # An authorised request.
    r = s.get(APURL)
    print(r.url)
    # etc...
    s.post(APURL)
    #
    r = s.post(APURL, data=findaps)
    r = s.get(APURL)
    # print(r.text)

# Write the page out; 'with' closes the file even if the write fails.
with open("makethisfile.html", "w") as f:
    f.write('\n'.join([
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">',
        '<html>',
        ' <head>',
        ' <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">',
        ' <title>THE TITLE</title>',
        ' <link rel="stylesheet" href="css/displayEventLists.css" type="text/css">',
        r.text,  # this just does everything, I need to get the table.
    ]))

Comment

You should at least use HTMLParser (docs.python.org/2/library/markup.html), or maybe even something more powerful.
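
For reference, here is a minimal sketch of the HTMLParser route. It assumes Python 3, where the module is html.parser (in Python 2 it is HTMLParser). The names FirstTableExtractor and extract_first_table are made up for illustration. The idea is to keep the markup from the first table start tag through its matching end tag. HTML comments are skipped, so a commented-out tag does not confuse it.

```python
from html.parser import HTMLParser


class FirstTableExtractor(HTMLParser):
    """Collect the markup of the first <table> element, nested tables included."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; are passed through as-is.
        super().__init__(convert_charrefs=False)
        self.depth = 0      # number of <table> elements currently open
        self.done = False   # True once the first table has been closed
        self.parts = []     # markup pieces that belong to the table

    def _keep(self, text):
        # Only record text while we are inside the first table.
        if self.depth > 0 and not self.done:
            self.parts.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "table" and not self.done:
            self.depth += 1
        self._keep(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self._keep(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._keep("</%s>" % tag)
        if tag == "table" and self.depth > 0 and not self.done:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        self._keep(data)

    def handle_entityref(self, name):
        self._keep("&%s;" % name)

    def handle_charref(self, name):
        self._keep("&#%s;" % name)


def extract_first_table(html):
    """Return the first table's markup, or None if no complete table is found."""
    parser = FirstTableExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None
```

With the question's code, this would be called as extract_first_table(r.text). End tags are rebuilt in lowercase, so the result is close to, but not always byte-for-byte identical with, the original markup.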

Comment

Take a look also at BeautifulSoup: stackoverflow.com/questions/17196018/…
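
A rough idea of what the BeautifulSoup version might look like, assuming the third-party beautifulsoup4 package is installed and Python 3 is used. The helper name first_table_html is hypothetical. It returns the first table with its tags intact, or None when the page has no table.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def first_table_html(html):
    """Return the markup of the first <table> in html, or None if there is none."""
    # "html.parser" selects Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    return str(table) if table is not None else None
```

Calling first_table_html(r.text) on the downloaded page should give a string that can be written straight into a new HTML file.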

ooga · Accepted Answer · 2014-04-01 14:53:10Z

Although it's best to parse the file properly, a quick-and-dirty method uses a regex.

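import re

# re.S lets '.' match newlines, so the table may span several lines.
# The body is non-greedy (.+?) so the match ends at the first </table>;
# a greedy (.+) would run on to the last </table> in the page.
m = re.search(r"<table.*?>(.+?)</table>", r.text, re.S)
if m:
    print(m.group())  # the whole match, including the table tags
else:
    print("Error: table not found")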

As an example of why parsing is better, the regex as written will fail with the following (rather contrived!) example:

<!-- <table> -->
blah
blah
<table>
this is the actual
table
</table>
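
To make the failure concrete, here is a small sketch (Python 3) that runs a regex of the same shape against that contrived input. The variable names are made up for illustration. The comment-stripping step is only one possible workaround and is itself a regex hack, not a substitute for a real parser.

```python
import re

# Hypothetical input reproducing the contrived example.
html = """<!-- <table> -->
blah
blah
<table>
this is the actual
table
</table>
"""

pattern = re.compile(r"<table.*?>(.+?)</table>", re.S)

# The search latches onto the commented-out tag, so the result starts with
# "<table> -->" and drags the "blah" lines along with the real table.
naive = pattern.search(html).group()

# Workaround (an assumption, not part of the original answer):
# remove HTML comments before searching.
without_comments = re.sub(r"<!--.*?-->", "", html, flags=re.S)
cleaned = pattern.search(without_comments).group()
```

Here naive contains the junk before the real table, while cleaned holds only the actual table.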

As written, it captures only one match, starting at the first table tag in the file. To get the second table, and so on, you could loop over all the matches (or make the regex specific to the table you want, if possible), so that's not a problem.
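
The looping idea could be sketched like this, assuming Python 3 and tables that are not nested inside each other. The names TABLE_RE, find_tables and write_table_page are hypothetical. The second helper writes one table into a small standalone page, which is what the question asks for.

```python
import re

# Non-greedy body, so each match ends at the nearest </table>.
# Assumption: tables are not nested; nested tables need a real parser.
TABLE_RE = re.compile(r"<table\b.*?>.*?</table>", re.S | re.I)


def find_tables(html):
    """Return the markup of every <table>...</table> block, in document order."""
    return [m.group() for m in TABLE_RE.finditer(html)]


def write_table_page(table_html, path, title="Extracted table"):
    """Write one table into a minimal standalone HTML file."""
    page = "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        '<head><meta charset="utf-8"><title>%s</title></head>' % title,
        "<body>",
        table_html,
        "</body>",
        "</html>",
    ])
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
```

For example, tables = find_tables(r.text) followed by write_table_page(tables[1], "makethisfile.html") would save the second table. The standard library's webbrowser.open("makethisfile.html") can then be used to try opening the file in a browser.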

That did it perfectly first time, thanks for the help. Does anyone mind explaining to me why this isn't the best method? Could I run into issues, say, if there is more than one table? Is that the issue?

I see, OK, cool. Thanks for your help! Luckily there is only one table on my page and no comments or anything, so it's working perfectly. Loving Python!
