This code works. It is designed to retrieve website data directly, parse it, and open it as normal. I will develop it further later. Right now I want to know whether there is any way to improve efficiency, for example by not writing to a file. I would also like to know whether my code could be made more Pythonic (including PEP standards), and whether there are instances where concatenation would be acceptable for readability. Here is the code:
import urllib.request
from tkinter import *
import webbrowser
import os

proxy_handler = urllib.request.ProxyHandler(proxies=None)
opener = urllib.request.build_opener(proxy_handler)

def navigate(query):
    response = opener.open(query)
    html = response.read()
    return html

def parse(junk):
    ugly = str(junk)[2:-1]
    lines = ugly.split('\\n')
    return lines

while True:
    url = input("Path: ")
    dump = navigate(url)
    content = parse(dump)
    with open('cache.html', 'w') as f:
        for line in content:
            f.write(line)
    webbrowser.open_new_tab(os.path.dirname(os.path.abspath(__file__))+'/cache.html')
2 Answers
Your code does seem to follow PEP 8, which is a good thing. Also, the logic is split into small functions, which is even better.
If you want to make your code portable, you probably shouldn't hardcode / in your path but use os.sep or, even better, os.path.join instead.
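A small sketch of the two options named above. The directory name `some_dir` is a placeholder chosen for illustration, not something from the question:

```python
import os

# Hypothetical directory name, used only for this illustration.
base_dir = 'some_dir'

# Option 1: build the path with os.sep instead of a hardcoded '/'.
with_sep = base_dir + os.sep + 'cache.html'

# Option 2: let os.path.join insert the separator for the current platform.
with_join = os.path.join(base_dir, 'cache.html')
```

Both give the same path here; `os.path.join` tends to read better once more than two components are involved.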
In order not to call write multiple times, you could use writelines.
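For instance, the write loop from the question could be reduced to something like the following. The function name `write_cache` and its `path` parameter are made up for this sketch:

```python
def write_cache(lines, path='cache.html'):
    """Write every string in lines to path with one writelines call."""
    with open(path, 'w') as f:
        # writelines does not append newlines; each string is written
        # as-is, which matches what the original write loop did.
        f.writelines(lines)
```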
Now, from a functional point of view, it might be a good idea to create temporary files, with tempfile for instance. If you don't want to use it, it might be better to use a hash of the original URL or of the content and use it in the name of the new file.
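A rough sketch of both ideas, under a few assumptions: the helper names `write_temp_html` and `hashed_cache_name` are invented here, the page is written as UTF-8 text, and SHA-256 truncated to 16 hex characters is just one possible choice of hash. A temporary file gets a unique name in the system's temp directory, so successive pages do not overwrite a single fixed cache.html next to the script.

```python
import hashlib
import tempfile


def write_temp_html(html_text):
    """Write html_text to a uniquely named temporary .html file; return its path."""
    # delete=False keeps the file on disk after it is closed so that a
    # browser can still open it; cleaning it up later is the caller's job.
    with tempfile.NamedTemporaryFile(mode='w', suffix='.html',
                                     delete=False, encoding='utf-8') as f:
        f.write(html_text)
        return f.name


def hashed_cache_name(url):
    """Derive a stable file name from the URL, giving one cache file per URL."""
    digest = hashlib.sha256(url.encode('utf-8')).hexdigest()
    # Truncating to 16 hex characters is an arbitrary choice for readability.
    return digest[:16] + '.html'
```

The path returned by `write_temp_html` could then be handed to `webbrowser.open_new_tab` in place of the hardcoded cache.html location.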
Thank you, I never would have thought of that stuff. Could you explain more to a n00b why using tempfile would be advantageous to my program's functionality? – anon582847382, Feb 18, 2014 at 17:39
1. Make sure your variable names are both short and to the point. A parameter called junk tells the reader nothing about what it holds. Also, this could cause conflicts if you ever write other code and import this.
def parse(junk):
    ugly = str(junk)[2:-1]
    lines = ugly.split('\\n')
    return lines
2. urllib is outdated. Unless there is a specific reason you need to use urllib, you should use urllib2, which is faster and simpler.
import urllib.request
3. This part looks a bit messy:
webbrowser.open_new_tab(os.path.dirname(os.path.abspath(__file__))+'/cache.html')
Perhaps separate into multiple lines for readability.
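As an illustration of points 1 and 3, here is one possible reworking. The names `raw_html`, `text` and `cache_location` are suggestions only. Note one deliberate change that goes beyond renaming: the bytes are decoded (assuming UTF-8) instead of being sliced out of `str(...)`, so this is a swapped-in technique, not the original logic.

```python
import os


def parse(raw_html, encoding='utf-8'):
    """Decode the downloaded bytes and split them into lines."""
    # Assumption: the page is UTF-8; bytes that cannot be decoded are replaced.
    text = raw_html.decode(encoding, errors='replace')
    return text.splitlines()


def cache_location(script_file, filename='cache.html'):
    """Return the path of filename in the directory containing script_file."""
    # Each step gets its own name instead of one long nested expression.
    script_path = os.path.abspath(script_file)
    script_dir = os.path.dirname(script_path)
    return os.path.join(script_dir, filename)
```

Inside the script, the last line of the loop would then become `webbrowser.open_new_tab(cache_location(__file__))`.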
Thanks a lot, but I get an ImportError for urllib2 running Python 3.3.2. – anon582847382, Feb 18, 2014 at 17:37
Okay. Maybe ask on StackOverflow about that. – xv435, Feb 18, 2014 at 17:38
urllib is outdated in Python 2, but in Python 3, urllib was removed and urllib2 was renamed to urllib.request. As such, import urllib.request should not be a problem. – icktoofay, Feb 19, 2014 at 1:52