4
\$\begingroup\$

This code works. It is designed to retrieve website data directly, parse it, and open it as normal. I will develop it further later. Right now I want to know if there is any way to improve efficiency, such as not using a file. I would also like to know whether my code could be made more Pythonic (including following the PEP standards), and whether there are instances where concatenation would be acceptable for readability. Here is the code:

import urllib.request
from tkinter import *
import webbrowser
import os
proxy_handler = urllib.request.ProxyHandler(proxies=None)
opener = urllib.request.build_opener(proxy_handler)
def navigate(query):
    response = opener.open(query)
    html = response.read()
    return html
def parse(junk):
    ugly = str(junk)[2:-1]
    lines = ugly.split('\\n')
    return lines
while True:
    url = input("Path: ")
    dump = navigate(url)
    content = parse(dump)
    with open('cache.html', 'w') as f:
        for line in content:
            f.write(line)
    webbrowser.open_new_tab(os.path.dirname(os.path.abspath(__file__))+'/cache.html')
palacsint
asked Feb 18, 2014 at 15:40
\$\endgroup\$

2 Answers

2
\$\begingroup\$

Your code does seem to follow PEP 8, which is a good thing. Also, the logic is split into small functions, which is even better.

If you want to make your code portable, you probably shouldn't hardcode / in your path but use os.sep or even better os.path.join instead.
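A minimal sketch of the os.path.join suggestion. The helper name cache_path and its parameters are hypothetical and do not appear in the question:

```python
import os


def cache_path(base_dir, filename="cache.html"):
    """Build the cache file path using the platform's own separator."""
    return os.path.join(base_dir, filename)


# Example: build a path for a cache file in the current working directory.
print(cache_path(os.getcwd()))
```

In the original script, base_dir would be os.path.dirname(os.path.abspath(__file__)).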

In order not to call write multiple times, you could use writelines.
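As a rough illustration of the writelines suggestion, assuming the lines are already strings (the function name write_cache is made up for this sketch):

```python
import os
import tempfile


def write_cache(lines, path):
    """Write every line with one writelines call (no newlines are added)."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "cache.html")
    write_cache(["<html>", "<body>hi</body>", "</html>"], demo_path)
    with open(demo_path, encoding="utf-8") as f:
        written = f.read()
```

Like the loop it replaces, writelines does not insert line separators, so the output is the same as calling write once per line.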

Now, from a functional point of view, it might be a good idea to create temporary files with tempfile for instance. If you don't want to use it, it might be better to use a hash of the original url or of the content and use it in the name of the new file.
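Both options could look roughly like this. The function names and the choice of SHA-256 truncated to 16 hex characters are assumptions made for the sketch, not part of the original code:

```python
import hashlib
import os
import tempfile


def cache_name_for(url):
    """Derive a stable file name from the URL (hypothetical naming scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return digest[:16] + ".html"


def write_temp_page(html_bytes):
    """Write page bytes to a uniquely named temporary file; return its path."""
    # delete=False keeps the file after closing so a browser can still open it.
    with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as f:
        f.write(html_bytes)
    return f.name
```

With tempfile, each page gets a unique name in the system's temporary directory, so successive downloads do not overwrite a single cache.html and the script does not need write access to its own folder. Because of delete=False, removing the file afterwards is left to the caller.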

answered Feb 18, 2014 at 16:18
\$\endgroup\$
1
  • \$\begingroup\$ Thank you, I never would have thought of that stuff. Could you explain more to a n00b why using tempfile would be advantageous to my program's functionality? \$\endgroup\$ Commented Feb 18, 2014 at 17:39
2
\$\begingroup\$

1. Make sure your variable names are both short and to the point. A parameter called junk says nothing about what it holds. Also, this could cause conflicts if you ever write other code and import this.

def parse(junk):
    ugly = str(junk)[2:-1]
    lines = ugly.split('\\n')
    return lines

2. urllib is outdated. Unless there is a specific reason you need urllib, you should use urllib2, which is faster and simpler.

import urllib.request

3. This part looks a bit messy: webbrowser.open_new_tab(os.path.dirname(os.path.abspath(__file__))+'/cache.html') Perhaps separate into multiple lines for readability.
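Points 1 and 3 might be sketched as follows. The names split_lines and cache_file_path are only suggestions. Decoding the bytes as UTF-8 instead of slicing str(junk) is a substitution made for this sketch, and it assumes the page really is UTF-8:

```python
import os


def split_lines(raw_html):
    """Decode the downloaded bytes and split them into lines."""
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    text = raw_html.decode("utf-8", errors="replace")
    return text.splitlines()


def cache_file_path(script_file, filename="cache.html"):
    """Build the cache path step by step instead of in one long expression."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, filename)
```

The call in the main loop would then shrink to something like webbrowser.open_new_tab(cache_file_path(__file__)).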

answered Feb 18, 2014 at 16:18
\$\endgroup\$
3
  • \$\begingroup\$ Thanks a lot, but I get an ImportError for urllib2 running Python 3.3.2 \$\endgroup\$ Commented Feb 18, 2014 at 17:37
  • \$\begingroup\$ Okay. Maybe ask on StackOverflow about that. \$\endgroup\$ Commented Feb 18, 2014 at 17:38
  • 1
    \$\begingroup\$ urllib is outdated in Python 2, but in Python 3, urllib was removed and urllib2 was renamed to urllib.request. As such, import urllib.request should not be a problem. \$\endgroup\$ Commented Feb 19, 2014 at 1:52
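
For code that has to run on both versions, a common pattern is to try the Python 3 import first and fall back to the Python 2 one. This is a sketch only; the original script targets Python 3 and needs just the first import:

```python
try:
    # Python 3: the functionality lives in urllib.request.
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback: the same function is provided by urllib2.
    from urllib2 import urlopen
```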
