Proxy evasion browser in Python

Question 1

This code works. It is designed to retrieve website data directly, parse it, and open it as normal. I will develop it further later. Right now I want to know if there is any way to improve efficiency, such as not using a file etc. I would also like to know if my code could be made more pythonic (this would include PEP standards), or incidences where concatenation would be acceptable to readability. Here is the code:

import urllib.request
from tkinter import *
import webbrowser
import os
proxy_handler = urllib.request.ProxyHandler(proxies=None)
opener = urllib.request.build_opener(proxy_handler)
def navigate(query):
 response = opener.open(query)
 html = response.read()
 return html
def parse(junk):
 ugly = str(junk)[2:-1]
 lines = ugly.split('\\n')
 return lines
while True:
 url = input("Path: ")
 dump = navigate(url)
 content = parse(dump)
 with open('cache.html', 'w') as f:
 for line in content:
 f.write(line)
 webbrowser.open_new_tab(os.path.dirname(os.path.abspath(__file__))+'/cache.html')

Question 2

Your code does seem to follow PEP 8 which is a good thing. Also, the logic is splitted in different small functions which is even better.

If you want to make your code portable, you probably shouldn't hardcode / in your path but use os.sep or even better os.path.join instead.

In order not to call write multiple times, you could use writelines.

Now, from a functional point of view, it might be a good idea to create temporary files with tempfile for instance. If you don't want to use it, it might be better to use a hash of the original url or of the content and use it in the name of the new file.

Question 3

Thank you, I never would have thought of that stuff. Could you explain more to a n00b why using tempfile would be advantageous to my programs functionality?

Question 4

1. Make sure your variable names are both short and to the point, using a parameter called junk is pointless. Also, this could cause conflicts if you ever write other code and import this.

def parse(junk):
 ugly = str(junk)[2:-1]
 lines = ugly.split('\\n')
 return lines

2. urllib is outdated. Unless there is a specific reason you need to use urllib, you should use urllib2. Urllib2 is faster and more simple.

import urllib.request

3. This part looks a bit messy: webbrowser.open_new_tab(os.path.dirname(os.path.abspath(__file__))+'/cache.html') Perhaps separate into multiple lines for readability.

Question 5

Thanks alot, but I get an importerror for urllib2 running python 3.3.2

Question 6

Okay. Maybe ask on StackOverflow about that.

Question 7

urllib is outdated in Python 2, but in Python 3, urllib was removed and urllib2 was renamed to urllib.request. As such, import urllib.request should not be a problem.

SylvainD SylvainD 29.7k1 gold badge49 silver badges93 bronze badges · Accepted Answer · 2014-02-18 16:18:37Z

Your code does seem to follow PEP 8 which is a good thing. Also, the logic is splitted in different small functions which is even better.

If you want to make your code portable, you probably shouldn't hardcode / in your path but use os.sep or even better os.path.join instead.

In order not to call write multiple times, you could use writelines.

Now, from a functional point of view, it might be a good idea to create temporary files with tempfile for instance. If you don't want to use it, it might be better to use a hash of the original url or of the content and use it in the name of the new file.

Thank you, I never would have thought of that stuff. Could you explain more to a n00b why using tempfile would be advantageous to my programs functionality?

Stack Exchange Network

Proxy evasion browser in Python

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Proxy evasion browser in Python

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions