\$\begingroup\$

Below are two functions that work without problems in my current script. They are written to run under Python 2.7.x.

def tor_browser_initialise():
        """ This function checks whether the Tor Browser is running. If it isn't,
        it will open the Tor Browser.
        """
        processlist = []
        for p in psutil.process_iter():
            try:
                process = psutil.Process(p.pid)
                pname = process.name()
                processlist.append(pname)
            except:
                continue
        if "tor.exe" not in processlist:
            process = subprocess.Popen(r"C:\Program Files (x86)\Tor Browser\Browser\firefox.exe", stdout=subprocess.PIPE)
            time.sleep(30)

def connect_tor(url):
    """ This function accepts a URL as an argument. It accesses the URL via TOR before
    returning the HTML source code to the function that called it. This function also
    uses random browser information.
    """
    LOCALHOST = "127.0.0.1"
    PORT = 9150
    useragent_list = ['Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0',
        'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/29.0',
        'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36',
        'Mozilla/5.0 (compatible; MSIE 10.6; Windows NT 6.1; Trident/5.0; InfoPath.2; SLCC1; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 2.0.50727) 3gpp-gba UNTRUSTED/1.0',
        'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
        'Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14',
        'Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30']
    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, LOCALHOST, PORT)
    socket.socket = socks.socksocket
    request = urllib2.Request(url)
    request.add_header('User-Agent', random.choice(useragent_list))
    response = urllib2.urlopen(request)
    return response

I'd like to know if there is a more concise and Pythonic way of writing the two functions. I haven't listed the dependent libraries/modules at the beginning of the code, but it does work correctly.

Alex L
asked Jan 2, 2016 at 21:21
\$\endgroup\$
  • The Tor website has a Python library with no dependencies: Stem. Commented Jan 3, 2016 at 2:55

3 Answers

\$\begingroup\$

Avoid bare except

Writing except: without naming a specific exception is asking for trouble: it catches everything, including KeyboardInterrupt and SystemExit, and so silences genuine bugs. Instead, catch only what you expect: except MyExpectedKindOfException.
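
As a minimal sketch of the difference, assume a hypothetical ProcessGone exception standing in for whatever error the process lookup is expected to raise (in the question's script that would likely be one of psutil's own exceptions, such as psutil.NoSuchProcess). Only that exception is skipped; anything unexpected still propagates:

```python
class ProcessGone(Exception):
    """Hypothetical stand-in for an expected lookup error."""


def collect_names(name_getters):
    """Call each getter and keep the names of those that succeed.

    `name_getters` is an assumed iterable of zero-argument callables,
    used here so the sketch runs without any third-party library.
    """
    names = []
    for get_name in name_getters:
        try:
            names.append(get_name())
        except ProcessGone:
            # Expected: the process vanished between listing and lookup.
            continue
    return names
```

With a bare except:, a typo inside the try block (for example a misspelled method name) would be skipped just as quietly as a vanished process.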

Reconsider the very long sleeping

The function tor_browser_initialise ends with time.sleep(30).

That is a long time to sleep. Are you 100% sure that every call to that function needs to sleep that long?

Much worse, the sleep is not documented, so callers will see their program hang for 30 seconds for no apparent reason!

Just remove the call to time.sleep and let the caller decide whether, and for how long, to sleep after calling the function.
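
If the caller does need to wait, one alternative to a fixed delay (a different technique from the one in the question, sketched here under assumptions) is to poll until the SOCKS port accepts connections. The function name wait_for_port and its parameters are hypothetical; the host and port would be the 127.0.0.1 and 9150 values from the question:

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection.

    Gives up and returns False after roughly `timeout` seconds.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            conn = socket.create_connection((host, port), timeout=interval)
        except (socket.error, socket.timeout):
            # Nothing is listening yet; pause briefly and retry.
            time.sleep(interval)
        else:
            conn.close()
            return True
    return False
```

The caller then waits only as long as necessary, and the upper bound is visible at the call site, e.g. wait_for_port("127.0.0.1", 9150, timeout=30).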

answered Jan 2, 2016 at 21:50
\$\endgroup\$
\$\begingroup\$

A word on indentation

You have mismatched indentation levels in tor_browser_initialise (shouldn't it be "initialize"?): 8 spaces at the beginning and then 4. Choose one and stick to it; PEP 8 recommends 4 spaces.

It also has some recommendations on aligning continuation lines. You would be better off using:

useragent_list = [
    'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 '
    'Firefox/31.0',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 '
    'Firefox/29.0',
    'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36',
    'Mozilla/5.0 (compatible; MSIE 10.6; Windows NT 6.1; '
    'Trident/5.0; InfoPath.2; SLCC1; .NET CLR 3.0.4506.2152; '
    '.NET CLR 3.5.30729; .NET CLR 2.0.50727) 3gpp-gba UNTRUSTED/1.0',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
    'Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14',
    'Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) '
    'AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'
]

This also uses implicit string literal concatenation to keep line lengths under 80 characters.
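
One thing to watch with this layout: adjacent string literals are joined at compile time, so the commas are what separate list entries. A small illustration, using two of the user agents and the made-up names AGENTS and MERGED:

```python
# Two entries: the first is split over two lines, then a comma ends it.
AGENTS = [
    'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 '
    'Firefox/31.0',
    'Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14',
]

# Same text with the comma after 'Firefox/31.0' forgotten: Python
# silently glues everything into a single, longer string.
MERGED = [
    'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 '
    'Firefox/31.0'
    'Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14',
]

print(len(AGENTS), len(MERGED))
```

No error is raised for the second list, so it is worth checking the length of the collection after editing it.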

Use constants as such

LOCALHOST, PORT, and useragent_list are constants; you even use uppercase for two of them to emphasize it. Why redefine them each time you call connect_tor, then?

You should move them from the function body to the top level of the file. You may also be interested in turning useragent_list (or USER_AGENTS, which I find better) into an immutable collection such as a tuple or a frozenset. Note that random.choice needs an indexable sequence, so a tuple works with it directly, whereas a frozenset would have to be converted first.
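
A minimal sketch of that layout, with the list shortened to two entries for brevity. The names TOR_SOCKS_HOST, TOR_SOCKS_PORT and pick_user_agent are illustrative choices, not taken from the question:

```python
import random

# Module-level constants: defined once, when the module is imported.
TOR_SOCKS_HOST = "127.0.0.1"
TOR_SOCKS_PORT = 9150
USER_AGENTS = (
    'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0',
    'Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14',
)


def pick_user_agent():
    """Return one user agent string chosen at random."""
    # random.choice indexes into its argument, so a tuple is fine here.
    return random.choice(USER_AGENTS)
```

connect_tor would then refer to these names instead of rebuilding the values on every call.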

Save on resources and computation

You could improve tor_browser_initialise by returning early if you find the 'tor.exe' process. You could thus get rid of processlist: if the for loop runs to completion, you didn't return early, so the process you were looking for is not running.

def tor_browser_initialise():
    for p in psutil.process_iter():
        try:
            process = psutil.Process(p.pid)
            if process.name() == 'tor.exe':
                return
        except:
            continue
    subprocess.Popen(
        r"C:\Program Files (x86)\Tor Browser\Browser\firefox.exe",
        stdout=subprocess.PIPE)
    time.sleep(30)

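
The same short-circuit can also be written with any(), which stops at the first match. This sketch assumes the process names have already been gathered into an iterable of strings (in the real script they would come from psutil, with its errors handled separately); is_process_running is a hypothetical helper name:

```python
def is_process_running(process_names, target="tor.exe"):
    """Return True as soon as `target` appears in `process_names`."""
    return any(name == target for name in process_names)


# Stand-in data for illustration; not real process listings.
print(is_process_running(["explorer.exe", "tor.exe", "firefox.exe"]))
print(is_process_running(["explorer.exe"]))
```

Separating the check from the launch also makes the check easy to test without starting a browser.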
answered Jan 3, 2016 at 1:03
\$\endgroup\$
\$\begingroup\$

Right now you've hardcoded the location of the Tor Browser, which is not ideal. It also assumes a Windows path. This could be improved by passing the path as a parameter:

def tor_browser_initialise(tor_path):
    # stuff
    process = subprocess.Popen(tor_path, stdout=subprocess.PIPE)
    # other stuff

If you want to provide default paths, you could do so like this. It also lets you choose a default path per operating system using sys.platform:

import sys


def tor_browser_initialise(tor_path=None):
    if tor_path is None:
        tor_path = get_default_path()
    # the rest of it


DEFAULT_TOR_PATHS = {
    'win32': r"C:\Program Files (x86)\Tor Browser\Browser\firefox.exe"
}


def get_default_path():
    try:
        return DEFAULT_TOR_PATHS[sys.platform]
    except KeyError:
        # Each list item is its own string, so ' '.join() spaces them.
        raise ValueError(' '.join([
            "There is no default path for Tor on your system,",
            "detected to be {}.".format(sys.platform),
            "You must provide a path"]))

You could also use os.path.join if you don't want to worry about raw strings or escaping backslashes.
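
A small sketch of that approach. It uses ntpath, the module that os.path refers to on Windows, so the result is the same on any platform; TOR_PATH is an illustrative name. One caveat: the drive root still has to carry its own separator, because joining onto a bare "C:" yields a drive-relative path.

```python
import ntpath  # os.path is this module when running on Windows

# The root needs its separator; the remaining parts need no escaping.
TOR_PATH = ntpath.join(
    "C:\\", "Program Files (x86)", "Tor Browser", "Browser", "firefox.exe")

print(TOR_PATH)
```

In a script that only ever runs on Windows, os.path.join can be used in the same way.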

answered Jan 4, 2016 at 16:18
\$\endgroup\$
