This is a simple Python program I wrote. I hope some one could help me optimize this program. If you found any bad habits, please tell me.
#!/usr/bin/python
import urllib2
def GetURLs(srcURL=""):
if srcURL == "":
return -1
Content = urllib2.urlopen(srcURL, None, 6)
oneLine = Content.readline()
while oneLine:
print oneLine
oneLine = Content.readline()
Content.close()
return 0
GetURLs("http://www.baidu.com")
-
1\$\begingroup\$ For starters, dont start your method name with capital letters.Try getURLS() instead of GetURLs same goes for Content (begins with capital letter). In general class names start with capital letters not method or variable names. And I am not sure what are you trying to do with while online: routine. Try to read this book amazon.com/Code-Complete-Practical-Handbook-Construction/dp/… for best practices in programming. \$\endgroup\$specialscope– specialscope2012年03月10日 12:55:58 +00:00Commented Mar 10, 2012 at 12:55
-
\$\begingroup\$ yes, maybe I should follow your advice.Thx \$\endgroup\$thlgood– thlgood2012年03月10日 13:03:21 +00:00Commented Mar 10, 2012 at 13:03
-
7\$\begingroup\$ You should really follow PEP 8: python.org/dev/peps/pep-0008 \$\endgroup\$rubik– rubik2012年03月10日 13:04:15 +00:00Commented Mar 10, 2012 at 13:04
-
2\$\begingroup\$ One piece of 'non-pythonic' code is the guard clause. While often good practice in other languages in python exceptions are light weight and recommended practice. So rather than checking if the url=='' just pass it the urllib2 and deal with the exception using try: except: \$\endgroup\$David Hall– David Hall2012年03月10日 13:11:42 +00:00Commented Mar 10, 2012 at 13:11
-
1\$\begingroup\$ @DavidHall: In all honesty, I wouldn't call guard clauses 'non-pythonic', and furthermore I wouldn't call throwing exceptions as being Pythonic, necessarily. It's more of a matter of preference, one of which has a distinct (but insignificant) performance advantage. Here's the case for guard clauses, although, granted, it does not discuss Python directly. Above all, though, if you're working on an existing project, maintaining consistency with the existing convention should be the number one priority. \$\endgroup\$voithos– voithos2012年03月20日 23:23:41 +00:00Commented Mar 20, 2012 at 23:23
3 Answers 3
You asked for comments on coding style, so here goes:
It's Python style to use CapitalizedNames for classes and lower_case_names for functions, methods and variables.
Functions should have docstrings that describe their arguments and what they do.
Names starting with "get" are normally used for functions that return the value they get. Your function prints the content at the URL, so it would be better to call it something like
print_url
.The value of 6 for the
timeout
argument tourlopen
seems arbitrary. It would be better either to leave this out (the caller can always usesocket.setdefaulttimeout
if they need to set a timeout), or to allow the caller to pass it as a keyword argument.You can pass the
timeout
argument as a keyword argument to avoid having to specifyNone
for thedata
argument.Having a default value for
srcURL
is pointless, especially since it's an invalid value! Omit this.The return value is useless: it's -1 if the argument was an empty string, or 0 if it wasn't. If the caller needs to know this, they can look at the argument they are about to supply. (Or they can catch the
ValueError
thaturlopen
raises if passed an empty string.)The
Content
object does not get closed if thereadline
function raises an error. You can ensure that it gets closed by using Python'stry: ... finally: ...
statement. (Orcontextlib.closing
if you prefer.)
Applying all of these improvements, plus the one suggested by Lev, yields this function:
def print_url(url, timeout=6):
"""
Print the content at the URL `url` (but time out after `timeout` seconds).
"""
content = urllib2.urlopen(url, timeout=timeout)
try:
for line in content:
print line
finally:
content.close()
-
\$\begingroup\$ Can't you just use a
with
-block to handle closing? I.e. never mindcontextlib.closing
; doesn't the object returned byurllib2.urlopen
already implement the context-manager protocol? \$\endgroup\$Karl Knechtel– Karl Knechtel2012年03月10日 14:52:24 +00:00Commented Mar 10, 2012 at 14:52 -
\$\begingroup\$ @Karl: try it and see! \$\endgroup\$Gareth Rees– Gareth Rees2012年03月10日 15:35:52 +00:00Commented Mar 10, 2012 at 15:35
You can shorten your code by optimizing iteration over the lines. urlopen()
returns a file-like object that allows direct iteration:
content = urllib2.urlopen(srcURL, None, 6)
for line in content:
print line
content.close()
This solution is a bit more idiomatic:
def getURLs(srcURL=""):
if not srcURL:
return -1
content = urllib2.urlopen(srcURL, None, 6)
try:
for oneLine in content:
print oneLine
finally:
content.close()
return 0
Notice that by convention, function names start with lower-case. I simplified the test condition at the beginning, and more importantly, the iteration can be written in a simpler way. And as a good practice, resources should be closed in a finally
block.