I'm from PPCG so I was making an esolang and I decided to write it in Python. Eventually it went from an esolang to an OEIS (Online Encyclopedia of Integer Sequences) lookup tool. I'm very new to Python.
Essentially this program takes an OEIS sequence number (e.g. 55
for sequence A000055
) and the nth number in the sequence to return.
The code gets the OEIS page, parses it with BeautifulSoup, and returns the result, if it doesn't exist it returns "OEIS does not have a number in the sequence at the given index".
import sys, re
from urllib2 import*
from bs4 import BeautifulSoup
# Main logic
def execute(code, nth):
# Decode input stuff
num = int(code)
try:
f = urlopen("http://oeis.org/A%06d/list" % num)
# global tree, data # Debugging
# >:D I'm sorry but I have to use RegEx to parse HTML
print {key: int(value) for key, value in
re.findall( r'(\d+) (\d+)', re.sub(r'\D+', " ", re.sub(r'<[^>]+>', "",
str(BeautifulSoup(f, "lxml").find(lambda tag:
tag.name == "table" and # A table
not tag.get('cellspacing') == "0" and # A table table
len(tag.contents) > 1 # A two column table
))
)) )
}.get(nth, "OEIS does not have a number in the sequence at the given index")
except HTTPError:
print "Could not find sequence A%06d" % num
except URLError:
print "Could not connect to sources";
except:
print "Verify your numbers are correct"
raise
if __name__ == '__main__':
if len(sys.argv) > 1:
execute(sys.argv[1], sys.argv[2])
else:
print """This is the OEIS lookup tool
You haven't entered the sequence""" % (LANGNAME)
As I'm new to Python. I'm using things I only kind of understand. My main concern is how I laid out this program. Especially the HTML parsing, I really doubt I'm doing that in the best way.
5 Answers 5
import*
:
Please avoid import
ing all modules at all costs. Import each module you use separately
Regex ಠ_ಠ
Regex is evil, the worst. seriously. stop it. right now. kill python.exe right now and change it.
You use beautifulsoup, a dom parsing library, literally a few lines later, but you choose to use regex.
You sir, are evil.
I would suggest looking further at beautifulsoup, or taking a look at Scrapy, a scraping library for Python that makes use of generators to scrape large element sets (and even has cloud support!)
Lisp
Your code reads like lisp.
)) )) )
Seriously, you need to format this better.
Try using Python's style guide, PEP8. when running your code through the online analyser, it threw 20 errors immediately.
When coding good Python, PEP8 should be your goto style document.
Looping
You shouldn't be looping like this:
print {key: int(value) for key, value in
If you want to reduce the code indentation level, then you should use separate variables to store each step in the process, or just simply use a standard loop.
String formatting
Instead of using % (myString)
, you should use string.format
, as it's an improvement for these reasons:
- less independant of parameter order
- More readable
"My name is {name}, and I work for {work_name} as a {job_name}".format(name="Batman", work_name="Gotham City", job_name="Crimestopper")
-
4\$\begingroup\$ What's the difference between string.format and % mystring \$\endgroup\$Downgoat– Downgoat2016年02月16日 01:48:11 +00:00Commented Feb 16, 2016 at 1:48
-
1\$\begingroup\$ Reason parsing OEIS HTML will be a bastard \$\endgroup\$Downgoat– Downgoat2016年02月16日 02:22:53 +00:00Commented Feb 16, 2016 at 2:22
-
8\$\begingroup\$ That's the opposite of what I suggested in my review. \$\endgroup\$Quill– Quill2016年02月16日 04:29:22 +00:00Commented Feb 16, 2016 at 4:29
-
2\$\begingroup\$ @Downgoat
string.format
has many more formatting options and allows you to do more with it than%
formatting.string.format
also coerces types to strings, so it often doesn't need stuff to be explicitly converted to strings. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2016年02月16日 09:33:54 +00:00Commented Feb 16, 2016 at 9:33 -
2\$\begingroup\$ @Downgoat:
<table><tr><td></td></table>
is legal HTML, see w3.org/TR/html5/tabular-data.html#the-tr-element \$\endgroup\$Charles– Charles2016年02月16日 13:53:14 +00:00Commented Feb 16, 2016 at 13:53
You do not need HTML parsing at all. OEIS has a nice JSON output format.
https://oeis.org/search?fmt=json&q=id:A000045
So the core functionality of your program can be written as something like
import sys
import urllib2
import json
f = urllib2.urlopen("https://oeis.org/search?fmt=json&q=id:%s" % sys.argv[1])
doc = json.loads(f.read())
comment = doc['results'][0]['comment']
print "\n".join(comment)
-
2\$\begingroup\$ This can be made nicer - I think - with
requests
:from requests import get
anddoc = get("https://oeis.org/search", params={"fmt": "json", "q": "id:" + code}).json()
. \$\endgroup\$Finn Årup Nielsen– Finn Årup Nielsen2016年02月16日 13:37:43 +00:00Commented Feb 16, 2016 at 13:37
Redundant code
print """This is the OEIS lookup tool
You haven't entered the sequence""" % (LANGNAME)
This can easily become
print "This is the OEIS lookup tool\nYou haven't entered the sequence"
You accidentally left some residual code from Putt in there.
♫ Let's all hop on the code review bus... ♫
Error messages
except HTTPError:
print "Could not find sequence A%06d" % num
except URLError:
print "Could not connect to sources";
except:
print "Verify your numbers are correct"
raise
Error messages should probably go to STDERR, rather than being printed to STDOUT. Doing so lets you filter error messages from actual output, which is useful for debugging and logging.
Documentation and variable naming
Comments like
# Decode input stuff
aren't particularly useful to someone who doesn't know what code
is ("Why would a piece of code be a number?"), and is probably still not useful to someone who does know what code
is. Something like
# Convert sequence number to integer
is a lot less ambiguous.
While we're on this topic, a lot of variable names used here are quite generic. One-off names like key, value
are fine, but code
and num
aren't particularly clear about what they refer to. Instead of num
, how about sequence_number
, seqnum
or, since this is OEIS-related, a_number
?
The execute
method is also not well documented, even though you give a good description of how it works at the top of the post. Consider putting your explanation, or something similar, in as a docstring.
General readability
This
print {key: int(value) for key, value in
re.findall( r'(\d+) (\d+)', re.sub(r'\D+', " ", re.sub(r'<[^>]+>', "",
str(BeautifulSoup(f, "lxml").find(lambda tag:
tag.name == "table" and # A table
not tag.get('cellspacing') == "0" and # A table table
len(tag.contents) > 1 # A two column table
))
)) )
}.get(nth, "OEIS does not have a number in the sequence at the given index")
is doing far too much at once, such that it's hard to tell where to start reading from. Break it up into pieces - if you have a step converting non-digit runs to spaces, put it on its own line, rather than chaining everything together.
Extraneous semicolon
print "Could not connect to sources";
Checking inequality
not tag.get('cellspacing') == "0"
Why not !=
? Or, if cellspacing
is meant to be a number, int(tag.get('cellspacing')) > 0
?
Here are some issue:
- Considering docopt. It forces you to think on script documentation.
- Consider the script also to a be useful as a module in regards to naming and documentation. I suppose that depend on you application.
- Separate
print
from function. - Consider indexed from one or zero. Presently, you are using indexed-from-zero
get(nth, ...)
. I would rather index from one, so the first number is refered to as 1. Does OEIS call the first value the zeroth? request
may be nicer thanurllib2
Here is my attempt (with missing exception handling which should also be considered) using the JSON interface as suggested by @Vortico:
"""OEIS.
Usage:
oeis <code> <nth>
"""
from docopt import docopt
from requests import get
def oeis(code, nth):
"""Return n'th number in OEIS sequence.
Parameters
----------
code : int
OEIS identifier, e.g., 55.
nth : int
Ordinal number in the sequence.
Returns
-------
number : int
Number in the sequence
Examples
--------
>>> oeis(45, 8)
13
"""
# http://codereview.stackexchange.com/a/120133/46261
doc = get("https://oeis.org/search",
params={"fmt": "json",
"q": "id:A{:06d}".format(int(code))}).json()
data = doc['results'][0]['data']
number = int(data.split(',')[int(nth) - 1])
return number
if __name__ == "__main__":
arguments = docopt(__doc__)
print(oeis(arguments['<code>'], arguments['<nth>']))
Explore related questions
See similar questions with these tags.
<center>
cannot hold... \$\endgroup\$