An OEIS Lookup tool in Python

Question 1

I'm from PPCG so I was making an esolang and I decided to write it in Python. Eventually it went from an esolang to an OEIS (Online Encyclopedia of Integer Sequences) lookup tool. I'm very new to Python.

Essentially this program takes an OEIS sequence number (e.g. 55 for sequence A000055) and the nth number in the sequence to return.

The code gets the OEIS page, parses it with BeautifulSoup, and returns the result, if it doesn't exist it returns "OEIS does not have a number in the sequence at the given index".

import sys, re
from urllib2 import*
from bs4 import BeautifulSoup
# Main logic
def execute(code, nth):
 # Decode input stuff
 num = int(code)
 try:
 f = urlopen("http://oeis.org/A%06d/list" % num)
 # global tree, data # Debugging
 # >:D I'm sorry but I have to use RegEx to parse HTML
 print {key: int(value) for key, value in
 re.findall( r'(\d+) (\d+)', re.sub(r'\D+', " ", re.sub(r'<[^>]+>', "",
 str(BeautifulSoup(f, "lxml").find(lambda tag:
 tag.name == "table" and # A table
 not tag.get('cellspacing') == "0" and # A table table
 len(tag.contents) > 1 # A two column table
 ))
 )) )
 }.get(nth, "OEIS does not have a number in the sequence at the given index")
 except HTTPError:
 print "Could not find sequence A%06d" % num
 except URLError:
 print "Could not connect to sources";
 except:
 print "Verify your numbers are correct"
 raise
if __name__ == '__main__':
 if len(sys.argv) > 1:
 execute(sys.argv[1], sys.argv[2])
 else:
 print """This is the OEIS lookup tool
You haven't entered the sequence""" % (LANGNAME)

As I'm new to Python. I'm using things I only kind of understand. My main concern is how I laid out this program. Especially the HTML parsing, I really doubt I'm doing that in the best way.

Question 2

The <center> cannot hold...

Question 3

">:D I'm sorry but I have to use RegEx to parse HTML" - Related.

Question 4

`import*`:

Please avoid importing all modules at all costs. Import each module you use separately

Regex ಠ_ಠ

Regex is evil, the worst. seriously. stop it. right now. kill python.exe right now and change it.

You use beautifulsoup, a dom parsing library, literally a few lines later, but you choose to use regex.

You sir, are evil.

I would suggest looking further at beautifulsoup, or taking a look at Scrapy, a scraping library for Python that makes use of generators to scrape large element sets (and even has cloud support!)

Lisp

Your code reads like lisp.

 ))
 )) )

Seriously, you need to format this better.

Try using Python's style guide, PEP8. when running your code through the online analyser, it threw 20 errors immediately.

When coding good Python, PEP8 should be your goto style document.

Looping

You shouldn't be looping like this:

print {key: int(value) for key, value in

If you want to reduce the code indentation level, then you should use separate variables to store each step in the process, or just simply use a standard loop.

String formatting

Instead of using % (myString), you should use string.format, as it's an improvement for these reasons:

less independant of parameter order
More readable

"My name is {name}, and I work for {work_name} as a {job_name}".format(name="Batman", work_name="Gotham City", job_name="Crimestopper")

Question 5

What's the difference between string.format and % mystring

Question 6

Reason parsing OEIS HTML will be a bastard

Question 7

That's the opposite of what I suggested in my review.

Question 8

@Downgoat string.format has many more formatting options and allows you to do more with it than % formatting. string.format also coerces types to strings, so it often doesn't need stuff to be explicitly converted to strings.

Question 9

@Downgoat: <table><tr><td></td></table> is legal HTML, see w3.org/TR/html5/tabular-data.html#the-tr-element

Question 10

You do not need HTML parsing at all. OEIS has a nice JSON output format.

https://oeis.org/search?fmt=json&q=id:A000045

So the core functionality of your program can be written as something like

import sys
import urllib2
import json
f = urllib2.urlopen("https://oeis.org/search?fmt=json&q=id:%s" % sys.argv[1])
doc = json.loads(f.read())
comment = doc['results'][0]['comment']
print "\n".join(comment)

Question 11

This can be made nicer - I think - with requests: from requests import get and doc = get("https://oeis.org/search", params={"fmt": "json", "q": "id:" + code}).json().

Question 12

Redundant code

 print """This is the OEIS lookup tool
You haven't entered the sequence""" % (LANGNAME)

This can easily become

 print "This is the OEIS lookup tool\nYou haven't entered the sequence"

You accidentally left some residual code from Putt in there.

_{_{♫ Let's all hop on the code review bus... ♫}}

Question 13

Error messages

except HTTPError:
 print "Could not find sequence A%06d" % num
except URLError:
 print "Could not connect to sources";
except:
 print "Verify your numbers are correct"
 raise

Error messages should probably go to STDERR, rather than being printed to STDOUT. Doing so lets you filter error messages from actual output, which is useful for debugging and logging.

Documentation and variable naming

Comments like

# Decode input stuff

aren't particularly useful to someone who doesn't know what code is ("Why would a piece of code be a number?"), and is probably still not useful to someone who does know what code is. Something like

# Convert sequence number to integer

is a lot less ambiguous.

While we're on this topic, a lot of variable names used here are quite generic. One-off names like key, value are fine, but code and num aren't particularly clear about what they refer to. Instead of num, how about sequence_number, seqnum or, since this is OEIS-related, a_number?

The execute method is also not well documented, even though you give a good description of how it works at the top of the post. Consider putting your explanation, or something similar, in as a docstring.

General readability

This

print {key: int(value) for key, value in
 re.findall( r'(\d+) (\d+)', re.sub(r'\D+', " ", re.sub(r'<[^>]+>', "",
 str(BeautifulSoup(f, "lxml").find(lambda tag:
 tag.name == "table" and # A table
 not tag.get('cellspacing') == "0" and # A table table
 len(tag.contents) > 1 # A two column table
 ))
 )) )
}.get(nth, "OEIS does not have a number in the sequence at the given index")

is doing far too much at once, such that it's hard to tell where to start reading from. Break it up into pieces - if you have a step converting non-digit runs to spaces, put it on its own line, rather than chaining everything together.

Extraneous semicolon

print "Could not connect to sources";

Checking inequality

not tag.get('cellspacing') == "0"

Why not !=? Or, if cellspacing is meant to be a number, int(tag.get('cellspacing')) > 0?

Question 14

Here are some issue:

Considering docopt. It forces you to think on script documentation.
Consider the script also to a be useful as a module in regards to naming and documentation. I suppose that depend on you application.
Separate print from function.
Consider indexed from one or zero. Presently, you are using indexed-from-zero get(nth, ...). I would rather index from one, so the first number is refered to as 1. Does OEIS call the first value the zeroth?
request may be nicer than urllib2

Here is my attempt (with missing exception handling which should also be considered) using the JSON interface as suggested by @Vortico:

"""OEIS.
Usage:
 oeis <code> <nth>
"""
from docopt import docopt
from requests import get
def oeis(code, nth):
 """Return n'th number in OEIS sequence.
 Parameters
 ----------
 code : int
 OEIS identifier, e.g., 55.
 nth : int
 Ordinal number in the sequence.
 Returns
 -------
 number : int
 Number in the sequence
 Examples
 --------
 >>> oeis(45, 8)
 13
 """
 # http://codereview.stackexchange.com/a/120133/46261
 doc = get("https://oeis.org/search",
 params={"fmt": "json", 
 "q": "id:A{:06d}".format(int(code))}).json()
 data = doc['results'][0]['data']
 number = int(data.split(',')[int(nth) - 1])
 return number
if __name__ == "__main__":
 arguments = docopt(__doc__)
 print(oeis(arguments['<code>'], arguments['<nth>']))

Quill Quill 12k5 gold badges41 silver badges93 bronze badges · Accepted Answer · 2016-02-16 01:46:37Z

`import*`:

Please avoid importing all modules at all costs. Import each module you use separately

Regex ಠ_ಠ

Regex is evil, the worst. seriously. stop it. right now. kill python.exe right now and change it.

You use beautifulsoup, a dom parsing library, literally a few lines later, but you choose to use regex.

You sir, are evil.

I would suggest looking further at beautifulsoup, or taking a look at Scrapy, a scraping library for Python that makes use of generators to scrape large element sets (and even has cloud support!)

Lisp

Your code reads like lisp.

 ))
 )) )

Seriously, you need to format this better.

Try using Python's style guide, PEP8. when running your code through the online analyser, it threw 20 errors immediately.

When coding good Python, PEP8 should be your goto style document.

Looping

You shouldn't be looping like this:

print {key: int(value) for key, value in

If you want to reduce the code indentation level, then you should use separate variables to store each step in the process, or just simply use a standard loop.

String formatting

Instead of using % (myString), you should use string.format, as it's an improvement for these reasons:

less independant of parameter order
More readable

"My name is {name}, and I work for {work_name} as a {job_name}".format(name="Batman", work_name="Gotham City", job_name="Crimestopper")

@Downgoat string.format has many more formatting options and allows you to do more with it than % formatting. string.format also coerces types to strings, so it often doesn't need stuff to be explicitly converted to strings.
@Downgoat: <table><tr><td></td></table> is legal HTML, see w3.org/TR/html5/tabular-data.html#the-tr-element

Stack Exchange Network

An OEIS Lookup tool in Python

5 Answers 5

`import*`:

Regex ಠ_ಠ

Lisp

Looping

String formatting

Redundant code

Error messages

Documentation and variable naming

General readability

Extraneous semicolon

Checking inequality

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

An OEIS Lookup tool in Python

5 Answers 5

import*:

Regex ಠ_ಠ

Lisp

Looping

String formatting

Redundant code

Error messages

Documentation and variable naming

General readability

Extraneous semicolon

Checking inequality

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`import*`: