Stack Exchange API Python library

Question 1

I wrote a package a few years ago to make my interactions with the Stack Exchange API more straightforward. I've maintained it over the years and used it in several scripts I've written that interact with Stack Exchange. Now my goal is to release it to PyPI. Before I do so though, I'd like to ensure it's a clean, well written Python library. Ultimately, it'd be nice if another developer could look at the code and easily add features or fix bugs they encounter.

Due to my development environment, this is written for Python 2.7. However, I believe it is Python 3 compatible, but I do not have such an environment set up to actually test that.

The main library:

SEAPI.py

import requests
from itertools import chain
from time import sleep
try:
 import json
except ImportError:
 import simplejson as json
class SEAPIError(Exception):
 """
 The Exception that is thrown when ever there is an API error.
 This utilizes the values returned by the API and described
 here: http://api.stackexchange.com/docs/types/error
 Parameters
 ----------
 url : string
 The URL that was called and generated an error
 error : int
 The `error_id` returned by the API (should be an int)
 code : string
 The `description` returned by the API and is human friendly
 message : string
 The `error_name` returned by the API
 """
 def __init__(self, url, error, code, message):
 self.url = url
 self.error = error
 self.code = code
 self.message = message
class SEAPI(object):
 def __init__(self, name=None, version="2.2", **kwargs):
 """
 The object used to interact with the Stack Exchange API
 Attributes
 ----------
 name : string
 (Required): A valid `api_site_parameter` 
 (avaiable from http://api.stackexchange.com/docs/sites) which will
 be used to connect to a particular site on the Stack Exchange
 Network.
 version : float
 (Required) The version of the API you are connecting to. The 
 default of 2.2 is the current version
 kwargs : {proxy, max_pages, page_size, key, access_token}
 proxy - A dictionary of http and https proxy locations
 Example: {'http': 'http://example.com', 
 'https': 'https:example.com'}
 By default, this is `None`.
 max_pages - The maximium number of pages to retreive (Default: 100)
 page_size - The number of elements per page. The API limits this to
 a maximum of 100 items on all end points except `site`
 key - An API key 
 access_token - An access token associated with an application and 
 a user, to grant more permissions (such as write access)
 """
 if not name:
 raise ValueError('No Site Name provided')
 self.proxy = None
 self.max_pages = 100
 self.page_size = 100
 self._endpoint = None
 self._api_key = None
 self._name = None
 self._version = version
 self._previous_call = None
 self.key = None
 self.access_token = None
 if 'proxy' in kwargs:
 self.proxy = kwargs['proxy']
 if 'max_pages' in kwargs:
 self.max_pages = kwargs['max_pages']
 if 'page_size' in kwargs:
 self.page_size = kwargs['page_size']
 if 'key' in kwargs:
 self.key = kwargs['key']
 if 'access_token' in kwargs:
 self.access_token = kwargs['access_token']
 self._base_url = 'https://api.stackexchange.com/{}/'.format(version)
 sites = self.fetch('sites', filter='!*L1*AY-85YllAr2)')
 for s in sites['items']:
 if name == s['api_site_parameter']:
 self._name = s['name']
 self._api_key = s['api_site_parameter']
 self._version = version
 if not self._name:
 raise ValueError('Invalid Site Name provided')
 def __repr__(self):
 return "<{}> v:<{}> endpoint: {} Last URL: {}".format(self._name, 
 self._version, 
 self._endpoint, 
 self._previous_call)
 def fetch(self, endpoint=None, page=1, key=None, filter='default', **kwargs):
 """Returns the results of an API call.
 This is the main work horse of the class. It builds the API query 
 string and sends the request to Stack Exchange. If there are multiple 
 pages of results, and we've configured `max_pages` to be greater than 
 1, it will automatically paginate through the results and return a 
 single object.
 Returned data will appear in the `items` key of the resulting 
 dictionary.
 Parameters
 ----------
 endpoint : string
 The API end point being called. Available endpoints are listed on 
 the official API documentation: http://api.stackexchange.com/docs
 This can be as simple as `fetch('answers')`, to call the answers 
 end point
 If calling an end point that takes additional parameter, such as `id`s
 pass the ids as a list to the `ids` key: 
 `fetch('answers/{}', ids=[1,2,3])`
 This will attempt to retrieve the answers for the three listed ids.
 If no end point is passed, a `ValueError` will be raised
 page : int
 The page in the results to start at. By default, it will start on
 the first page and automatically paginate until the result set
 reached `max_pages`.
 key : string
 An API key
 filter : string
 The filter to utilize when calling an endpoint. Different filters
 will return different keys. The default is `default` and this will
 still vary depending on what the API returns as default for a 
 particular endpoint
 kwargs :
 Parameters accepted by individual endpoints. These parameters 
 *must* be named the same as described in the endpoint documentation
 Returns
 -------
 result : dictionary
 A dictionary containing wrapper data regarding the API call
 and the results of the call in the `items` key. If multiple
 pages were retreived, all of the results will appear in the
 `items` tag.
 """
 if not endpoint:
 raise ValueError('No end point provided.')
 self._endpoint = endpoint
 params = {
 "pagesize": self.page_size,
 "page": page,
 "filter": filter
 }
 if self.key:
 params['key'] = self.key
 if self.access_token:
 params['access_token'] = self.access_token
 if 'ids' in kwargs:
 ids = ';'.join(str(x) for x in kwargs['ids'])
 kwargs.pop('ids', None)
 else:
 ids = None
 params.update(kwargs)
 if self._api_key:
 params['site'] = self._api_key
 data = []
 run_cnt = 0
 backoff = 0
 total = 0
 while True:
 run_cnt += 1
 if run_cnt > self.max_pages: # Prevents Infinate Loops
 break
 base_url = "{}{}/".format(self._base_url, endpoint)
 if ids:
 base_url += "{}".format(ids)
 try:
 response = requests.get(base_url, params=params, proxies=self.proxy)
 except requests.exceptions.ConnectionError as e:
 raise SEAPIError(self._previous_call, str(e), str(e), str(e))
 self._previous_call = response.url
 try:
 response.encoding = 'utf-8-sig'
 response = response.json()
 except ValueError as e:
 raise SEAPIError(self._previous_call, str(e), str(e), str(e))
 try:
 error = response["error_id"]
 code = response["error_name"]
 message = response["error_message"]
 raise SEAPIError(self._previous_call, error, code, message)
 except KeyError:
 pass # This means there is no error
 if key:
 data.append(response[key])
 else:
 data.append(response)
 if len(data) < 1:
 break
 backoff = 0
 total = 0
 page = 1
 if 'backoff' in response:
 backoff = int(response['backoff'])
 sleep(backoff+1) # Sleep an extra second to ensure no timing issues
 if 'total' in response:
 total = response['total']
 if 'has_more' in response and response['has_more']:
 params["page"] += 1
 else:
 break
 r = []
 for d in data:
 r.extend(d['items'])
 result = {'backoff': backoff,
 'has_more': data[-1]['has_more'],
 'page': params['page'],
 'quota_max': data[-1]['quota_max'],
 'quota_remaining': data[-1]['quota_remaining'],
 'total': total,
 'items': list(chain(r))}
 return result
 def send_data(self, endpoint=None, page=1, key=None, filter='default', **kwargs):
 """Sends data to the API.
 This call is similar to `fetch`, but *sends* data to the API instead 
 of retrieving it. 
 Returned data will appear in the `items` key of the resulting 
 dictionary.
 Sending data requires that the `access_token` is set. This is enforced
 on the API side, not within this library.
 Parameters
 ----------
 endpoint : string
 The API end point being called. Available endpoints are listed on 
 the official API documentation: http://api.stackexchange.com/docs
 This can be as simple as `fetch('answers')`, to call the answers 
 end point
 If calling an end point that takes additional parameter, such as `id`s
 pass the ids as a list to the `ids` key: 
 `fetch('answers/{}', ids=[1,2,3])`
 This will attempt to retrieve the answers for the three listed ids.
 If no end point is passed, a `ValueError` will be raised
 page : int
 The page in the results to start at. By default, it will start on
 the first page and automatically paginate until the result set
 reached `max_pages`.
 key : string
 An API key
 filter : string
 The filter to utilize when calling an endpoint. Different filters
 will return different keys. The default is `default` and this will
 still vary depending on what the API returns as default for a 
 particular endpoint
 kwargs :
 Parameters accepted by individual endpoints. These parameters 
 *must* be named the same as described in the endpoint documentation
 Returns
 -------
 result : dictionary
 A dictionary containing wrapper data regarding the API call
 and the results of the call in the `items` key. If multiple
 pages were retreived, all of the results will appear in the
 `items` tag.
 """
 if not endpoint:
 raise ValueError('No end point provided.')
 self._endpoint = endpoint
 params = {
 "pagesize": self.page_size,
 "page": page,
 "filter": filter
 }
 if self.key:
 params['key'] = self.key
 if self.access_token:
 params['access_token'] = self.access_token
 if 'ids' in kwargs:
 ids = ';'.join(str(x) for x in kwargs['ids'])
 kwargs.pop('ids', None)
 else:
 ids = None
 params.update(kwargs)
 if self._api_key:
 params['site'] = self._api_key
 data = []
 base_url = "{}{}/".format(self._base_url, endpoint)
 response = requests.post(base_url, data=params, proxies=self.proxy)
 self._previous_call = response.url
 response = response.json()
 try:
 error = response["error_id"]
 code = response["error_name"]
 message = response["error_message"]
 raise SEAPIError(self._previous_call, error, code, message)
 except KeyError:
 pass # This means there is no error
 data.append(response)
 r = []
 for d in data:
 r.extend(d['items'])
 result = {'has_more': data[-1]['has_more'],
 'page': params['page'],
 'quota_max': data[-1]['quota_max'],
 'quota_remaining': data[-1]['quota_remaining'],
 'items': list(chain(r))}
 return result

The library can then be used by doing something like this:

from SEAPI import SEAPI
SITE = SEAPI('stackoverflow')
comments = SITE.fetch('comments')

This would return a short list of the most recent comments from Stack Overflow. If you wanted to return more than the default 10 items, you can do this before the comments = line:

SITE.page_size = 100
SITE.max_pages = 2

This would paginate through 2 pages of results, returning a maximum of 200 comments (assuming that there are at least 200 comments on Stack Overflow).

Are there improvements that I should make to this code before I release it on PyPI?

Question 2

Are you aware of py-stackexchange ? How will yours be better than that one? In any case, I recommend including the link to your GitHub repo.

Question 3

I was not aware of that, however, looking through the repository, the one I've written seems to be a much thinner wrapper. It's the single file I have listed vs. the several I see here. It's also only 360ish lines. Mine does require that you know the end points where as py-stackexchange seems to have things like .questions instead of my SITE.fetch('questions')

Question 4

If you're going to release the code, do everyone a favour and test it on Python 3.

Question 5

This is a very long __init__ function. Before looking any closer, I'd guess that some of this can be either abstracted away or reduced. A large function generally is bad, but particularly init should be bare bones to just do what's necessary to make an instance usable. If it does too much, then it's a red flag.

First off, you're setting default values, then checking if they're in the kwargs and then setting them again if they are. This is confusing and triples the lines you should have. Ternaries get a bad rap, but this is the perfect use case for them:

self.proxy = kwargs['proxy'] if kwargs.get('proxy') else None
self.max_pages = kwargs['max_pages'] if kwargs.get('max_pages') else 100

Though as @ferada and @jwodder point out, dict.get can return a default value. If you give it a second parameter it will return that if the key is not found, so even simpler is to do this:

self.proxy = kwargs.get('proxy')
self.max_pages = kwargs.get('max_pages', 100)

By default get returns None, so you can just use it plainly for proxy. With max_pages, passing 100 as the second parameter means that if the key isn't found, self.max_pages will be set to 100.

Now for the section where you try get the proper name and key. You set self._version again, for no particular reason as it has not changed. You also don't call break when finding the correct site, even though you only need to match one result and then __init__ is done.

I would also make your strings here constants, the filter and the base url you format with:

class SEAPI(object):
 BASE_URL = 'https://api.stackexchange.com/{}/'.format(version)
 FILTER = '!*L1*AY-85YllAr2)'

I think this pattern of doing too much extends to your other much more complicated functions. Ideally every function would have one job that it does in (relative) isolation. This makes code more readable, extendable and testable. For example fetch could have a lot of the opening chunk moved into a separate function called get_params that just returns a dictionary of parameters and use fetch just to fetch the actual results. It's even easier since you're using a class, that gets around a lot of the scope concerns that you can often have with adding functions.

This while pattern is bizarre:

 while True:
 run_cnt += 1
 if run_cnt > self.max_pages: # Prevents Infinate Loops
 break

Don't you achieve the same with:

 while run_cnt <= self.max_pages:

and then adding to run_cnt at the end of the loop? Or just using for _ in range(self.max_pages)? I might be missing something, but if I am it should be clearly commented.

Question 6

Instead of the ternaries dict.get also has a second parameter for the default.

Question 7

@ferada: ...which defaults to None, so you could just write self.proxy = kwargs.get('proxy').

Question 8

Excellent points, I feel foolish for not saying that. Updated now.

SuperBiasedMan 13.5k5 gold badges37 silver badges62 bronze badges · Answer 1 · 2016-02-17 15:59:51Z

This is a very long __init__ function. Before looking any closer, I'd guess that some of this can be either abstracted away or reduced. A large function generally is bad, but particularly init should be bare bones to just do what's necessary to make an instance usable. If it does too much, then it's a red flag.

First off, you're setting default values, then checking if they're in the kwargs and then setting them again if they are. This is confusing and triples the lines you should have. Ternaries get a bad rap, but this is the perfect use case for them:

self.proxy = kwargs['proxy'] if kwargs.get('proxy') else None
self.max_pages = kwargs['max_pages'] if kwargs.get('max_pages') else 100

Though as @ferada and @jwodder point out, dict.get can return a default value. If you give it a second parameter it will return that if the key is not found, so even simpler is to do this:

self.proxy = kwargs.get('proxy')
self.max_pages = kwargs.get('max_pages', 100)

By default get returns None, so you can just use it plainly for proxy. With max_pages, passing 100 as the second parameter means that if the key isn't found, self.max_pages will be set to 100.

Now for the section where you try get the proper name and key. You set self._version again, for no particular reason as it has not changed. You also don't call break when finding the correct site, even though you only need to match one result and then __init__ is done.

I would also make your strings here constants, the filter and the base url you format with:

class SEAPI(object):
 BASE_URL = 'https://api.stackexchange.com/{}/'.format(version)
 FILTER = '!*L1*AY-85YllAr2)'

I think this pattern of doing too much extends to your other much more complicated functions. Ideally every function would have one job that it does in (relative) isolation. This makes code more readable, extendable and testable. For example fetch could have a lot of the opening chunk moved into a separate function called get_params that just returns a dictionary of parameters and use fetch just to fetch the actual results. It's even easier since you're using a class, that gets around a lot of the scope concerns that you can often have with adding functions.

This while pattern is bizarre:

 while True:
 run_cnt += 1
 if run_cnt > self.max_pages: # Prevents Infinate Loops
 break

Don't you achieve the same with:

 while run_cnt <= self.max_pages:

and then adding to run_cnt at the end of the loop? Or just using for _ in range(self.max_pages)? I might be missing something, but if I am it should be clearly commented.

Instead of the ternaries dict.get also has a second parameter for the default.
@ferada: ...which defaults to None, so you could just write self.proxy = kwargs.get('proxy').
Excellent points, I feel foolish for not saying that. Updated now.

Stack Exchange Network

Stack Exchange API Python library

1 Answer 1

You must log in to answer this question.

Hot Network Questions

Stack Exchange API Python library

1 Answer 1

You must log in to answer this question.

Related

Hot Network Questions