4
\$\begingroup\$

Hacker News recently released an official API. Unfortunately, it cannot be used to get the newest posts submitted. I wrote a script to provide a faux streaming-api like method (like praw reddit's submission_stream) so that I can simply put it in a loop and do something like make a gtk-notification when title contains the string 'python'.

"""
Usage:
from hn import HNStream
h = HNStream()
for item in h.stream():
 print "Title: ", item.title
 print "By user: ", item.by
"""
from lxml import html as lh
import re
import requests
import threading
import time
import ujson as json
__all__ = ['HNStream']
class Item:
 def __init__(self, **kwargs):
 self.__dict__.update(kwargs)
 def __str__(self):
 return "{}: {}".format(self.type, self.title)
 __repr__ = __str__
class HackerNews:
 def __init__(self):
 self.base_url = 'https://hacker-news.firebaseio.com/v0/{api}'
 def get(self, uri):
 response = requests.get(uri)
 if response.status_code == requests.codes.ok:
 return json.loads(response.text)
 else:
 raise Exception('HTTP Error {}'.format(response.status_code))
 def item(self, item_id):
 uri = self.base_url.format(**{'api': 'item/{}.json'.format(item_id)})
 return Item(**self.get(uri))
class SimpleFIFO:
 def __init__(self, length):
 self.length = length
 self.values = length * [None]
 def contains(self, iid):
 return iid in self.values
 def append(self, value):
 assert isinstance(value, str), "Only append strings"
 self.values.append(value)
 while len(self.values) > self.length:
 self.values.pop(0)
 assert len(self.values) == self.length
 def __str__(self):
 return str(self.values)
class HNStream(threading.Thread):
 def __init__(self):
 threading.Thread.__init__(self)
 def stream(self):
 itembuffer = SimpleFIFO(length=30)
 while True:
 r = requests.get('https://news.ycombinator.com/newest')
 tree = lh.fromstring(r.text)
 # http://stackoverflow.com/a/2756994
 links = tree.xpath("//a[re:match(@href, 'item\?id=\d+')]/@href",
 namespaces={"re": "http://exslt.org/regular-expressions"})
 links = set(links)
 for link in links:
 iid = re.match(r'item\?id=(\d+)', str(link)).groups()[0]
 if not itembuffer.contains(iid):
 itembuffer.append(iid)
 yield HackerNews().item(iid)
 time.sleep(30)
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Nov 20, 2014 at 13:02
\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

I have a number of suggestions:

Don't use self.dict

def __init__(self, **kwargs):
 self.__dict__.update(kwargs)

It is pretty unclear which attributes the class will have after calling the constructor, as it is basically up to the caller to decide this. This is not optimal, as you want to be in control of what data is set in an item. I'd suggest to use named arguments instead:

def __init__(self, id, type, title): 
 self.type = type
 self.title = title
 self.id = id
 # Add other attributes that I don't know of?

While this may seem like blowing up the code, it makes it a lot easier to understand for an uninformed reader, and therefor should be preferred IMHO.

Append instead of format

Since you are basically just appending something to self.base_url, you might as well append using + instead of using format. I find it a lot easier to read, but that depends a lot on what you prefer. Consequently:

self.base_url = 'https://hacker-news.firebaseio.com/v0/'

Then

uri = self.base_url.format(**{'api': 'item/{}.json'.format(item_id)})

would look like this if using append:

uri = self.base_url + "item/{}.json".format(item_id)

HackerNews.get()

The get() method's name does not explain what the method does. I'd suggest something offering a bit more documentation, for example: download_item() or download_entry().

On a little side note: You can remove the else: statement, as the raise Exception(..) command will only be executed if the if statement evaluates to False. This depends a bit on personal taste though. I'd also change the ordering like this:

if response.status_code != requests.codes.ok:
 raise Exception('HTTP Error {}'.format(response.status_code))
return json.loads(response.text)

I'd prefer this ordering as it has the return statement at the end, which is where I'd look first when searching for it.

SimpleFIFO

Maybe you can use Python's Queue class? See here for documentation: https://docs.python.org/2/library/queue.html

It also provides a max size, the only thing missing is the check for adding only strings.

HNStream.init()

You can use the super keyword (see also: https://stackoverflow.com/questions/576169/understanding-python-super-and-init-methods) instead of explicitly calling the constructor like this:

def __init__(self):
 super(HNSTream, self).__init__()

Or in Python 3:

def __init__(self):
 super().__init__()

HNStream.stream()

Just a few naming suggestions:

  • r -> result
  • iid -> item_id

That was it, hope it helps!

answered Nov 21, 2014 at 21:00
\$\endgroup\$
1
  • \$\begingroup\$ Thanks a lot for the detailed review. I had fixed a few of these issues (collections.deque instead of SimpleFIFO, getting rid of dict.update) you can see the updated code here. I really like your explanation for having a return statement at the end, and semantic functions so I'll change the get method appropriately. Re: super I want it to work with 2 and 3 so super(HNSTream, self).__init__() seems to be a better choice. \$\endgroup\$ Commented Nov 23, 2014 at 7:15

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.