I had to parse a Blogger RSS feed, but I didn't have access to any third-party modules like feedparser or lxml, so I was stuck writing my own parser; challenge accepted. I started by writing an RSS class and then an Entry class, but then realized my classes only had two methods each, one of them being __init__, so I scrapped the OOP approach and went for something more direct. I reduced everything down to one function, parse_feed, which takes one positional argument: the URL of the RSS feed.
I'm curious what you think about the way I used type to create classes on the fly.
#-*-coding:utf8;-*-
#qpy:3
#qpy:console
import urllib.request
from xml.dom import minidom
def parse_feed(url):
    # This is what parse_feed returns.
    feed = type('Feed', (object,), {})
    feed.entries = []
    with urllib.request.urlopen(url) as res:
        dom = minidom.parseString(res.read().decode('latin-1'))
    feed.title = dom.getElementsByTagName('title')[0].firstChild.nodeValue
    feed.link = dom.getElementsByTagName('link')[0].getAttribute('href')
    feed.published = dom.getElementsByTagName('published')[0].firstChild.nodeValue
    for element in dom.getElementsByTagName('entry'):
        title = element.getElementsByTagName('title')[0].firstChild.nodeValue
        link = element.getElementsByTagName('link')[0].getAttribute('href')
        author = element.getElementsByTagName('name')[0].firstChild.nodeValue
        published = element.getElementsByTagName('published')[0].firstChild.nodeValue
        updated = element.getElementsByTagName('updated')[0].firstChild.nodeValue
        _id = element.getElementsByTagName('id')[0].firstChild.nodeValue
        category = element.getElementsByTagName('category')
        tags = []
        for node in category:
            tags.append(node.getAttribute('term'))
        article = element.getElementsByTagName('content')[0].firstChild.nodeValue
        entry_dict = dict(
            title=title,
            link=link,
            author=author,
            article=article,
            tags=tags,
            _id=_id)
        feed.entries.append(type('Entry', (feed,), entry_dict))
    return feed

# Example use.
feed_url = 'https://rickys-python-notes.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=1000'
feed = parse_feed(feed_url)

print(feed.title)
print(feed.published)

for entry in feed.entries:
    print(entry.title)
    print(entry.link)
2 Answers
Nope, nope, nope.
feed = type('Feed', (object,), {})
feed.entries.append(type('Entry', (feed,), entry_dict))
The entire point of OOP is to have pre-defined classes that act as contracts to follow. Since your classes are always the same, you should just define them with the class keyword. I recommend attrs to keep the definitions short and readable.
In a good design, classes are never created on the fly, out of thin air. They’re always defined in code, with a set of attributes that should also never change. (I’m not a fan of Python’s lenient style — Java, for example, makes it hard/impossible to create classes and new attributes at runtime.)
Or alternatively, you could make those regular lists of regular dicts. Not everything needs to be a class.
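A minimal sketch of what the attrs suggestion above could look like, using the classic attr.s API and the field names from the question's entry_dict (purely illustrative, not a drop-in replacement):

import attr

@attr.s(auto_attribs=True)
class Entry:
    title: str
    link: str
    author: str
    article: str
    tags: list = attr.Factory(list)
    # attrs strips the leading underscore, so the generated __init__ takes id=...
    _id: str = ''

@attr.s(auto_attribs=True)
class Feed:
    title: str
    link: str
    published: str
    entries: list = attr.Factory(list)

# Usage inside parse_feed would then be roughly:
# feed = Feed(title=..., link=..., published=...)
# feed.entries.append(Entry(title=title, link=link, author=author,
#                           article=article, tags=tags, id=_id))

Plain dicts, as mentioned above, work just as well if you don't need attribute access.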
More complaints:

- Entry should not inherit from Feed. They are two separate, unrelated things.
- dom = minidom.parseString(res.read().decode('latin-1')) assumes the wrong encoding: 99% of feeds in the wild are UTF-8, and you should check the encoding given in the <?xml ?> declaration (see the sketch after this list).
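If you stay with minidom, one low-effort sketch that respects the declared encoding is to hand the parser the raw bytes and let expat read the <?xml ?> declaration itself (fetch_dom is just an illustrative helper name):

import urllib.request
from xml.dom import minidom

def fetch_dom(url):
    with urllib.request.urlopen(url) as res:
        raw = res.read()
        # The HTTP-level charset is also available if you ever need to decode by hand:
        # res.headers.get_content_charset('utf-8')
    # minidom/expat accepts bytes and honours the encoding named in the
    # <?xml version="1.0" encoding="..."?> declaration (defaulting to UTF-8).
    return minidom.parseString(raw)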
As pointed out in the previous answer, creating classes on the fly goes against the OOP philosophy.
Another problem is with parse_feed(): it does several things at once, which violates the single-responsibility principle (SRP). A function is supposed to achieve one goal, and only that one; this makes the code easier to reuse and to unit-test.
I would suggest creating a class with three methods that implement the three main tasks I see parse_feed() doing.
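For illustration only, here is a rough sketch of that split, assuming the three tasks are fetching/parsing the XML, reading the feed-level metadata, and reading the entries (FeedParser and its method names are made up, not something the answer prescribes):

import urllib.request
from xml.dom import minidom

def _text(parent, tag):
    # Helper: text content of the first <tag> under parent.
    return parent.getElementsByTagName(tag)[0].firstChild.nodeValue

class FeedParser:
    def __init__(self, url):
        self.url = url

    def fetch(self):
        # Task 1: download and parse the XML document.
        with urllib.request.urlopen(self.url) as res:
            return minidom.parseString(res.read())

    def feed_metadata(self, dom):
        # Task 2: feed-level fields.
        return {
            'title': _text(dom, 'title'),
            'link': dom.getElementsByTagName('link')[0].getAttribute('href'),
            'published': _text(dom, 'published'),
        }

    def entries(self, dom):
        # Task 3: one dict per <entry>.
        for element in dom.getElementsByTagName('entry'):
            yield {
                'title': _text(element, 'title'),
                'link': element.getElementsByTagName('link')[0].getAttribute('href'),
                'author': _text(element, 'name'),
                'article': _text(element, 'content'),
                'tags': [node.getAttribute('term')
                         for node in element.getElementsByTagName('category')],
                '_id': _text(element, 'id'),
            }

Each method can then be unit-tested in isolation, e.g. entries() against a small canned minidom document, without touching the network.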