3
\$\begingroup\$

I had to parse a Blogger RSS feed, but I didn't have access to any third-party modules like feedparser or lxml. I was stuck with the task of writing a library to parse RSS feeds; challenge accepted. I started by writing an RSS class and then an Entry class. I then realized that my classes only had two methods, one of them being __init__, so I scrapped the OOP approach in favor of something more direct. I reduced everything to a single function, parse_feed, which takes one positional argument: the URL of the RSS feed.

I'm curious what you think about the way I used type to create classes on the fly.

#-*-coding:utf8;-*-
#qpy:3
#qpy:console
import urllib.request
from xml.dom import minidom

def parse_feed(url):
    # This is what parse_feed returns.
    feed = type('Feed', (object,), {})
    feed.entries = []
    with urllib.request.urlopen(url) as res:
        dom = minidom.parseString(res.read().decode('latin-1'))
    feed.title = dom.getElementsByTagName('title')[0].firstChild.nodeValue
    feed.link = dom.getElementsByTagName('link')[0].getAttribute('href')
    feed.published = dom.getElementsByTagName('published')[0].firstChild.nodeValue
    for element in dom.getElementsByTagName('entry'):
        title = element.getElementsByTagName('title')[0].firstChild.nodeValue
        link = element.getElementsByTagName('link')[0].getAttribute('href')
        author = element.getElementsByTagName('name')[0].firstChild.nodeValue
        published = element.getElementsByTagName('published')[0].firstChild.nodeValue
        updated = element.getElementsByTagName('updated')[0].firstChild.nodeValue
        _id = element.getElementsByTagName('id')[0].firstChild.nodeValue
        category = element.getElementsByTagName('category')
        tags = []
        for node in category:
            tags.append(node.getAttribute('term'))
        article = element.getElementsByTagName('content')[0].firstChild.nodeValue
        entry_dict = dict(
            title=title,
            link=link,
            author=author,
            article=article,
            tags=tags,
            _id=_id)
        feed.entries.append(type('Entry', (feed,), entry_dict))
    return feed

# Example use.
feed_url = 'https://rickys-python-notes.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=1000'
feed = parse_feed(feed_url)
print(feed.title)
print(feed.published)
for entry in feed.entries:
    print(entry.title)
    print(entry.link)
Jamal
asked Dec 8, 2017 at 17:49
\$\endgroup\$
0

2 Answers

4
\$\begingroup\$

Nope, nope, nope.

feed = type('Feed', (object,), {})
feed.entries.append(type('Entry', (feed,), entry_dict))

The entire point of OOP is to have pre-defined classes, as contracts to follow. Since your classes are always the same, you should just define them with the class keyword. I recommend attrs to make the definitions look nice.

In a good design, classes are never created on the fly, out of thin air. They’re always defined in code, with a set of attributes that should also never change. (I’m not a fan of Python’s lenient style here; Java, for example, makes it hard or impossible to create classes and new attributes at runtime.)
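As a rough sketch of what pre-defined classes could look like here: since the question rules out third-party packages, the standard library's dataclasses module (Python 3.7+) is swapped in for attrs. The field names mirror the keys of the question's entry_dict, except that entry_id stands in for _id; the sample values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    # One feed entry; a standalone class, not a subclass of Feed.
    title: str
    link: str
    author: str
    article: str
    tags: List[str] = field(default_factory=list)
    entry_id: str = ''


@dataclass
class Feed:
    # The feed itself, holding its entries in a list.
    title: str
    link: str
    published: str
    entries: List[Entry] = field(default_factory=list)


# Made-up sample values, only to show how the classes are used.
feed = Feed(title='Example blog', link='https://example.com/',
            published='2017-12-08T17:49:00Z')
feed.entries.append(Entry(title='First post', link='https://example.com/1',
                          author='Someone', article='<p>Hello</p>',
                          tags=['python']))

for entry in feed.entries:
    print(entry.title, entry.link)
```

A plain dataclass still lets callers attach new attributes at runtime; forbidding that would additionally need something like __slots__.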

Or alternatively, you could make those regular lists of regular dicts. Not everything needs to be a class.
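For comparison, a minimal sketch of the dict-based shape, with made-up sample values and the same keys as the question's entry_dict (a subset, for brevity):

```python
# A feed as a plain dict; its entries are a plain list of plain dicts.
feed = {
    'title': 'Example blog',
    'link': 'https://example.com/',
    'entries': [
        {'title': 'First post', 'link': 'https://example.com/1', 'tags': ['python']},
        {'title': 'Second post', 'link': 'https://example.com/2', 'tags': ['xml']},
    ],
}

# Ordinary dict and list operations are all that is needed to consume it.
python_titles = [entry['title'] for entry in feed['entries']
                 if 'python' in entry['tags']]
print(python_titles)
```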

More complaints:

  • Entry should not inherit from Feed. They’re two separate, unrelated things.

  • dom = minidom.parseString(res.read().decode('latin-1'))

    Nearly all feeds in the wild are UTF-8, so don't hard-code latin-1. Check the encoding given in the <?xml ?> declaration instead.
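To illustrate the encoding point: minidom.parseString also accepts bytes, and in that case the underlying parser reads the encoding from the XML declaration itself, so no manual decode step is needed. The sample document and the title_of helper below are made up for the demonstration.

```python
from xml.dom import minidom

# A tiny UTF-8 document with a non-ASCII character in the title.
raw = ('<?xml version="1.0" encoding="UTF-8"?>'
       '<feed><title>Caf\u00e9 notes</title></feed>').encode('utf-8')


def title_of(document):
    # Works for both bytes and str input.
    dom = minidom.parseString(document)
    return dom.getElementsByTagName('title')[0].firstChild.nodeValue


good = title_of(raw)                   # bytes: the parser honors the declaration
bad = title_of(raw.decode('latin-1'))  # wrong manual decode: mojibake

print(ascii(good))  # 'Caf\xe9 notes'
print(ascii(bad))   # 'Caf\xc3\xa9 notes'
```

In parse_feed this would amount to passing res.read() straight to minidom.parseString.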

answered Dec 8, 2017 at 18:49
\$\endgroup\$
1
\$\begingroup\$

As pointed out in the previous answer, creating classes on the fly is against the OOP philosophy.

Another problem is with parse_feed(): it does several things at once. This violates the single-responsibility principle (SRP). A function is supposed to achieve one goal, and only that one. This facilitates code reuse and unit testing.

I would suggest creating a class with three methods, one for each of the three main tasks I see parse_feed() doing.
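The answer does not name the three tasks. One plausible reading is: download the feed, parse the XML, and extract the entries. A sketch under that assumption follows; AtomFeedReader, its method names, and the sample document are all hypothetical.

```python
import urllib.request
from xml.dom import minidom


class AtomFeedReader:
    """Hypothetical class; the three-way split is an assumption."""

    def fetch(self, url):
        # Task 1: download the raw feed bytes.
        with urllib.request.urlopen(url) as res:
            return res.read()

    def parse(self, raw):
        # Task 2: turn the raw bytes into a DOM tree.
        return minidom.parseString(raw)

    def extract(self, dom):
        # Task 3: pull the fields of every entry out of the DOM.
        entries = []
        for element in dom.getElementsByTagName('entry'):
            entries.append({
                'title': self._text(element, 'title'),
                'link': element.getElementsByTagName('link')[0].getAttribute('href'),
                'tags': [node.getAttribute('term')
                         for node in element.getElementsByTagName('category')],
            })
        return entries

    @staticmethod
    def _text(element, tag):
        # Text content of the first <tag> element below `element`.
        return element.getElementsByTagName(tag)[0].firstChild.nodeValue


# parse() and extract() can be exercised without touching the network.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample</title>
  <entry>
    <title>First post</title>
    <link href="https://example.com/1"/>
    <category term="python"/>
  </entry>
</feed>"""

reader = AtomFeedReader()
entries = reader.extract(reader.parse(SAMPLE))
print(entries)
```

Because only fetch() needs a network connection, the other two steps can be unit tested against a fixed document like SAMPLE.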

Ricky Wilson
answered May 9, 2018 at 17:15
\$\endgroup\$
