I had to parse a Blogger RSS feed, but I didn't have access to any third-party modules like feedparser or lxml, so I was stuck writing my own parser; challenge accepted. I started by writing an RSS class and then an Entry class, but then realized my classes only had two methods each, one of them being __init__, so I scrapped the OOP approach and went for something more direct. I reduced everything down to one function, parse_feed, which takes one positional argument: the URL of the RSS feed.
I'm curious what you think about the way I used type to create classes on the fly.
#-*-coding:utf8;-*-
#qpy:3
#qpy:console
import urllib.request
from xml.dom import minidom
def parse_feed(url):
    # This is what parse_feed returns.
    feed = type('Feed', (object,), {})
    feed.entries = []
    with urllib.request.urlopen(url) as res:
        dom = minidom.parseString(res.read().decode('latin-1'))
    feed.title = dom.getElementsByTagName('title')[0].firstChild.nodeValue
    feed.link = dom.getElementsByTagName('link')[0].getAttribute('href')
    feed.published = dom.getElementsByTagName('published')[0].firstChild.nodeValue
    for element in dom.getElementsByTagName('entry'):
        title = element.getElementsByTagName('title')[0].firstChild.nodeValue
        link = element.getElementsByTagName('link')[0].getAttribute('href')
        author = element.getElementsByTagName('name')[0].firstChild.nodeValue
        published = element.getElementsByTagName('published')[0].firstChild.nodeValue
        updated = element.getElementsByTagName('updated')[0].firstChild.nodeValue
        _id = element.getElementsByTagName('id')[0].firstChild.nodeValue
        category = element.getElementsByTagName('category')
        tags = []
        for node in category:
            tags.append(node.getAttribute('term'))
        article = element.getElementsByTagName('content')[0].firstChild.nodeValue
        entry_dict = dict(
            title=title,
            link=link,
            author=author,
            article=article,
            tags=tags,
            _id=_id)
        feed.entries.append(type('Entry', (feed,), entry_dict))
    return feed

# Example use.
feed_url = 'https://rickys-python-notes.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=1000'
feed = parse_feed(feed_url)

print(feed.title)
print(feed.published)

for entry in feed.entries:
    print(entry.title)
    print(entry.link)
2 Answers
Nope, nope, nope.
feed = type('Feed', (object,), {})
feed.entries.append(type('Entry', (feed,), entry_dict))
The entire point of OOP is to have pre-defined classes that act as contracts to follow. Since your classes are always the same, you should just define them with the class keyword. I recommend attrs to keep the definitions short and readable.
In a good design, classes are never created on the fly, out of thin air. They’re always defined in code, with a set of attributes that should also never change. (I’m not a fan of Python’s lenient style — Java, for example, makes it hard/impossible to create classes and new attributes at runtime.)
Or alternatively, you could make those regular lists of regular dicts. Not everything needs to be a class.
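A minimal sketch of what the attrs suggestion above could look like, using the classic attr.s API and the field names from the question's entry_dict (purely illustrative, not a drop-in replacement):

import attr

@attr.s(auto_attribs=True)
class Entry:
    title: str
    link: str
    author: str
    article: str
    tags: list = attr.Factory(list)
    # attrs strips the leading underscore, so the generated __init__ takes id=...
    _id: str = ''

@attr.s(auto_attribs=True)
class Feed:
    title: str
    link: str
    published: str
    entries: list = attr.Factory(list)

# Usage inside parse_feed would then be roughly:
# feed = Feed(title=..., link=..., published=...)
# feed.entries.append(Entry(title=title, link=link, author=author,
#                           article=article, tags=tags, id=_id))

Plain dicts, as mentioned above, work just as well if you don't need attribute access.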
More complaints:

- Entry should not inherit from Feed. They are two separate, unrelated things.
- dom = minidom.parseString(res.read().decode('latin-1')) assumes the wrong encoding: 99% of feeds in the wild are UTF-8, and you should check the encoding given in the <?xml ?> declaration (see the sketch after this list).
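If you stay with minidom, one low-effort sketch that respects the declared encoding is to hand the parser the raw bytes and let expat read the <?xml ?> declaration itself (fetch_dom is just an illustrative helper name):

import urllib.request
from xml.dom import minidom

def fetch_dom(url):
    with urllib.request.urlopen(url) as res:
        raw = res.read()
        # The HTTP-level charset is also available if you ever need to decode by hand:
        # res.headers.get_content_charset('utf-8')
    # minidom/expat accepts bytes and honours the encoding named in the
    # <?xml version="1.0" encoding="..."?> declaration (defaulting to UTF-8).
    return minidom.parseString(raw)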
As pointed out in the previous answer, creating classes on the fly goes against the OOP philosophy.
Another problem is with parse_feed(): it does several things at once, which violates the single-responsibility principle (SRP). A function is supposed to achieve one goal, and only that one; this makes the code easier to reuse and to unit-test.
I would suggest creating a class with three methods that implement the three main tasks I see parse_feed() doing.
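For illustration only, here is a rough sketch of that split, assuming the three tasks are fetching/parsing the XML, reading the feed-level metadata, and reading the entries (FeedParser and its method names are made up, not something the answer prescribes):

import urllib.request
from xml.dom import minidom

def _text(parent, tag):
    # Helper: text content of the first <tag> under parent.
    return parent.getElementsByTagName(tag)[0].firstChild.nodeValue

class FeedParser:
    def __init__(self, url):
        self.url = url

    def fetch(self):
        # Task 1: download and parse the XML document.
        with urllib.request.urlopen(self.url) as res:
            return minidom.parseString(res.read())

    def feed_metadata(self, dom):
        # Task 2: feed-level fields.
        return {
            'title': _text(dom, 'title'),
            'link': dom.getElementsByTagName('link')[0].getAttribute('href'),
            'published': _text(dom, 'published'),
        }

    def entries(self, dom):
        # Task 3: one dict per <entry>.
        for element in dom.getElementsByTagName('entry'):
            yield {
                'title': _text(element, 'title'),
                'link': element.getElementsByTagName('link')[0].getAttribute('href'),
                'author': _text(element, 'name'),
                'article': _text(element, 'content'),
                'tags': [node.getAttribute('term')
                         for node in element.getElementsByTagName('category')],
                '_id': _text(element, 'id'),
            }

Each method can then be unit-tested in isolation, e.g. entries() against a small canned minidom document, without touching the network.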