
All You Need For Parsing RSS Feeds

by Paul Kenjora on February 5th, 2008

I’m working on a few applications to extend the Arkayne API and I needed a simple, easy-to-use RSS feed parser. It needed to tackle two issues:

  1. Parsing any feed (or as many as possible).
  2. Parsing any encoding (or as many as possible).

Granted, the first is solved by the parser itself, while the second will probably fall to some clever code using Python’s built-in encoding support.
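As a rough illustration of that second point, here is a minimal sketch of decoding raw bytes by trying a list of candidate encodings in turn. It is written for Python 3, and the helper name decode_with_fallback and the default candidate list are my own assumptions, not something taken from either source recommended below.

```python
def decode_with_fallback(raw, candidates=("utf-8", "iso-8859-1")):
    """Return (text, encoding) using the first candidate that decodes raw."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except (UnicodeDecodeError, LookupError):
            # Bad bytes for this codec, or an unknown codec name: try the next.
            continue
    # Last resort: decode as UTF-8 with replacement characters instead of failing.
    return raw.decode("utf-8", errors="replace"), "utf-8"


# Hypothetical input: a Latin-1 byte string that is not valid UTF-8.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")
text, used = decode_with_fallback(latin1_bytes)
print(text, used)
```

Since iso-8859-1 accepts any byte sequence, it only makes sense as the last candidate; anything listed after it would never be tried.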

The solutions I strongly recommend are:

  1. The Universal Feed Parser.
  2. The XML.com article "Unicode Secrets".

The above two sources gave me everything I needed to parse the feeds I wanted. The biggest plus for me is that the Universal Feed Parser is a single "feedparser.py" file. Downloading is easy and no install is necessary; just put it in a directory on your PYTHONPATH. The resulting code will probably look something like this:


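import feedparser

# feed.url holds the address of the feed to fetch.
channels = feedparser.parse(feed.url)
url = ''
summary = ''
title = ''

for entry in channels.entries:
    try:
        # Decode byte strings using the encoding the parser detected.
        url = unicode(entry.link, channels.encoding)
        summary = unicode(entry.description, channels.encoding)
        title = unicode(entry.title, channels.encoding)
    except (TypeError, UnicodeError, LookupError):
        # Already unicode, or not decodable with that encoding: use as-is.
        url = entry.link
        summary = entry.description
        title = entry.title

    print "URL: ", url
    print "Summary: ", summary
    print "Title: ", title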

That's the simplest example. Of course, there are many more fields that can be accessed. For more details on the Universal Feed Parser, see the home page: http://www.feedparser.org
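To give a sense of what other fields an RSS 2.0 item can carry, here is a small Python 3 sketch that lists every child element of each item. Note that it uses only the standard library's xml.etree.ElementTree, not the Universal Feed Parser, and that the sample feed and the item_fields helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, just large enough to show a few extra fields.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<item>
<title>First post</title>
<link>http://example.com/first</link>
<description>Short summary</description>
<pubDate>Tue, 05 Feb 2008 10:00:00 GMT</pubDate>
<author>editor@example.com</author>
<category>python</category>
</item>
</channel></rss>"""


def item_fields(rss_text):
    """Return one dict per <item>, mapping child tag names to their text."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        # A repeated tag (for example several <category> elements) keeps
        # only its last value in this simple sketch.
        items.append({child.tag: (child.text or "") for child in item})
    return items


for fields in item_fields(SAMPLE_RSS):
    for name, value in sorted(fields.items()):
        print(name + ":", value)
```

This does none of the cleanup a real feed parser does for malformed or unusual feeds, which is exactly why a dedicated parser is worth using.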

  • You might want to look at vtd-xml, the latest and most advanced xml processing model, far better than DOM or SAX


    vtd-xml