How to Read an RSS Feed Into a DataSet
August 3rd, 2007
RSS (Really Simple Syndication) feeds are the lifeblood of blogs and other frequently updated Internet content. At their core, they’re just XML files, so reading one in VB.NET is quite easy when you take advantage of the built-in tools. In this example, we’ll look at how to download an RSS file and load it into a DataSet.
First, let’s understand the structure of an RSS file. This Wikipedia article has the full layout for you to review. In our example, we will be looking primarily at the item data, in the section of the RSS file that looks like this:
.
.
.
<item>
    <title>Article Title goes here</title>
    <link>http://example.com/the-article-permalink-goes-here</link>
    <description>
        Article text goes here
    </description>
</item>
The item element typically contains the title of the article, a link to the article, and a description that might be the full article or just an excerpt. (Strictly speaking, RSS 2.0 only requires that an item have either a title or a description.) There are other elements and attributes that can appear, such as the author’s name and a guid value that further identifies the article, but we’ll be keeping it simple for this demo.
At first glance, downloading a file from the Internet, parsing it, and loading it into a DataSet might seem a bit daunting, particularly if you haven’t worked much with XML files in VB.NET. However, the reliable structure of a valid RSS file and the built-in .NET Framework data functions make this task a one-liner (two if you count initializing the DataSet). Here it is:
Dim RssData As New DataSet()
RssData.ReadXml(RssSourceLocation)
And that’s it. One overload of the DataSet’s ReadXml method accepts a filename as input, and in this context a valid filename can also be a valid URL. For example, if we wanted to get the last 100 entries in the Google Blog Search on VB.NET, we would just send ReadXml a string like this:
"http://blogsearch.google.com/blogsearch_feeds?q=VB.NET&num=100&output=rss"
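In a real application the fetch can fail, so the call is usually worth wrapping. The following is only a minimal sketch: the module name FeedLoader and the function TryLoadFeed are made up for illustration, and the set of exceptions caught is an assumption that is not exhaustive and may differ between runtimes.

```vbnet
Imports System
Imports System.Data
Imports System.IO
Imports System.Net
Imports System.Xml

' Hypothetical helper for illustration; not part of the .NET Framework.
Public Module FeedLoader
    ' Loads a feed from a local path or a URL.
    ' Returns the populated DataSet, or Nothing if loading failed.
    Public Function TryLoadFeed(ByVal source As String) As DataSet
        Dim feed As New DataSet()
        Try
            ' The same ReadXml(String) overload handles both paths and URLs.
            feed.ReadXml(source)
            Return feed
        Catch ex As WebException
            ' Assumed failure mode: the server could not be reached.
            Console.Error.WriteLine("Download failed: " & ex.Message)
        Catch ex As IOException
            ' Assumed failure mode: the local file is missing or unreadable.
            Console.Error.WriteLine("Could not read the file: " & ex.Message)
        Catch ex As XmlException
            ' The source was found but is not well-formed XML.
            Console.Error.WriteLine("Not valid XML: " & ex.Message)
        End Try
        Return Nothing
    End Function
End Module
```

A caller would then check the result before using it, for example by testing `TryLoadFeed(RssSourceLocation) IsNot Nothing` before reading any tables.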
All you need is the address of a site’s feed and you can load it into your DataSet. But how can you work with it once it’s there?
When you load an XML document into a DataSet using ReadXml without providing a schema, .NET uses inference to determine the structure to use. If it can’t figure out the structure, an exception is raised. That could be a problem for any old XML file, but the RSS structure is well known and has to be correct for it to work in a wide variety of readers. Each parent element type is placed in a DataTable of the same name. For our RSS file, we’re primarily interested in the ‘item’ entries, so we would loop through the data like this:
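For Each RssRow As DataRow In RssData.Tables("item").Rows
    ' Each row is one <item>; its columns are named after the child elements.
    Dim ArticleTitle As String = RssRow.Item("title").ToString()
    Dim ArticleLink As String = RssRow.Item("link").ToString()
    Dim ArticleText As String = RssRow.Item("description").ToString()
Next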
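If you are curious what inference actually produced for a given feed, you can list the tables and columns in the DataSet. The sketch below uses a tiny made-up feed held in a string so it runs without a network connection; the module name SchemaInspector, its members, and the sample feed are all hypothetical.

```vbnet
Imports System
Imports System.Data
Imports System.IO

' Hypothetical helper for illustration only.
Public Module SchemaInspector
    ' A made-up two-item feed, just large enough to exercise inference.
    Public Const SampleFeed As String = _
        "<rss version=""2.0""><channel><title>Sample Feed</title>" & _
        "<item><title>First</title><link>http://example.com/1</link>" & _
        "<description>One</description></item>" & _
        "<item><title>Second</title><link>http://example.com/2</link>" & _
        "<description>Two</description></item></channel></rss>"

    ' Loads XML held in a string instead of a file or URL.
    Public Function LoadFromString(ByVal xml As String) As DataSet
        Dim feed As New DataSet()
        Using reader As New StringReader(xml)
            feed.ReadXml(reader)
        End Using
        Return feed
    End Function

    ' Writes every inferred table, its row count, and its column names.
    Public Sub PrintSchema(ByVal feed As DataSet)
        For Each table As DataTable In feed.Tables
            Console.WriteLine("{0} ({1} rows)", table.TableName, table.Rows.Count)
            For Each column As DataColumn In table.Columns
                Console.WriteLine("  {0}", column.ColumnName)
            Next
        Next
    End Sub
End Module
```

Calling `SchemaInspector.PrintSchema(SchemaInspector.LoadFromString(SchemaInspector.SampleFeed))` is a quick way to confirm that an ‘item’ table exists and which columns it has before you write the loop against it.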
Of course, you could get additional fields or read the header information at this point. You could take the values and save them to your own database for later analysis. You could generate your own aggregate feed for either a desktop app or a web page. There really are a lot of possibilities. This method could also be misused, for example to create a splog or scraper site, so use it wisely and carefully.
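As one possible way to read the header information and optional fields, the sketch below pulls the feed title from the ‘channel’ table and falls back to a default when an item has no author. Everything named here (FeedFields, GetField, DescribeFeed, and the sample feed) is made up for illustration, and it assumes the feed has a channel element with at least one item.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.IO

' Hypothetical helper for illustration only.
Public Module FeedFields
    ' A made-up feed in which only the first item names an author.
    Public Const SampleFeed As String = _
        "<rss version=""2.0""><channel><title>Sample Feed</title>" & _
        "<link>http://example.com/</link>" & _
        "<item><title>First</title><link>http://example.com/1</link>" & _
        "<description>One</description>" & _
        "<author>a@example.com (Ann)</author></item>" & _
        "<item><title>Second</title><link>http://example.com/2</link>" & _
        "<description>Two</description></item></channel></rss>"

    ' Returns the column value as a string, or the fallback when the feed
    ' never used that element or this particular row left it out.
    Public Function GetField(ByVal row As DataRow, ByVal name As String, _
                             ByVal fallback As String) As String
        If row.Table.Columns.Contains(name) AndAlso Not row.IsNull(name) Then
            Return row(name).ToString()
        End If
        Return fallback
    End Function

    ' Builds one line for the feed header and one line per item.
    Public Function DescribeFeed(ByVal xml As String) As List(Of String)
        Dim feed As New DataSet()
        Using reader As New StringReader(xml)
            feed.ReadXml(reader)
        End Using

        Dim lines As New List(Of String)()
        ' The header lives in the table named after the channel element.
        Dim channel As DataRow = feed.Tables("channel").Rows(0)
        lines.Add("Feed: " & GetField(channel, "title", "(untitled)"))

        For Each item As DataRow In feed.Tables("item").Rows
            lines.Add(GetField(item, "title", "(untitled)") & " by " & _
                      GetField(item, "author", "(unknown)"))
        Next
        Return lines
    End Function
End Module
```

Checking for the column first matters because inference only creates columns for elements that actually appear in the feed, so asking for a missing one by name would otherwise raise an exception.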
I would like to know what you think. Please feel free to leave a comment or question if you have one.
Entry Filed under: Code Examples
1 Comment
1. Greg | April 8th, 2009 at 5:44 am
Thank you, a very helpful, quick and tidy way of reading RSS feeds.
I should really start outputting pages like this as it is helpful to anybody that passes them and are looking for a solution to a problem.
It is users like you that help us all spending the time you do on creating information.
Thank you once again.