<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>dekstop weblog : Parsing an OPML Document Recursively With Ruby While Preserving Its Structure</title>
    <link>http://dekstop.de/weblog/2006/02/recursive_ruby_opml_parser/</link>
    <description>I just started to write an aggregator in Ruby which will form the basis of a number of web applications, and a couple of minutes into the project I&apos;m already excited about the expressiveness of Ruby and its standard library. So much so that I had to share the results ...</description>
    <dc:language>en-us</dc:language>
    <dc:rights>Copyright 2006 Martin Dittus</dc:rights>
    <lastBuildDate>Tue, 14 Feb 2006 12:50:21 GMT</lastBuildDate>
    <generator>MicroLinks 5.6 (dekstop.de)</generator>
    <managingEditor>public&#64;dekstop&#46;de</managingEditor>
    <webMaster>public&#64;dekstop&#46;de</webMaster>

    <item>
      <title>Comment on "Parsing an OPML Document Recursively With Ruby While Preserving Its Structure"</title>
      <link>http://dekstop.de/weblog/2006/02/recursive_ruby_opml_parser/#296</link>
      <description><![CDATA[<p>>I still have the problem that when the same feed is in two categories it gets overwritten in the hash. Any ideas how to fix that?</p>

<p>Ahh, that's true. And annoying. It didn't occur to me because my current feed reader does not allow putting a feed in multiple folders, so I didn't even think about testing this.</p>

<p>I don't think there's an obvious quick fix without changing the overall logic, but if anyone has suggestions let us know.</p>]]> &lt;p&gt;- <![CDATA[<a href="http://dekstop.de/" rel="nofollow">Martin Dittus</a>]]>&lt;/p&gt;</description>
      <dc:creator>Martin Dittus</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2006/02/recursive_ruby_opml_parser/#296</guid>
      <pubDate>Fri, 07 Jul 2006 16:28:03 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "Parsing an OPML Document Recursively With Ruby While Preserving Its Structure"</title>
      <link>http://dekstop.de/weblog/2006/02/recursive_ruby_opml_parser/#266</link>
      <description><![CDATA[<p>Sorry - I guess it does work - I was getting confused by the way it is displayed.</p>

<p>I still have the problem that when the same feed is in two categories it gets overwritten in the hash.  Any ideas how to fix that?</p>]]> &lt;p&gt;- <![CDATA[<a href="http://digitalpodcast.com" rel="nofollow">Alex</a>]]>&lt;/p&gt;</description>
      <dc:creator>Alex</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2006/02/recursive_ruby_opml_parser/#266</guid>
      <pubDate>Thu, 29 Jun 2006 06:39:29 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "Parsing an OPML Document Recursively With Ruby While Preserving Its Structure"</title>
      <link>http://dekstop.de/weblog/2006/02/recursive_ruby_opml_parser/#265</link>
      <description><![CDATA[<p>I took a look at this to try and make it work for the open source podcast directory project (openpodcastdirectory.org). It looks nice, but it breaks if you put more than one link into a category. So if blogs has two links you get something like this:</p>

<p>http://example5.com/feed: <br />
  - blogs2<br />
http://example3.com/feed: &id001 <br />
  - blogs<br />
  - dev<br />
http://example2.com/feed: <br />
  - blogs<br />
http://example4.com/feed: *id001</p>

<p>Any idea how to fix the problem? I'm trying to make something like this work so I can create the right parent and child categories given an OPML list.</p>

<p>Here's a test file http://www.digitalpodcast.com/opml/test1.opml</p>

<p>If you have any idea please let me know.</p>

<p>Thanks</p>

<p>Alex<br />
DigitalPodcast.com</p>]]> &lt;p&gt;- <![CDATA[<a href="http://www.digitalpodcast.com" rel="nofollow">Alex Nesbitt</a>]]>&lt;/p&gt;</description>
      <dc:creator>Alex Nesbitt</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2006/02/recursive_ruby_opml_parser/#265</guid>
      <pubDate>Thu, 29 Jun 2006 06:28:55 GMT</pubDate>
    </item>


    <item>
      <title>Parsing an OPML Document Recursively With Ruby While Preserving Its Structure</title>
      <link>http://dekstop.de/weblog/2006/02/recursive_ruby_opml_parser/</link> 
      <description><![CDATA[<p>I just started to write an aggregator in Ruby which will form the basis of a number of web applications, and a couple of minutes into the project I'm already excited about the expressiveness of Ruby and its standard library. So much so that I had to share the results of my first five minutes of coding.</p>

<p>I decided that the aggregator I'm writing will take its feed URLs from an OPML document. </p>

<p>A nice property of OPML is that it allows you to group feeds into a hierarchy of named elements, so that you can e.g. group some feeds in a "blogs" category, some other feeds in a "news" category, and so on. You could even have subgroups, so that e.g. your "news" category has subcategories "politics", "weather", "tech", etc.</p>

<p>So I thought a bit about how you can parse the OPML in a way that extracts feed URLs and still preserves this notion of hierarchical "categories" -- and it turns out it's remarkably simple, and it did indeed only take a couple of minutes to implement, most of which was spent reading up on API calls.</p>

<p>Here's the complete function:</p>
<pre>
# parse_opml(opml_node, parent_names=[])
#
# Takes an REXML::Element that has OPML outline nodes as children,
# parses its subtree recursively and returns a hash:
# { feed_url =&gt; [parent_name_1, parent_name_2, ...] }
#
# Note: a feed URL that appears under more than one outline keeps only
# one path, because later occurrences overwrite the hash entry.
#
def parse_opml(opml_node, parent_names=[])
  feeds = {}
  opml_node.elements.each('outline') do |el|
    # an outline with child elements is a category: recurse into it
    if el.elements.size != 0
      feeds.merge!(parse_opml(el, parent_names + [el.attributes['text']]))
    end
    # an outline with an xmlUrl attribute is a feed
    if el.attributes['xmlUrl']
      feeds[el.attributes['xmlUrl']] = parent_names
    end
  end
  feeds
end
</pre>

<p>And here's how you call it:</p>
<pre>
require 'rexml/document'

opml = REXML::Document.new(File.read('my_feeds.opml'))
feeds = parse_opml(opml.elements['opml/body'])
</pre>    

<p>To make it clear what I'm trying to do I'll show you a simple example. If this is the content of <tt>my_feeds.opml</tt>:</p>
<pre>
&lt;opml version="1.1"&gt;
&lt;body&gt;
  &lt;outline xmlUrl="http://example1.com/feed" /&gt;
  &lt;outline text="blogs"&gt;
    &lt;outline xmlUrl="http://example2.com/feed" /&gt;
    &lt;outline text="dev"&gt;
      &lt;outline xmlUrl="http://example3.com/feed" /&gt;
    &lt;/outline&gt;
  &lt;/outline&gt;
&lt;/body&gt;
&lt;/opml&gt;
</pre>

<p>...then the hash returned from <tt>parse_opml</tt> will look like this:</p>
<pre>
{
  "http://example1.com/feed" =&gt; [],
  "http://example2.com/feed" =&gt; ["blogs"],
  "http://example3.com/feed" =&gt; ["blogs", "dev"]
}
</pre>
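
<p>One limitation of this shape: because the hash is keyed by feed URL, a feed listed under more than one category keeps only one of its paths. A minimal sketch of one possible workaround, assuming it is acceptable to change the return value to a list of paths per feed, could look like this. The name <tt>parse_opml_paths</tt> and the changed return shape are illustrative assumptions, not part of the function above:</p>

```ruby
require 'rexml/document'

# parse_opml_paths (hypothetical name, a sketch only)
#
# Like parse_opml, but each feed URL maps to a list of category paths,
# one path per place the feed appears in the outline:
#   feed_url maps to [[parent_name_1, parent_name_2, ...], ...]
#
def parse_opml_paths(opml_node, parent_names = [], feeds = {})
  opml_node.elements.each('outline') do |el|
    url = el.attributes['xmlUrl']
    if url
      # append this path instead of overwriting an earlier one;
      # dup so that no two entries share the same array object
      (feeds[url] ||= []).push(parent_names.dup)
    end
    # recurse into child outlines; a leaf outline has none,
    # so the recursive call simply does nothing for it
    parse_opml_paths(el, parent_names + [el.attributes['text']], feeds)
  end
  feeds
end
```

<p>It would be called the same way as <tt>parse_opml</tt>. For the example document each feed would then map to a one-element list of paths, e.g. <tt>[["blogs", "dev"]]</tt> for the last feed, and a feed that appears in two categories would get two paths.</p>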

<p>And I'm still amazed. Eat this, PHP ;)</p>

<h3>Related Articles</h3>
<ul class="links">
  <li><a href="http://dekstop.de/weblog/2005/12/22c3_opml_feed/">An OPML Feed of 22C3 Blogs</a></li>
  <li><a href="http://dekstop.de/weblog/2006/03/feed_readers_a_commodity/">Feed Readers Are a Commodity -- If Not Now, then Soon.</a></li>
  <li><a href="http://dekstop.de/weblog/2006/01/revisiting_aggregators_pt_one/">Revisiting Aggregators Part I: User-Designed Interfaces</a></li>
  <li><a href="http://dekstop.de/weblog/2005/12/feedtools_cache_in_ruby_scripts/">Using the FeedTools Cache in Plain Ruby Scripts</a></li>
</ul>
]]></description>
      <dc:creator>Martin Dittus</dc:creator>
      <category>code</category>
      <category>tools</category>
      
      <guid isPermaLink="true">http://dekstop.de/weblog/2006/02/recursive_ruby_opml_parser/</guid>
      <pubDate>Tue, 14 Feb 2006 12:50:21 GMT</pubDate>
    </item>
  </channel>
</rss>
