YARV makes REXML more than twice as fast

Martin Dittus · 2006-03-27 · code

I finally found a good test case to compare YARV's execution speed with the plain Ruby interpreter -- and the results are quite satisfying.

The script I benchmarked reads and parses 3,300 small XML files, extracts data, and writes the result to a tab-separated file.

(You can find a description of how to install YARV in the comments of why's recent article "YARV Merged Matz".)

About the Test Case

The benchmarked script, export.rb, is a real-life script that I wrote a couple of days ago: a friend is working on a profile matching algorithm for a community site, and to test his approach he needed some real-world data. I jump at any chance to write a screen scraper, so I offered to help. I fetched 3,300 XML files from Last.FM user profiles using the Audioscrobbler API, and proceeded to extract data from them. Here is an example file from my own Last.FM user profile: topartists.xml.

The latter part of this process, extracting data from the XML, takes quite a while; REXML is convenient, but it's certainly not the fastest way to parse and query an XML file. But this is what makes this a meaningful benchmark to me: yes, it's quite limited in scope, but it is a good example of the things I do in my daily life as a Ruby developer. Anything that speeds up REXML makes me more productive.

What follows is the core of the benchmarked code, cleaned up for clarity. It loads all XML files in a directory, parses their content, extracts some data using an XPath expression and writes the result to an IO stream.

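require 'rexml/document'

# CACHE_DIR (the directory holding the downloaded XML files) and out (an open
# IO stream) are set up elsewhere in export.rb.
Dir.glob(File.join(CACHE_DIR, "*.xml")).each do |cachefile|
  # The file name without its extension becomes the first column of the row.
  name = File.basename(cachefile, ".xml")
  items = []
  xmldata = File.read(cachefile)
  # Collect the text of every artist name in this profile.
  REXML::Document.new(xmldata).elements.each('topartists/artist/name') do |el|
    items << el.text
  end
  # One tab-separated row per file: name first, then the artist names.
  out << items.unshift(name).join("\t") << "\n"
end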
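To try the extraction step without a directory full of cached profiles, here is a minimal self-contained sketch. The sample XML is made up for illustration (only the topartists/artist/name nesting is taken from the XPath expression above), and profile_row is a hypothetical helper name, not part of export.rb.

```ruby
require 'rexml/document'

# Made-up sample data; the real Audioscrobbler files contain more fields.
SAMPLE_XML = '<topartists>' \
             '<artist><name>Artist A</name></artist>' \
             '<artist><name>Artist B</name></artist>' \
             '</topartists>'

# Builds one tab-separated row: the given name followed by all artist names.
# (Hypothetical helper, mirroring the body of the loop above.)
def profile_row(name, xmldata)
  items = []
  REXML::Document.new(xmldata).elements.each('topartists/artist/name') do |el|
    items << el.text
  end
  items.unshift(name).join("\t")
end

puts profile_row('example', SAMPLE_XML)
```

This prints the name and the two artist names on one line, separated by tabs.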

The Benchmark Results

This is not a very scientific setup; it's just a quick test to satisfy my own curiosity. Note, for example, that the Ruby versions don't match. So take it as an illustration of the potential speedup YARV can provide, and not much else.

Using Ruby's default interpreter:

$ ruby -v
ruby 1.8.4 (2005-12-24) [powerpc-darwin8.5.0]

$ time ruby export.rb
real    13m56.559s
user    10m42.359s
sys     0m15.955s

$ time ruby export.rb
real    13m54.813s
user    10m58.716s
sys     0m14.627s

$ time ruby export.rb
real    14m0.648s
user    10m55.555s
sys     0m14.307s

And when using YARV:

$ ruby_yarv -v
ruby 2.0.0 (Base: Ruby 1.9.0 2006-02-14) [powerpc-darwin8.5.0]
YARVCore 0.4.0 Rev: 475 (2006-02-23) [opts: ]

$ time ruby_yarv export.rb
real    6m0.519s
user    4m36.774s
sys     0m8.683s

$ time ruby_yarv export.rb
real    6m0.070s
user    4m32.151s
sys     0m7.609s

$ time ruby_yarv export.rb
real    6m1.309s
user    4m32.323s
sys     0m7.487s

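So the speedup is not as impressive as in other tests, but it's a great start: the script takes about 6 minutes instead of about 14, which makes it roughly 2.3 times as fast under YARV.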
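For anyone who wants to check the arithmetic, here is a small sketch that computes the speedup from the "real" times listed above. It assumes the mean of the three runs is a fair summary; the names MRI_REAL, YARV_REAL and mean are mine, for illustration only.

```ruby
# Wall-clock ("real") times from the runs above, converted to seconds.
MRI_REAL  = [13 * 60 + 56.559, 13 * 60 + 54.813, 14 * 60 + 0.648]
YARV_REAL = [6 * 60 + 0.519, 6 * 60 + 0.070, 6 * 60 + 1.309]

# Arithmetic mean of a list of numbers.
def mean(values)
  values.inject(0.0) { |sum, v| sum + v } / values.size
end

speedup = mean(MRI_REAL) / mean(YARV_REAL)
printf("mean real time: %.1fs vs. %.1fs, speedup: %.2fx\n",
       mean(MRI_REAL), mean(YARV_REAL), speedup)
```

With the numbers above this comes out at a speedup of about 2.32x in wall-clock time.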

