July 31, 2006

Sam's continuation-based REXML parser built on Expat

I just came across Sam's REXML-compatible XML parser based on Expat, which got me thinking for a bit.

The interesting thing about this XML parser (other than that it implements the REXML interfaces on top of Expat) is that it's pull based and uses continuations to parse files in a really effective manner.

The source is here, which is probably worth a glance before reading on.

The part that made everything click for me was understanding the first line of the initialize method, which sets everything up:

def initialize xml
    callcc { |@sax_context| return }
    ....snip....

Here, callcc creates a continuation, stores it in the instance variable @sax_context, and returns from initialize straight away, leaving the parser ready to go.

Then, the caller of this parser invokes the pull method to get an XML token:

def pull
    callcc { |@pull_context| @sax_context.call }
end

which creates another continuation in the instance variable @pull_context, saving the execution frame within pull, and then calls the continuation stored previously in @sax_context to "continue" executing.

This takes execution back to the line after the callcc in the initialize method, which opens the file and enters a loop to parse the XML. When the loop finds a token, it 'pushes' it back to the pull method by calling the saved @pull_context continuation with the token's value:

def push *value
    callcc { |@sax_context| @pull_context.call value }
end

Here push creates a new @sax_context continuation (essentially freezing the parse of the file until another pull is invoked, but also saving where the parsing was up to), and calls the previously saved @pull_context continuation to "continue" executing now that we have a token to return. The value passed to @pull_context.call becomes the result of the callcc expression inside pull. In Ruby, the value of the last expression in a method is the method's return value, so the pull method above "continues" and returns the token to the caller.

The parser has now returned the first token to the caller and is essentially waiting to be told to get the next one.

Once pull is invoked again, the @sax_context continuation resumes from where it was last captured (currently inside the token parsing loop, just after the first token in the XML), and the process of pushing the next token back to the caller of pull starts again, repeating until the end of the file.

It's quite a neat way of demonstrating the use of continuations with so little code.
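For comparison, the same pull/push handoff can be sketched with Ruby's Fiber class, which newer Ruby versions provide for this kind of coroutine-style transfer of control. This is not Sam's code and it doesn't use callcc or Expat: PullTokenizer and its regular expression are made up for illustration, and it only splits a string into tags and text. Fiber.yield plays the role of push, and resume plays the role of pull.

```ruby
# Hypothetical sketch: a pull-style tokenizer built on Fiber.
# It illustrates the handoff described above, not Sam's parser.
class PullTokenizer
  def initialize(text)
    # The fiber body is the "push" side: it runs the scanning loop and
    # suspends itself each time it has a token to hand to the caller.
    @producer = Fiber.new do
      text.scan(/<[^>]+>|[^<]+/) { |token| Fiber.yield token }
      nil # the fiber's final value, signalling end of input
    end
  end

  # The "pull" side: resume the scanning loop until it yields the next
  # token, or return nil once the input is exhausted.
  def pull
    @producer.alive? ? @producer.resume : nil
  end
end

tokenizer = PullTokenizer.new("<a>hi</a>")
tokens = []
while (token = tokenizer.pull)
  tokens << token
end
p tokens # prints ["<a>", "hi", "</a>"]
```

As in the continuation version, the scanning loop is frozen between calls to pull and picks up exactly where it left off.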

For some interesting background reading, there's Sam's 'Continuations for Curmudgeons' post, which makes the concept of continuations a bit easier to understand, and the Cocoon web application framework, which uses continuations to implement flow between pages in web applications.

Posted by crafterm at 10:58 AM | Comments (0) | TrackBack

Creating a gem under Ruby

Over the weekend I created my first Ruby gem for some internal code I've been prototyping at work. I was surprised at how easy it is to create a gem, particularly via Rake, the Ruby build tool.

All you need to do is add a gem specification and a task definition to your Rakefile and you're done:

require 'rubygems'
require 'rake/gempackagetask'

spec = Gem::Specification.new do |s|
  s.name = "Name"
  s.version = "0.0.1"
  s.author = "Marcus Crafter"
  s.email = "crafterm@gmail.com"
  s.homepage = "http://blogs.cocoondev.org/crafterm/"
  s.platform = Gem::Platform::RUBY
  s.summary = "Some description"
  s.files = FileList["{bin,lib}/**/*"].to_a
  s.require_path = "lib"
  s.autorequire = "name"
  s.test_files = FileList["{test}/**/*test.rb"].to_a
  s.has_rdoc = true
  s.extra_rdoc_files = ["README"]
  s.add_dependency("dependency", ">= 0.x.x")
end
 
Rake::GemPackageTask.new(spec) do |pkg| 
  pkg.need_tar = true 
end 

Then a:

$> rake gem

will build a package for you in a pkg subdirectory ready for deployment, installation, etc.
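For reference, here is a sketch of the same Rakefile against newer RubyGems releases, where the packaging task is Gem::PackageTask from rubygems/package_task. The gem name example_gem is a placeholder, and I've left out the dependency and rdoc settings to keep it short.

```ruby
# Rakefile sketch using Gem::PackageTask; "example_gem" is a placeholder name.
require "rubygems"
require "rubygems/package_task"

spec = Gem::Specification.new do |s|
  s.name          = "example_gem"
  s.version       = "0.0.1"
  s.authors       = ["Marcus Crafter"]
  s.summary       = "Some description"
  s.files         = Dir["{bin,lib}/**/*"] # everything under bin/ and lib/
  s.require_paths = ["lib"]
end

# Defines the gem and package tasks; built files are written under pkg/.
Gem::PackageTask.new(spec) do |pkg|
  pkg.need_tar = true
end
```

The same rake gem invocation applies.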

In addition to this, there's nice support for building Ruby extensions upon gem installation, and also non-Ruby code, among other things.

The only unusual thing for a Java developer is that the source for a gem by convention goes into the lib directory rather than src.

Useful links in this area are the Gemspec reference, this article in Linux Journal, and chapter 17 of the Ruby Book, which I found the most valuable.

Posted by crafterm at 02:59 AM | Comments (0) | TrackBack