YAML.dump, 1600% faster

Luz (the music visualization studio) is written in Ruby and uses YAML as its project save format.  This made implementation a snap: literally just a few lines of code got us both project save and load.  (I ♥ Ruby.)
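For a sense of what "a few lines" means, here is a minimal sketch of YAML-based save and load. It is not the actual Luz code: the `save_project` and `load_project` helpers and the sample `project` hash are made up for illustration, and only plain hashes, arrays, strings and numbers are used so that loading works the same across Psych versions.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical stand-in for a project: plain data only, not a real Luz object graph.
project = { 'name' => 'demo', 'effects' => [{ 'type' => 'blur', 'amount' => 0.5 }] }

# Hypothetical helper: write the object to disk as YAML.
def save_project(project, path)
  File.open(path, 'w') { |file| YAML.dump(project, file) }
end

# Hypothetical helper: read the YAML file back into Ruby objects.
def load_project(path)
  YAML.load_file(path)
end

path = File.join(Dir.tmpdir, "luz_demo_#{Process.pid}.yaml")
save_project(project, path)
restored = load_project(path)
File.delete(path)

puts "round trip ok: #{restored == project}"
```

Saving real application objects (rather than plain hashes) would need more care on load, since newer Psych versions restrict which classes `YAML.load_file` will instantiate by default.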

The problem was that saving was slooow, taking over 3 seconds to save a small project!

So, back at the first Luz Code Sprint, Markus Roberts and Jesse Hallett began investigating the problem, and eventually decided to re-implement it, using a different algorithm.

The resulting code, currently called ZAML, already works well enough to save Luz projects, and beats the pants off YAML speed-wise:

YAML: =================


Wow!  Nice graph! Saving now takes around 0.1 seconds.  Faaast.

Code is available here: http://github.com/hallettj/zaml/tree/master

IMPORTANT: ZAML does not yet implement every feature of YAML (triggers or some such).  Honestly I don’t even know what’s missing because, as I said above, I’ve only written a few lines of YAML-using code.


4 Responses to YAML.dump, 1600% faster

  1. I think it’s worth noting that all things are relative, and it’s worth benchmarking before switching libraries. For simple dumps, it appears that ZAML is 200% _slower_:

    require 'yaml'
    require 'zaml'
    require 'benchmark'

    Benchmark.bm do |x|
      x.report('yaml') { 10_000.times { YAML.dump([], IO.new(1)) } }
      x.report('zaml') { 10_000.times { ZAML.dump([], IO.new(1)) } }
    end

    # user system total
    # yaml 0.610000 0.250000 0.860000
    # zaml 0.940000 0.460000 1.400000

    Yes, please do benchmark. If you find any real use-case where the ZAML algorithm is slower I’m sure Markus will be interested.

  2. Please note that after further benchmarking, StringIO will provide a noticeable boost of speed to ZAML.

  3. Ryan Davis says:

    I’m late to the game here, but I need to weigh in on a common fallacy:

    Stephen… I think that is a classic example of bad benchmarking. How’d you pick 10k? I highly suspect that number is bad/low and if you increase it by an order of magnitude over a few iterations you’ll find that your numbers don’t pan out.

    Further, I suspect that what you’re really measuring is the overhead cost of dispatching to a Ruby method vs dispatching to a C method. In my own benchmark of Ruby vs C class methods that just return true (Qtrue in C), you’ll see a vast disparity at 10k iterations: 0.0107s vs 0.0046s, or 2.33x slower. But at 10M iterations that gap has narrowed: 9.464s vs 7.287s, or 1.30x slower.

  4. Mark James says:

    On my very large data set ZAML-0.1.2 dumped 100 times faster than YAML, and used a quarter of the memory. Good stuff!

    But if you can sacrifice portability, Marshal still blows the text formats away in terms of both speed and (particularly) memory usage.
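The comments above make three suggestions: dump into a StringIO, repeat the benchmark at more than one iteration count, and compare against Marshal. Here is a rough sketch that combines them using only the standard library. The sample data and iteration counts are arbitrary assumptions, and ZAML is left out so the script runs without the gem; if it is installed, a `ZAML.dump(data, StringIO.new)` report line could be added alongside the others. Timings will vary by machine and Ruby version, so none are shown.

```ruby
require 'yaml'
require 'stringio'
require 'benchmark'

# Hypothetical sample data; not a real Luz project.
data = Array.new(200) { |i| { 'id' => i, 'name' => "item#{i}", 'tags' => %w[a b c] } }

# Dump into an in-memory StringIO instead of opening a new IO on stdout each time.
yaml_text     = YAML.dump(data, StringIO.new).string
marshal_bytes = Marshal.dump(data)

# Repeat at growing iteration counts to check whether the ratio holds up.
[10, 100].each do |n|
  puts "#{n} iterations"
  Benchmark.bm(8) do |x|
    x.report('yaml')    { n.times { YAML.dump(data, StringIO.new) } }
    x.report('marshal') { n.times { Marshal.dump(data) } }
  end
end

puts "yaml bytes:    #{yaml_text.bytesize}"
puts "marshal bytes: #{marshal_bytes.bytesize}"
```

Marshal output is a binary format tied to Ruby, which is the portability trade-off mentioned above.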
