Luz (the music visualization studio) is written in Ruby and uses YAML as its project save format. This made implementation a snap: literally just a few lines of code got us both project save and load. (I ♥ Ruby.)
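For anyone who hasn't seen how little code that is, here is a minimal sketch of YAML save and load using only Ruby's standard library. It is not Luz's actual code: the `project` hash and the temp-file path are made up for illustration.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical stand-in for a project. Plain hashes, arrays, strings and
# numbers are used so the data loads back without any extra options.
project = {
  'name'   => 'demo',
  'actors' => [{ 'shape' => 'circle', 'radius' => 0.5 }]
}

# Hypothetical file location, chosen only for this example.
path = File.join(Dir.tmpdir, 'luz_demo_project.yaml')

# Save: serialize the whole structure to a file.
File.write(path, YAML.dump(project))

# Load: read the file back into Ruby objects.
loaded = YAML.load_file(path)
```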
The problem was that saving was slooow, taking over 3 seconds to save a small project!
So, back at the first Luz Code Sprint, Markus Roberts and Jesse Hallett began investigating the problem, and eventually decided to re-implement YAML dumping using a different algorithm.
The resulting code, currently called ZAML, already works well enough to save Luz projects, and beats the pants off YAML speed-wise:
YAML: =================
ZAML: =
Wow! Nice graph! Saving now takes around 0.1 seconds. Faaast.
Code is available here: http://github.com/hallettj/zaml/tree/master
IMPORTANT: ZAML does not yet implement every feature of YAML, triggers or some such. Honestly I don’t even know what’s missing because, as I said above, I’ve only written a few lines of YAML-using code.
I think it’s worth noting that all things are relative, and it’s worth benchmarking before switching libraries. For simple dumps, it appears that ZAML is about 60% _slower_ (1.40s vs. 0.86s total in the run below):
require 'yaml'
require 'zaml'
require 'benchmark'

Benchmark.bm do |x|
  x.report('yaml') { 10_000.times { YAML.dump([], IO.new(1)) } }
  x.report('zaml') { 10_000.times { ZAML.dump([], IO.new(1)) } }
end

#           user     system      total
# yaml  0.610000   0.250000   0.860000
# zaml  0.940000   0.460000   1.400000
Yes, please do benchmark. If you find any real use case where the ZAML algorithm is slower I’m sure Markus will be interested.
Please note: further benchmarking shows that dumping into a StringIO gives ZAML a noticeable speed boost.
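A rough sketch of what that variation might look like: the same empty-array dump, but written into an in-memory StringIO instead of standard output. It assumes `ZAML.dump` accepts a StringIO as its second argument the same way it accepts an IO, and it only measures ZAML if the library happens to be installed. The `dumpers` and `timings` names are just for this example.

```ruby
require 'yaml'
require 'stringio'
require 'benchmark'

# YAML is always measured; ZAML is optional and only added if it loads.
dumpers = { 'yaml' => YAML }
begin
  require 'zaml'
  dumpers['zaml'] = ZAML
rescue LoadError
  # zaml is not installed, so only YAML is measured
end

iterations = 10_000
timings = {}

dumpers.each do |name, dumper|
  buffer = StringIO.new # in-memory target instead of IO.new(1)
  timings[name] = Benchmark.realtime do
    iterations.times { dumper.dump([], buffer) }
  end
end

timings.each { |name, secs| puts format('%-5s %.4fs', name, secs) }
```

Timings will vary by machine and Ruby version, so treat the numbers as relative rather than absolute.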
I’m late to the game here, but I need to weigh in on a common fallacy:
Stephen… I think that is a classic example of bad benchmarking. How’d you pick 10k? I highly suspect that number is bad/low and if you increase it by an order of magnitude over a few iterations you’ll find that your numbers don’t pan out.
Further, I suspect that what you’re really measuring is the overhead cost of dispatching to a Ruby method vs dispatching to a C method. In my own benchmark of Ruby vs C class methods that just return true (Qtrue in C), you’ll see a vast disparity at 10k iterations: 0.0107s vs 0.0046s, or 2.33x slower. At 10M iterations that gap has narrowed: 9.464s vs 7.287s, or 1.30x slower.
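To check how a ratio like that moves with the iteration count, something along these lines could be used. This is only a sketch, not the benchmark quoted above: `ruby_true` and `time_calls` are hypothetical helpers, and the built-in `String#empty?` merely stands in for the C-implemented side.

```ruby
require 'benchmark'

# Hypothetical stand-in for the "method written in Ruby" side.
def ruby_true
  true
end

# Calls the given block `count` times and returns the elapsed seconds.
def time_calls(count)
  Benchmark.realtime { count.times { yield } }
end

subject = ''
ratios = {}

# Repeat the same comparison at growing iteration counts.
[10_000, 100_000, 1_000_000].each do |count|
  ruby_secs    = time_calls(count) { ruby_true }
  builtin_secs = time_calls(count) { subject.empty? }
  ratios[count] = ruby_secs / builtin_secs
  puts format('%9d calls: ruby %.4fs, built-in %.4fs, ratio %.2fx',
              count, ruby_secs, builtin_secs, ratios[count])
end
```

If the ratio keeps shifting as the count grows, the smaller runs were probably dominated by fixed overhead or noise.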
On my very large data set ZAML-0.1.2 dumped 100 times faster than YAML, and used a quarter of the memory. Good stuff!
But if you can sacrifice portability, Marshal still blows the text formats away in terms of both speed and (particularly) memory usage.
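For a quick feel of that trade-off, here is a small sketch comparing `Marshal.dump` with `YAML.dump` on a made-up data set (the 10,000 records are hypothetical, and only time and output size are measured here, not memory). Marshal's output is a Ruby-specific binary format, which is where the portability cost comes from.

```ruby
require 'yaml'
require 'benchmark'

# Hypothetical data set: 10,000 small records of plain strings and integers.
data = Array.new(10_000) { |i| { 'id' => i, 'name' => "item-#{i}" } }

marshal_bytes = nil
yaml_text = nil

# Time one dump with each serializer.
marshal_secs = Benchmark.realtime { marshal_bytes = Marshal.dump(data) }
yaml_secs    = Benchmark.realtime { yaml_text = YAML.dump(data) }

puts format('marshal: %.4fs, %d bytes', marshal_secs, marshal_bytes.bytesize)
puts format('yaml:    %.4fs, %d bytes', yaml_secs, yaml_text.bytesize)

# Round trip through Marshal to confirm nothing is lost.
restored = Marshal.load(marshal_bytes)
```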