I made a little test case to compare YAML and JSON speed :
import json
import yaml
from datetime import datetime
from random import randint
NB_ROW=1024
pri
You've probably noticed that Python's syntax for data structures is very similar to JSON's syntax.
What's happening is Python's json
library encodes Python's builtin datatypes directly into text chunks, replacing '
into "
and deleting ,
here and there (to oversimplify a bit).
On the other hand, pyyaml
has to construct a whole representation graph before serialising it into a string.
The same kind of stuff has to happen backwards when loading.
The only way to speedup yaml.load()
would be to write a new Loader
, but I doubt it could be a huge leap in performance, except if you're willing to write your own single-purpose sort-of YAML
parser, taking the following comment in consideration:
YAML builds a graph because it is a general-purpose serialisation format that is able to represent multiple references to the same object. If you know no object is repeated and only basic types appear, you can use a json serialiser, it will still be valid YAML.
-- UPDATE
What I said before remains true, but if you're running Linux
there's a way to speed up Yaml
parsing. By default, Python's yaml
uses the Python parser. You have to tell it that you want to use PyYaml
C
parser.
You can do it this way:
import yaml
from yaml import CLoader as Loader, CDumper as Dumper
dump = yaml.dump(dummy_data, fh, encoding='utf-8', default_flow_style=False, Dumper=Dumper)
data = yaml.load(fh, Loader=Loader)
In order to do so, you need yaml-cpp-dev
(package later renamed to libyaml-cpp-dev
) installed, for instance with apt-get:
$ apt-get install yaml-cpp-dev
And PyYaml
with LibYaml
as well. But that's already the case based on your output.
I can't test it right now because I'm running OS X and brew
has some trouble installing yaml-cpp-dev
but if you follow PyYaml documentation, they are pretty clear that performance will be much better.