Question
I am parsing a YAML file of around 6500 lines in this format:
foo1:
  bar1:
    blah: { name: "john", age: 123 }
    metadata: { whatever1: "whatever", whatever2: "whatever" }
    stuff:
      thing1:
        bluh1: { name: "Doe1", age: 123 }
        bluh2: { name: "Doe2", age: 123 }
      thing2:
      ...
      thingN:
foo2:
...
fooN:
I just want to parse it with the PyYAML library (I think there is no real alternative to it in Python; see "How can I parse a YAML file in Python").
Just for testing, I wrote this code to parse my file:
import yaml

config_file = "/path/to/file.yaml"
with open(config_file, "r") as stream:
    # Pure-Python loader; PyYAML >= 6.0 requires the Loader argument
    sensors = yaml.load(stream, Loader=yaml.Loader)
Running the script with the time command, I get:
real 0m3.906s
user 0m3.672s
sys 0m0.100s
Those values don't look very good. For comparison, I tested the same thing with JSON, after converting the same YAML file to JSON:
import json

config_file = "/path/to/file.json"
with open(config_file, "r") as stream:
    sensors = json.load(stream)  # read the JSON version of the config file
But the execution time is far better:
real 0m0.058s
user 0m0.032s
sys 0m0.008s
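The time numbers above also include interpreter startup and module import, so they are not a pure measure of parsing. A minimal in-process sketch of the same comparison, assuming PyYAML is installed: it builds a synthetic config shaped like the one in the question (the sizes N_FOO and N_THING and the helpers make_config and time_it are made up for illustration), dumps it to both formats, and times only the parse step with time.perf_counter.

```python
import json
import time

import yaml

# Hypothetical sizes: 20 * (5 + 3 * 100) gives a YAML text of roughly
# 6000 lines, similar to the file in the question.
N_FOO, N_THING = 20, 100


def make_config(n_foo=N_FOO, n_thing=N_THING):
    """Build a nested dict shaped like the question's YAML (names are made up)."""
    config = {}
    for i in range(1, n_foo + 1):
        things = {
            f"thing{j}": {
                "bluh1": {"name": "Doe1", "age": 123},
                "bluh2": {"name": "Doe2", "age": 123},
            }
            for j in range(1, n_thing + 1)
        }
        config[f"foo{i}"] = {
            "bar1": {
                "blah": {"name": "john", "age": 123},
                "metadata": {"whatever1": "whatever", "whatever2": "whatever"},
                "stuff": things,
            }
        }
    return config


def time_it(parse, text):
    """Return (parsed result, elapsed seconds) for a single parse call."""
    start = time.perf_counter()
    result = parse(text)
    return result, time.perf_counter() - start


config = make_config()
# default_flow_style=None writes innermost mappings as { ... } like the question
yaml_text = yaml.dump(config, default_flow_style=None)
json_text = json.dumps(config)

from_yaml, t_yaml = time_it(lambda s: yaml.load(s, Loader=yaml.Loader), yaml_text)
from_json, t_json = time_it(json.loads, json_text)

print(f"YAML lines: {len(yaml_text.splitlines())}")
print(f"PyYAML (pure-Python Loader): {t_yaml:.3f}s")
print(f"json.loads: {t_json:.3f}s")
```

Both parsers should return the same dict; only the time spent differs.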
What is the main reason PyYAML spends so much more time parsing the YAML file than parsing the JSON one? Is it a problem with PyYAML, or is YAML simply a hard format to parse? (Probably the former.)
EDIT:
I added another example with Ruby and YAML:
require 'yaml'
sensors = YAML.load_file('/path/to/file.yaml')
And the execution time is good (or at least not as bad as with PyYAML):
real 0m0.278s
user 0m0.240s
sys 0m0.032s
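One way to see where PyYAML spends its time is to profile the load call with the standard-library cProfile module. A minimal sketch, assuming PyYAML is installed and using a small made-up document in the question's format rather than the real file: with the pure-Python Loader, the cumulative listing is expected to be dominated by PyYAML's own Python-level reader, scanner, parser, composer and constructor code, which is overhead that a C implementation such as LibYAML (the library behind Ruby's Psych) does not pay.

```python
import cProfile
import io
import pstats

import yaml

# Hypothetical small document in the question's style (500 entries).
doc = "foo1:\n  stuff:\n" + "".join(
    f'    thing{i}:\n      bluh1: {{ name: "Doe{i}", age: 123 }}\n'
    for i in range(500)
)

profiler = cProfile.Profile()
profiler.enable()
data = yaml.load(doc, Loader=yaml.Loader)  # pure-Python loader, as in the question
profiler.disable()

# Collect the 15 entries with the largest cumulative time into a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(15)
print(report.getvalue())
```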
Answer 1:
According to the docs, you must use CLoader/CSafeLoader (and CDumper) to get the LibYAML-based implementation:
import yaml

try:
    from yaml import CLoader as Loader
except ImportError:
    # LibYAML bindings not available: fall back to the pure-Python loader
    from yaml import Loader

config_file = "test.yaml"
with open(config_file, "r") as stream:
    sensors = yaml.load(stream, Loader=Loader)
This gives me real 0m0.503s instead of real 0m2.714s.
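To reproduce this comparison in a single process, a minimal sketch, assuming PyYAML is installed: it times the pure-Python Loader and, only when PyYAML reports that its LibYAML bindings are present via yaml.__with_libyaml__, the CLoader on the same made-up document (ENTRY and the entry count are illustrative, not the original test.yaml).

```python
import time

import yaml

# Hypothetical test document in the question's style: 2000 "thing"
# entries, each with two flow mappings (about 6000 lines in total).
ENTRY = (
    "    thing{i}:\n"
    '      bluh1: {{ name: "Doe1", age: 123 }}\n'
    '      bluh2: {{ name: "Doe2", age: 123 }}\n'
)
doc = "foo1:\n  stuff:\n" + "".join(ENTRY.format(i=i) for i in range(2000))

# Always time the pure-Python loader; add LibYAML's CLoader only if
# PyYAML was built with the C bindings.
loaders = {"Loader (pure Python)": yaml.Loader}
if yaml.__with_libyaml__:
    loaders["CLoader (LibYAML)"] = yaml.CLoader

results = {}
for label, loader in loaders.items():
    start = time.perf_counter()
    results[label] = yaml.load(doc, Loader=loader)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
```

Both loaders should produce identical data. For files from untrusted sources, the same fallback pattern works with CSafeLoader and SafeLoader.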
Source: https://stackoverflow.com/questions/18404441/why-is-pyyaml-spending-so-much-time-in-just-parsing-a-yaml-file