Why is PyYAML spending so much time in just parsing a YAML File?

Posted by 南楼画角 on 2019-12-21 04:13:15

Question


I am parsing a YAML file with around 6500 lines with this format:

foo1:
  bar1:
    blah: { name: "john", age: 123 }
  metadata: { whatever1: "whatever", whatever2: "whatever" }
  stuff:
    thing1: 
      bluh1: { name: "Doe1", age: 123 }
      bluh2: { name: "Doe2", age: 123 }
    thing2:
    ...
    thingN:
foo2:
...
fooN:

I just want to parse it with the PyYAML library (as far as I know there are no real alternatives to it in Python; see: How can I parse a YAML file in Python).

Just for testing, I wrote this code to parse my file:

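import yaml

config_file = "/path/to/file.yaml"

# Parse the YAML config file with the default pure-Python loader.
# PyYAML 6.0 and later require the Loader argument to be passed explicitly.
with open(config_file, "r") as stream:
    sensors = yaml.load(stream, Loader=yaml.Loader)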

Running the script under the time command, I get:

real    0m3.906s
user    0m3.672s
sys     0m0.100s

Those values don't seem very good. I wanted to run the same test with JSON, so I first converted the same YAML file to JSON:

import json

config_file = "/path/to/file.json"

# Parse the JSON version of the same config file.
with open(config_file, "r") as stream:
    sensors = json.load(stream)

But the execution time is far better:

real    0m0.058s
user    0m0.032s
sys     0m0.008s

What is the main reason PyYAML takes so much longer to parse the YAML file than the json module takes to parse the JSON one? Is it a problem with PyYAML, or is the YAML format itself hard to parse? (Probably the former.)

EDIT:

I am adding another example, this time with Ruby and YAML:

require 'yaml'

sensors = YAML.load_file('/path/to/file.yaml')

And the execution time is good! (or at least not as bad as the PyYAML example):

real    0m0.278s
user    0m0.240s
sys     0m0.032s

Answer 1:


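According to the docs, you should use the LibYAML-based CLoader/CSafeLoader (and CDumper), which are much faster than the pure-Python classes. They are only available when PyYAML was built with the LibYAML bindings, so fall back to the pure-Python Loader if the import fails: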

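import yaml
try:
    # C implementation, present only if PyYAML was built with LibYAML.
    from yaml import CLoader as Loader
except ImportError:
    # Pure-Python fallback: same result, but much slower.
    from yaml import Loader

config_file = "test.yaml"

with open(config_file, "r") as stream:
    sensors = yaml.load(stream, Loader=Loader)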

This gives me

real    0m0.503s

instead of

real    0m2.714s
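
The timings above come from the time command, so they also include interpreter start-up and module imports. To compare only the parse step on your own machine, a rough sketch like the following may help. The helper names (build_sample, time_parse, compare_loaders) and the synthetic data are made up for illustration and are not part of PyYAML. The sample is serialized with json.dumps because, for data this simple, that output is also valid YAML flow syntax, so the same text can be fed to every parser.

```python
import json
import time


def build_sample(n_items=200):
    """Build a nested dict shaped roughly like the question's file (synthetic data)."""
    return {
        f"foo{i}": {
            "bar1": {"blah": {"name": "john", "age": 123}},
            "metadata": {"whatever1": "whatever", "whatever2": "whatever"},
            "stuff": {"thing1": {"bluh1": {"name": "Doe1", "age": 123}}},
        }
        for i in range(n_items)
    }


def time_parse(parse, text, repeats=3):
    """Return the best wall-clock time in seconds of parse(text) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(text)
        best = min(best, time.perf_counter() - start)
    return best


def compare_loaders(n_items=200):
    """Print parse times for json and for the PyYAML loaders that are available."""
    import yaml  # PyYAML, assumed to be installed

    text = json.dumps(build_sample(n_items))
    parsers = {
        "json.loads": json.loads,
        "yaml.Loader (pure Python)": lambda s: yaml.load(s, Loader=yaml.Loader),
    }
    # __with_libyaml__ is True only when the C bindings were compiled in.
    if yaml.__with_libyaml__:
        parsers["yaml.CLoader (LibYAML)"] = lambda s: yaml.load(s, Loader=yaml.CLoader)

    for name, parse in parsers.items():
        print(f"{name}: {time_parse(parse, text):.4f} s")
```

Calling compare_loaders() prints one line per parser. If no CLoader line appears, the installed PyYAML was built without LibYAML, and the fallback import in the answer will silently pick the slow pure-Python Loader.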


Source: https://stackoverflow.com/questions/18404441/why-is-pyyaml-spending-so-much-time-in-just-parsing-a-yaml-file
