I've been reading the PyYAML source code to try to understand how to define a proper constructor function that I can add with `add_constructor`. I have a pretty good understanding of how that code works now, but I still don't understand why the default YAML constructors in `SafeConstructor` are generators. For example, the method `construct_yaml_map` of `SafeConstructor`:
```python
def construct_yaml_map(self, node):
    data = {}
    yield data
    value = self.construct_mapping(node)
    data.update(value)
```
I understand how the generator is used in `BaseConstructor.construct_object`, as follows: it stubs out an object and, when `deep=False` is passed to `construct_mapping`, postpones populating it with data from the node:
```python
if isinstance(data, types.GeneratorType):
    generator = data
    data = generator.next()  # next(generator) in the Python 3 source
    if self.deep_construct:
        for dummy in generator:
            pass
    else:
        self.state_generators.append(generator)
```
And I understand how the data is generated in `BaseConstructor.construct_document` in the case where `deep=False` is passed to `construct_mapping`:
```python
def construct_document(self, node):
    data = self.construct_object(node)
    while self.state_generators:
        state_generators = self.state_generators
        self.state_generators = []
        for generator in state_generators:
            for dummy in generator:
                pass
```
What I don't understand is the benefit of stubbing out the data objects and then working down through them by iterating over the generators in `construct_document`. Does this have to be done to support something in the YAML spec, or does it provide a performance benefit?
This answer on another question was somewhat helpful, but I don't understand why that answer does this:
```python
def foo_constructor(loader, node):
    instance = Foo.__new__(Foo)
    yield instance
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)
```
instead of this:
```python
def foo_constructor(loader, node):
    state = loader.construct_mapping(node, deep=True)
    return Foo(**state)
```
I've tested that the latter form works for the examples posted on that other answer, but perhaps I am missing some edge case.
I am using version 3.10 of PyYAML, but it looks like the code in question is the same in the latest version (3.12) of PyYAML.
In YAML you can have anchors and aliases. With that you can make self-referential structures, directly or indirectly.
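PyYAML's own built-in constructors already rely on this. As a small check, assuming PyYAML is installed, `safe_load` can build collections that contain themselves, either directly or through another collection. The helper name `load_self_referencing` is made up for this illustration:

```python
import yaml


def load_self_referencing():
    """Load documents whose anchored node is reached again through an alias."""
    # Directly: the list's second item is the list itself.
    direct = yaml.safe_load("&a [1, *a]")
    # Directly, in a mapping: the value under 'self' is the mapping itself.
    mapping = yaml.safe_load("&b {name: root, self: *b}")
    # Indirectly: the inner list's only item is the outer list.
    indirect = yaml.safe_load("&c [[*c]]")
    return direct, mapping, indirect


if __name__ == "__main__":
    direct, mapping, indirect = load_self_referencing()
    print(direct[1] is direct, mapping["self"] is mapping, indirect[0][0] is indirect)
```

Each of these works because `construct_yaml_seq` and `construct_yaml_map` hand out the empty list or dict before filling it.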
If YAML did not have this possibility of self-reference, you could just construct all the children first and then create the parent structure in one go. But with self-references a child can be the parent itself (or contain it), so you might not have that child yet when you want to "fill out" the structure you are creating. The generator gives you a two-step process (I call it two-step because there is only one `yield` before the end of the method): you first create the object partially and then fill it out, possibly with a reference to itself. That works because the object already exists (i.e. its place in memory is defined).
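To make the two-step idea concrete without PyYAML, here is a stdlib-only sketch that imitates, in heavily reduced form, the bookkeeping of `construct_object` and `construct_document`. `Node`, `MiniConstructor`, the tags `seq`/`eager_seq` and `self_referencing_seq` are hypothetical names for this illustration, not PyYAML API; an alias is modelled simply as the same `Node` object appearing twice:

```python
import types


class Node(object):
    """Hypothetical stand-in for a composed YAML node (tag + value).

    An alias is modelled by reusing the *same* Node object, which is what
    PyYAML's composer produces when it resolves an alias to its anchor.
    """

    def __init__(self, tag, value):
        self.tag = tag
        self.value = value


class MiniConstructor(object):
    """Hypothetical, much-reduced imitation of BaseConstructor's bookkeeping."""

    def __init__(self):
        self.constructed_objects = {}  # node -> finished object or stub
        self.in_progress = set()       # nodes whose constructor is running
        self.state_generators = []     # deferred "fill in" steps

    def construct_object(self, node):
        if node in self.constructed_objects:
            return self.constructed_objects[node]
        if node in self.in_progress:
            # The situation in which PyYAML raises ConstructorError.
            raise ValueError("found unconstructable recursive node")
        self.in_progress.add(node)
        data = getattr(self, "construct_" + node.tag)(node)
        if isinstance(data, types.GeneratorType):
            generator = data
            data = next(generator)                   # step 1: empty stub
            self.state_generators.append(generator)  # step 2: run later
        self.constructed_objects[node] = data        # aliases now find it
        self.in_progress.discard(node)
        return data

    def construct_document(self, node):
        data = self.construct_object(node)
        # Filling one stub can create new stubs, so loop until none are left.
        while self.state_generators:
            state_generators, self.state_generators = self.state_generators, []
            for generator in state_generators:
                for _ in generator:
                    pass
        return data

    def construct_scalar(self, node):
        return node.value

    def construct_seq(self, node):
        # Two-step constructor, shaped like SafeConstructor.construct_yaml_seq.
        data = []
        yield data
        data.extend([self.construct_object(child) for child in node.value])

    def construct_eager_seq(self, node):
        # One-step constructor: needs every child before the list exists.
        return [self.construct_object(child) for child in node.value]


def self_referencing_seq(tag):
    """Build by hand the node graph for the YAML text '&a [1, *a]'."""
    node = Node(tag, [])
    node.value.extend([Node("scalar", 1), node])
    return node


if __name__ == "__main__":
    seq = MiniConstructor().construct_document(self_referencing_seq("seq"))
    print(seq[0], seq[1] is seq)
    try:
        MiniConstructor().construct_document(self_referencing_seq("eager_seq"))
    except ValueError as err:
        print("eager version failed:", err)
```

The generator version registers the empty list before its children are built, so the alias finds it; the eager version asks for the list while it is still being constructed and can only fail.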
The benefit is not speed; it is purely that this makes self-references possible.
If you simplify the example from the answer you refer to a bit, the following loads:
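```python
import sys
# Uses the old PyYAML-compatible ruamel.yaml API (module-level add_constructor,
# load and dump); recent ruamel.yaml releases may no longer provide these.
import ruamel.yaml as yaml

class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l1, self.l2 = l
        self.d = d

def foo_constructor(loader, node):
    # Step 1: hand out an empty instance, so an alias can already refer to it.
    instance = Foo.__new__(Foo)
    yield instance
    # Step 2 (resumed later by the loader): build the state and initialize.
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)

yaml.add_constructor(u'!Foo', foo_constructor)

x = yaml.load('''
&fooref
!Foo
s: *fooref
l: [1, 2]
d: {try: this}
''', Loader=yaml.Loader)

yaml.dump(x, sys.stdout)
```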
but if you change `foo_constructor()` to:
```python
def foo_constructor(loader, node):
    instance = Foo.__new__(Foo)
    # Without a yield, *fooref is resolved while this node is still being built.
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)
    return instance
```
(`yield` removed, a final `return` added), you get a `ConstructorError` with the message:

```
found unconstructable recursive node
  in "<unicode string>", line 2, column 1:
    &fooref
```
PyYAML should give a similar message. Inspect the traceback on that error and you can see where ruamel.yaml/PyYAML tries to resolve the alias in the source code.
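The same comparison can be sketched with PyYAML itself, assuming it is installed. To leave the global loaders untouched, the two constructor variants are registered on two throwaway `SafeLoader` subclasses; the names `TwoStepLoader`, `OneStepLoader`, `foo_two_step`, `foo_one_step` and the helper functions are made up for this illustration, and `Foo` keeps `l` as a plain list instead of unpacking it:

```python
import yaml
from yaml.constructor import ConstructorError

DOC = """\
&fooref !Foo
s: *fooref
l: [1, 2]
d: {try: this}
"""


class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l = l
        self.d = d


def foo_two_step(loader, node):
    # Step 1: create an empty instance and hand it out, so that the alias
    # *fooref can resolve to this very object.
    instance = Foo.__new__(Foo)
    yield instance
    # Step 2: runs after the stub is registered (for a top-level node,
    # from construct_document()).
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)


def foo_one_step(loader, node):
    # Builds the children first; *fooref points back at the node that is
    # still being constructed, so the loader raises ConstructorError.
    state = loader.construct_mapping(node, deep=True)
    return Foo(**state)


class TwoStepLoader(yaml.SafeLoader):
    pass


class OneStepLoader(yaml.SafeLoader):
    pass


TwoStepLoader.add_constructor(u"!Foo", foo_two_step)
OneStepLoader.add_constructor(u"!Foo", foo_one_step)


def load_two_step():
    return yaml.load(DOC, Loader=TwoStepLoader)


def load_one_step_error():
    """Return the error text, or None if loading unexpectedly succeeds."""
    try:
        yaml.load(DOC, Loader=OneStepLoader)
    except ConstructorError as err:
        return str(err)
    return None


if __name__ == "__main__":
    foo = load_two_step()
    print(foo.s is foo, foo.l, foo.d)
    print(load_one_step_error())
```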
Source: https://stackoverflow.com/questions/41900782/why-does-pyyaml-use-generators-to-construct-objects