Using PyYAML, if I read in a file with blank values in a dict:
test_str = '''
attrs:
  first:
  second: value2
'''
This returns None for the blank value.
You get null because dump() uses the Representer(), which subclasses SafeRepresenter(), and to represent None the following method is called:
def represent_none(self, data):
    return self.represent_scalar(u'tag:yaml.org,2002:null',
                                 u'null')
As the string null is hardcoded, there is no option to dump() to change that.
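To see that behaviour in isolation, here is a minimal sketch. It assumes a PyYAML installation that provides safe_load() and dump(); the variable names are only for illustration.

```python
import yaml

test_str = """\
attrs:
  first:
  second: value2
"""

# Loading turns the blank value into Python's None.
data = yaml.safe_load(test_str)
# data == {'attrs': {'first': None, 'second': 'value2'}}

# Dumping goes through represent_none(), which writes the literal string 'null'.
dumped = yaml.dump(data, default_flow_style=False)
# dumped == "attrs:\n  first: null\n  second: value2\n"
print(dumped)
```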
The proper way to solve this in PyYAML is to make your own Dumper subclass which has the Emitter, Serializer, and Resolver from the standard Dumper that dump() uses, but with a subclass of Representer that represents None the way you want it:
import sys
import yaml
from yaml.representer import Representer
from yaml.dumper import Dumper
from yaml.emitter import Emitter
from yaml.serializer import Serializer
from yaml.resolver import Resolver

yaml_str = """\
attrs:
  first:
  second: value2
"""

class MyRepresenter(Representer):
    def represent_none(self, data):
        # emit None as an empty scalar instead of the hardcoded u'null'
        return self.represent_scalar(u'tag:yaml.org,2002:null',
                                     u'')

class MyDumper(Emitter, Serializer, MyRepresenter, Resolver):
    def __init__(self, stream,
                 default_style=None, default_flow_style=None,
                 canonical=None, indent=None, width=None,
                 allow_unicode=None, line_break=None,
                 encoding=None, explicit_start=None, explicit_end=None,
                 version=None, tags=None, sort_keys=True):
        # sort_keys is passed in by dump() in PyYAML 5.1 and later
        Emitter.__init__(self, stream, canonical=canonical,
                         indent=indent, width=width,
                         allow_unicode=allow_unicode, line_break=line_break)
        Serializer.__init__(self, encoding=encoding,
                            explicit_start=explicit_start, explicit_end=explicit_end,
                            version=version, tags=tags)
        MyRepresenter.__init__(self, default_style=default_style,
                               default_flow_style=default_flow_style,
                               sort_keys=sort_keys)
        Resolver.__init__(self)

# register the override in the representer lookup table
MyRepresenter.add_representer(type(None),
                              MyRepresenter.represent_none)

# safe_load() is used because plain load() needs an explicit Loader in PyYAML 6
data = yaml.safe_load(yaml_str)
yaml.dump(data, stream=sys.stdout, Dumper=MyDumper, default_flow_style=False)
gives you:
attrs:
  first:
  second: value2
If that sounds like a lot of overhead just to get rid of null, it is. There are some shortcuts you can take and you can even try to graft the alternate function onto the existing Representer, but since the actual function taken is referenced in a lookup table (populated by add_representer) you need to handle at least that reference as well.
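One such shortcut could look like the sketch below. It assumes that subclassing yaml.SafeDumper is acceptable for your data; the names BlankNoneDumper and represent_none_as_blank are hypothetical. Because add_representer() is called on the subclass, the lookup table entry is replaced only there and the stock dumpers keep writing null.

```python
import yaml


class BlankNoneDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass that gets its own representer table."""


def represent_none_as_blank(dumper, data):
    # Same tag as the stock represent_none(), but with an empty scalar value.
    return dumper.represent_scalar('tag:yaml.org,2002:null', '')


# add_representer() copies the inherited lookup table onto the subclass
# before updating it, so yaml.SafeDumper itself is left untouched.
BlankNoneDumper.add_representer(type(None), represent_none_as_blank)

data = {'attrs': {'first': None, 'second': 'value2'}}
out = yaml.dump(data, Dumper=BlankNoneDumper, default_flow_style=False)
print(out)
```

Passing Dumper=BlankNoneDumper on each dump() call keeps the change explicit, instead of altering how every caller in the process dumps None.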
The far easier solution is to replace PyYAML with ruamel.yaml and use its round_trip functionality (disclaimer: I am the author of that package):
import ruamel.yaml
yaml_str = """\
# trying to round-trip preserve empty scalar
attrs:
  first:
  second: value2
"""
data = ruamel.yaml.round_trip_load(yaml_str)
assert ruamel.yaml.round_trip_dump(data) == yaml_str
Apart from emitting None as the empty scalar, it also preserves order in mapping keys, comments and tag names, none of which PyYAML does. ruamel.yaml also follows the YAML 1.2 specification (from 2009), whereas PyYAML uses the older YAML 1.1.
The ruamel.yaml package can be installed with pip from PyPI or, on modern Debian-based distributions, with apt-get install python-ruamel.yaml