Can I dump blank instead of null in yaml/pyyaml?

灰色年华 2021-02-08 21:57

Using PyYAML, if I read in a file with blank values in a dict:

test_str = '''
attrs:
  first:
  second: value2
'''

This returns None

3 Answers
  • 2021-02-08 22:46

    You get null because dump() uses Representer, which subclasses SafeRepresenter, and the following method is called to represent None:

    def represent_none(self, data):
        return self.represent_scalar(u'tag:yaml.org,2002:null',
                                     u'null')
    

    As the string null is hardcoded, there is no option to dump() to change that.
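
To see the hardcoded value in action, here is a minimal check. It assumes PyYAML is installed, and the dictionary is made up for illustration:

```python
import yaml

# Illustrative data: one key whose value is None.
data = {"attrs": {"first": None, "second": "value2"}}

# The stock Dumper routes None through represent_none,
# which emits the literal string "null".
default_output = yaml.dump(data, default_flow_style=False)
print(default_output)
```

None of the keyword arguments accepted by dump() alters that literal, which is why a custom representer is needed.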

    The proper way to solve this in PyYAML is to make your own Dumper subclass that has the Emitter, Serializer, and Resolver from the standard Dumper that dump() uses, but with a subclass of Representer that represents None the way you want it:

    import sys
    import yaml
    
    from yaml.representer import Representer
    from yaml.dumper import Dumper
    from yaml.emitter import Emitter
    from yaml.serializer import Serializer
    from yaml.resolver import Resolver
    
    
    yaml_str = """\
    attrs:
      first:
      second: value2
    """
    
    class MyRepresenter(Representer):
        def represent_none(self, data):
            return self.represent_scalar(u'tag:yaml.org,2002:null',
                                         u'')
    
    class MyDumper(Emitter, Serializer, MyRepresenter, Resolver):
        def __init__(self, stream,
                default_style=None, default_flow_style=None,
                canonical=None, indent=None, width=None,
                allow_unicode=None, line_break=None,
                encoding=None, explicit_start=None, explicit_end=None,
                version=None, tags=None):
            Emitter.__init__(self, stream, canonical=canonical,
                    indent=indent, width=width,
                    allow_unicode=allow_unicode, line_break=line_break)
            Serializer.__init__(self, encoding=encoding,
                    explicit_start=explicit_start, explicit_end=explicit_end,
                    version=version, tags=tags)
            MyRepresenter.__init__(self, default_style=default_style,
                    default_flow_style=default_flow_style)
            Resolver.__init__(self)
    
    MyRepresenter.add_representer(type(None),
                                  MyRepresenter.represent_none)
    
    # yaml.load() without an explicit Loader fails on current PyYAML; safe_load() is enough here
    data = yaml.safe_load(yaml_str)
    yaml.dump(data, stream=sys.stdout, Dumper=MyDumper, default_flow_style=False)
    

    gives you:

    attrs:
      first:
      second: value2
    

    If that sounds like a lot of overhead just to get rid of null, it is. There are some shortcuts you can take, and you can even try to graft the alternate function onto the existing Representer, but since the function actually used is referenced in a lookup table (populated by add_representer), you need to handle at least that reference as well.
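
One possible shortcut, sketched here under the assumption that the safe dumper is acceptable for your data: subclass yaml.SafeDumper and register the representer on the subclass only, so the lookup table of the stock dumpers is left alone. The names BlankNoneDumper and represent_none_as_blank are hypothetical, chosen for this illustration:

```python
import yaml


class BlankNoneDumper(yaml.SafeDumper):
    """Hypothetical dumper that writes None as an empty scalar."""


def represent_none_as_blank(dumper, _data):
    # Same null tag as the stock representer, but with an empty string value.
    return dumper.represent_scalar("tag:yaml.org,2002:null", "")


# add_representer is a classmethod; calling it on the subclass gives the
# subclass its own copy of the lookup table before modifying it.
BlankNoneDumper.add_representer(type(None), represent_none_as_blank)

data = {"attrs": {"first": None, "second": "value2"}}
text = yaml.dump(data, Dumper=BlankNoneDumper, default_flow_style=False)
print(text)
```

Because the registration lives on the subclass, calls that do not pass Dumper=BlankNoneDumper keep emitting null.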

    A far easier solution is to replace PyYAML with ruamel.yaml and use its round-trip functionality (disclaimer: I am the author of that package):

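    import io
    import ruamel.yaml
    
    yaml_str = """\
    # trying to round-trip preserve empty scalar
    attrs:
      first:
      second: value2
    """
    
    # YAML() defaults to round-trip mode; the older round_trip_load() and
    # round_trip_dump() helpers are not available in recent ruamel.yaml releases
    yaml = ruamel.yaml.YAML()
    data = yaml.load(yaml_str)
    buf = io.StringIO()
    yaml.dump(data, buf)
    assert buf.getvalue() == yaml_str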
    

    Apart from emitting None as the empty scalar, it also preserves the order of mapping keys, comments, and tag names, none of which PyYAML does. ruamel.yaml also follows the YAML 1.2 specification (from 2009), whereas PyYAML uses the older YAML 1.1.


    The ruamel.yaml package can be installed with pip from PyPI or, on modern Debian-based distributions, with apt-get install python-ruamel.yaml.

  • 2021-02-08 22:51

    Based on @Anthon's excellent answer, I was able to craft this solution:

    def represent_none(self, _):
        return self.represent_scalar('tag:yaml.org,2002:null', '')
    
    yaml.add_representer(type(None), represent_none)
    

    Based on my understanding of the PyYAML code, adding a representer for an existing type should simply replace the existing representer.

    This is a global change: every subsequent dump uses a blank. If some unrelated piece of code in your program relies on None being represented in the "normal" way, e.g. a library that you import and that also uses PyYAML, that library will no longer work as expected. In that case, subclassing is the correct way to go.
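
A small sketch of how far the global change reaches, assuming a stock PyYAML install. yaml.add_representer registers on yaml.Dumper by default, so yaml.dump() is affected everywhere in the process, while yaml.safe_dump() (which uses SafeDumper) is not:

```python
import yaml


def represent_none(dumper, _data):
    return dumper.represent_scalar("tag:yaml.org,2002:null", "")


before = yaml.dump({"key": None}, default_flow_style=False)

# Registers on yaml.Dumper, the default class behind yaml.dump().
yaml.add_representer(type(None), represent_none)

# Every later yaml.dump() call in this process now writes a blank.
after = yaml.dump({"key": None}, default_flow_style=False)

# yaml.safe_dump() uses SafeDumper, which the call above did not touch.
safe_after = yaml.safe_dump({"key": None}, default_flow_style=False)

print(repr(before), repr(after), repr(safe_after))
```

To change safe_dump() as well, the same representer would have to be registered with Dumper=yaml.SafeDumper, which widens the global effect further.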

  • 2021-02-08 22:58

    Just use string replace:

    print(yaml.dump(data).replace("null", ""))
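
A quick check of what the replace does to other content may be useful before relying on it. The data below is invented for illustration; the point is that str.replace touches every occurrence of the four letters, not only the null scalars:

```python
import yaml

# Illustrative data: a None value plus a string that happens to contain "null".
data = {"first": None, "note": "nullable field"}

dumped = yaml.dump(data, default_flow_style=False)
replaced = dumped.replace("null", "")

print(dumped)
print(replaced)

# Reloading shows that the string value was altered as a side effect.
reloaded = yaml.safe_load(replaced)
```

Here the value of note no longer matches the original after the replace, so this approach only suits data known not to contain "null" inside keys or strings.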
    