Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?

谎友^ 2020-12-01 05:52

I'd like to be able to dump a dictionary containing long strings that I'd like to have in the block style for readability. For example:

foo: |
  this is a
         


        
3 Answers
  • 2020-12-01 06:12

    pyyaml does support dumping literal or folded blocks.

    Using Representer.add_representer

    Defining the types:

    # Python 2: str (bytes) and unicode are separate types
    class folded_str(str): pass
    
    class literal_str(str): pass
    
    class folded_unicode(unicode): pass
    
    class literal_unicode(unicode): pass
    

    Then you can define the representers for those types. Please note that while Gary's solution works great for unicode, you may need some more work to get strings to work right (see the implementation of represent_str).

    import yaml
    from yaml.representer import SafeRepresenter
    
    def change_style(style, representer):
        # wrap an existing representer and override the style of the node it returns
        def new_representer(dumper, data):
            scalar = representer(dumper, data)
            scalar.style = style
            return scalar
        return new_representer
    
    # represent_str does handle some corner cases, so use that
    # instead of calling represent_scalar directly
    represent_folded_str = change_style('>', SafeRepresenter.represent_str)
    represent_literal_str = change_style('|', SafeRepresenter.represent_str)
    represent_folded_unicode = change_style('>', SafeRepresenter.represent_unicode)
    represent_literal_unicode = change_style('|', SafeRepresenter.represent_unicode)
    

    Then you can add those representers to the default dumper:

    yaml.add_representer(folded_str, represent_folded_str)
    yaml.add_representer(literal_str, represent_literal_str)
    yaml.add_representer(folded_unicode, represent_folded_unicode)
    yaml.add_representer(literal_unicode, represent_literal_unicode)
    

    ... and test it:

    data = {
        'foo': literal_str('this is a\nblock literal'),
        'bar': folded_unicode('this is a folded block'),
    }
    
    print yaml.dump(data)
    

    Result:

    bar: >-
      this is a folded block
    foo: |-
      this is a
      block literal
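
    The code above is Python 2 (print statement, separate unicode type). On Python 3 there is only one text type, so a single pair of str subclasses is enough. A minimal sketch of the same approach, assuming a current PyYAML; the class names are the ones from this answer, and the representers are registered on the default yaml.Dumper, so the data must be dumped with yaml.dump rather than yaml.safe_dump:

```python
import yaml
from yaml.representer import SafeRepresenter


# Python 3 has a single text type, so one pair of marker classes is enough
class folded_str(str): pass
class literal_str(str): pass


def change_style(style, representer):
    # wrap an existing representer and override the style of the node it returns
    def new_representer(dumper, data):
        scalar = representer(dumper, data)
        scalar.style = style
        return scalar
    return new_representer


# registered on the default yaml.Dumper, so dump with yaml.dump, not yaml.safe_dump
yaml.add_representer(folded_str, change_style('>', SafeRepresenter.represent_str))
yaml.add_representer(literal_str, change_style('|', SafeRepresenter.represent_str))

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_str('this is a folded block'),
}

out = yaml.dump(data)
print(out)
```

    Loading the output back with yaml.safe_load gives plain strings with the same content, so the subclasses only affect how the data is written.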
    

    Using default_style

    If you want all your strings to follow a default style, you can also use the default_style keyword argument, e.g.:

    >>> data = { 'foo': 'line1\nline2\nline3' }
    >>> print yaml.dump(data, default_style='|')
    "foo": |-
      line1
      line2
      line3
    

    or for folded scalars:

    >>> print yaml.dump(data, default_style='>')
    "foo": >-
      line1
    
      line2
    
      line3
    

    or for double-quoted scalars:

    >>> print yaml.dump(data, default_style='"')
    "foo": "line1\nline2\nline3"
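
    The interactive examples above use Python 2's print statement. A Python 3 sketch of the same three calls, assuming a current PyYAML, which also shows why the key comes out as "foo" in double quotes:

```python
import yaml

data = {'foo': 'line1\nline2\nline3'}

# default_style is applied to every scalar, keys included. A mapping key
# cannot be written as a block scalar, so PyYAML falls back to double quotes
# for the key "foo" while the value gets the requested style.
for style in ('|', '>', '"'):
    print(yaml.dump(data, default_style=style))
```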
    

    Caveats:

    Here is an example of something you may not expect:

    data = {
        'foo': literal_str('this is a\nblock literal'),
        'bar': folded_unicode('this is a folded block'),
        'non-printable': literal_unicode('this has a \t tab in it'),
        'leading': literal_unicode('   with leading white spaces'),
        'trailing': literal_unicode('with trailing white spaces  '),
    }
    print yaml.dump(data)
    

    results in:

    bar: >-
      this is a folded block
    foo: |-
      this is a
      block literal
    leading: |2-
         with leading white spaces
    non-printable: "this has a \t tab in it"
    trailing: "with trailing white spaces  "
    

    1) Non-printable characters

    See the YAML spec for escaped characters (Section 5.7):

    Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the “\” character has no special meaning and non-printable characters are not available.

    If you want to preserve non-printable characters (e.g. TAB), you need to use double-quoted scalars. If you are able to dump a scalar with literal style, and there is a non-printable character (e.g. TAB) in there, your YAML dumper is non-compliant.

    E.g. pyyaml detects the non-printable character \t and uses the double-quoted style even though a default style is specified:

    >>> data = { 'foo': 'line1\nline2\n\tline3' }
    >>> print yaml.dump(data, default_style='"')
    "foo": "line1\nline2\n\tline3"
    
    >>> print yaml.dump(data, default_style='>')
    "foo": "line1\nline2\n\tline3"
    
    >>> print yaml.dump(data, default_style='|')
    "foo": "line1\nline2\n\tline3"
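
    Because the emitter silently falls back to double quotes, it can be useful to check whether a requested block style was actually honoured. A minimal Python 3 sketch, assuming PyYAML; literal_str and literal_style_kept are illustrative names, not part of PyYAML, and the check simply dumps a one-item mapping and looks at the style indicator that was written:

```python
import yaml
from yaml.representer import SafeRepresenter


class literal_str(str):
    """Marker type: ask PyYAML to dump this string as a | block."""


def represent_literal_str(dumper, data):
    node = SafeRepresenter.represent_str(dumper, data)
    node.style = '|'  # only a request: the emitter may still pick another style
    return node


yaml.add_representer(literal_str, represent_literal_str)


def literal_style_kept(text):
    """Hypothetical helper: dump text as a one-item mapping and report
    whether the emitter used the requested | style for the value."""
    out = yaml.dump({'k': literal_str(text)})
    return out.startswith('k: |')


print(literal_style_kept('line1\nline2\nline3'))    # True
print(literal_style_kept('line1\nline2\n\tline3'))  # False: the tab forces double quotes
```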
    

    2) Leading and trailing whitespace

    Another bit of useful information in the spec is:

    All leading and trailing white space characters are excluded from the content

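    This means that if your string has leading or trailing whitespace, it may not be preserved in scalar styles other than double-quoted. As a consequence, pyyaml inspects each scalar and may override the requested style: in the output above, leading whitespace is kept in a literal block by adding an explicit indentation indicator (|2-), while trailing whitespace forces the double-quoted style.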
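
    The same behaviour can be reproduced on Python 3. A small sketch, assuming PyYAML and an illustrative literal_str marker type registered on the default Dumper, that dumps one string with leading and one with trailing whitespace:

```python
import yaml
from yaml.representer import SafeRepresenter


class literal_str(str):
    """Illustrative marker type: request the | style for this string."""


def represent_literal_str(dumper, data):
    node = SafeRepresenter.represent_str(dumper, data)
    node.style = '|'
    return node


yaml.add_representer(literal_str, represent_literal_str)

data = {
    'leading': literal_str('   with leading white spaces'),
    'trailing': literal_str('with trailing white spaces  '),
}
out = yaml.dump(data)
# leading whitespace: literal style kept, with an indentation indicator (|2-)
# trailing whitespace: emitter falls back to a double-quoted scalar
print(out)
```

    In both cases the content survives a round trip; only the style differs from what was requested.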

  • 2020-12-01 06:12

    This can be done relatively easily with ruamel.yaml; the only "hurdle" is how to indicate which of the spaces in a string that is to be represented as a folded scalar should become a fold. A literal scalar carries that information in its explicit newlines, but newlines cannot be used for this in a folded scalar: a folded scalar can itself contain explicit newlines (e.g. when there is leading whitespace), and it also needs a newline at the end so that it is not represented with a stripping chomping indicator (>-).

    import sys
    import ruamel.yaml
    
    folded = ruamel.yaml.scalarstring.FoldedScalarString
    literal = ruamel.yaml.scalarstring.LiteralScalarString
    
    yaml = ruamel.yaml.YAML()
    
    data = dict(
        foo=literal('this is a\nblock literal\n'), 
        bar=folded('this is a folded block\n'),
    )
    
    data['bar'].fold_pos = [data['bar'].index(' folded')]
    
    yaml.dump(data, sys.stdout)
    

    which gives:

    foo: |
      this is a
      block literal
    bar: >
      this is a
      folded block
    

    The fold_pos attribute expects a reversible iterable of the positions of the spaces at which to fold.

    If you never have pipe characters ('|') in your strings, you could have done something like:

    import re
    
    s = 'this is a|folded block\n'
    sf = folded(s.replace('|', ' '))  # a fold position must point at a space
    sf.fold_pos = [x.start() for x in re.finditer(r'\|', s)]  # | is special in re, so escape it (raw string)
    
    data = dict(
        foo=literal('this is a\nblock literal\n'),
        bar=sf,
    )
    
    yaml = ruamel.yaml.YAML()
    yaml.dump(data, sys.stdout)
    

    which also gives exactly the output you expect.

  • 2020-12-01 06:19
    import yaml
    
    class folded_unicode(unicode): pass
    class literal_unicode(unicode): pass
    
    def folded_unicode_representer(dumper, data):
        return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
    def literal_unicode_representer(dumper, data):
        return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')
    
    yaml.add_representer(folded_unicode, folded_unicode_representer)
    yaml.add_representer(literal_unicode, literal_unicode_representer)
    
    data = {
        'literal':literal_unicode(
            u'by hjw              ___\n'
             '   __              /.-.\\\n'
             '  /  )_____________\\\\  Y\n'
             ' /_ /=== == === === =\\ _\\_\n'
             '( /)=== == === === == Y   \\\n'
             ' `-------------------(  o  )\n'
             '                      \\___/\n'),
        'folded': folded_unicode(
            u'It removes all ordinary curses from all equipped items. '
            'Heavy or permanent curses are unaffected.\n')}
    
    print yaml.dump(data)
    

    The result:

    folded: >
      It removes all ordinary curses from all equipped items. Heavy or permanent curses
      are unaffected.
    literal: |
      by hjw              ___
         __              /.-.\
        /  )_____________\\  Y
       /_ /=== == === === =\ _\_
      ( /)=== == === === == Y   \
       `-------------------(  o  )
                            \___/
    

    For completeness, one should also have str implementations, but I'm going to be lazy :-)
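
    On Python 3 the str/unicode split disappears, so the str implementations come for free. A minimal sketch of this answer's approach (calling represent_scalar directly with a style), assuming a current PyYAML; the width argument of yaml.dump sets the preferred line length the emitter uses when it folds the > scalar, and 40 is just an illustrative value:

```python
import yaml


class folded_str(str): pass
class literal_str(str): pass


def folded_str_representer(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='>')


def literal_str_representer(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')


# registered on the default yaml.Dumper, so use yaml.dump (not yaml.safe_dump)
yaml.add_representer(folded_str, folded_str_representer)
yaml.add_representer(literal_str, literal_str_representer)

data = {
    'literal': literal_str('first line\n    indented line\n'),
    'folded': folded_str(
        'It removes all ordinary curses from all equipped items. '
        'Heavy or permanent curses are unaffected.\n'),
}

# a narrower width makes the emitter fold the > scalar onto more lines
out = yaml.dump(data, width=40)
print(out)
```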
