Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?

前端未结

关注

 3  1944

I\'d like to be able to dump a dictionary containing long strings that I\'d like to have in the block style for readability. For example:

foo: |
  this is a


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  心在旅途        
                
              
                            
                2020-12-01 06:12
              
            
            
                                                                       
pyyaml does support dumping literal or folded blocks.

Using Representer.add_representer

defining types:

class folded_str(str): pass

class literal_str(str): pass

class folded_unicode(unicode): pass

class literal_unicode(str): pass


Then you can define the representers for those types.
Please note that while Gary's solution works great for unicode, you may need some more work to get strings to work right (see implementation of represent_str).

def change_style(style, representer):
    def new_representer(dumper, data):
        scalar = representer(dumper, data)
        scalar.style = style
        return scalar
    return new_representer

import yaml
from yaml.representer import SafeRepresenter

# represent_str does handle some corner cases, so use that
# instead of calling represent_scalar directly
represent_folded_str = change_style('>', SafeRepresenter.represent_str)
represent_literal_str = change_style('|', SafeRepresenter.represent_str)
represent_folded_unicode = change_style('>', SafeRepresenter.represent_unicode)
represent_literal_unicode = change_style('|', SafeRepresenter.represent_unicode)


Then you can add those representers to the default dumper:

yaml.add_representer(folded_str, represent_folded_str)
yaml.add_representer(literal_str, represent_literal_str)
yaml.add_representer(folded_unicode, represent_folded_unicode)
yaml.add_representer(literal_unicode, represent_literal_unicode)


... and test it:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
}

print yaml.dump(data)


result:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal


Using default_style

If you are interested in having all your strings follow a default style, you can also use the default_style keyword argument, e.g:

>>> data = { 'foo': 'line1\nline2\nline3' }
>>> print yaml.dump(data, default_style='|')
"foo": |-
  line1
  line2
  line3


or for folded literals:

>>> print yaml.dump(data, default_style='>')
"foo": >-
  line1

  line2

  line3


or for double-quoted literals:

>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\nline3"


Caveats:

Here is an example of something you may not expect:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
    'non-printable': literal_unicode('this has a \t tab in it'),
    'leading': literal_unicode('   with leading white spaces'),
    'trailing': literal_unicode('with trailing white spaces  '),
}
print yaml.dump(data)


results in:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal
leading: |2-
     with leading white spaces
non-printable: "this has a \t tab in it"
trailing: "with trailing white spaces  "


1) non-printable characters

See the YAML spec for escaped characters (Section 5.7):


  Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the “\” character has no special meaning and non-printable characters are not available.


If you want to preserve non-printable characters (e.g. TAB), you need to use double-quoted scalars. If you are able to dump a scalar with literal style, and there is a non-printable character (e.g. TAB) in there, your YAML dumper is non-compliant.

E.g. pyyaml detects the non-printable character \t and uses the double-quoted style even though a default style is specified:

>>> data = { 'foo': 'line1\nline2\n\tline3' }
>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='>')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='|')
"foo": "line1\nline2\n\tline3"


2) leading and trailing white spaces

Another bit of useful information in the spec is:


  All leading and trailing white space characters are excluded from the content


This means that if your string does have leading or trailing white space, these would not be preserved in scalar styles other than double-quoted. As a consequence, pyyaml tries to detect what is in your scalar and may force the double-quoted style.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  野的像风        
                
              
                            
                2020-12-01 06:12
              
            
            
                                                                       
This can be relatively easily done, the only "hurdle" being how to
indicate which of the spaces in the string, that needs to be
represented as a folded scalar, needs to become a fold. The literal scalar
has explicit newlines containing that information, but this cannot
be used for folded scalars, as they can contain explicit newlines e.g. in 
case there is leading whitespace and also needs a newline at the end
in order not to be represented with a stripping chomping indicator (>-)

import sys
import ruamel.yaml

folded = ruamel.yaml.scalarstring.FoldedScalarString
literal = ruamel.yaml.scalarstring.LiteralScalarString

yaml = ruamel.yaml.YAML()

data = dict(
    foo=literal('this is a\nblock literal\n'), 
    bar=folded('this is a folded block\n'),
)

data['bar'].fold_pos = [data['bar'].index(' folded')]

yaml.dump(data, sys.stdout)


which gives:

foo: |
  this is a
  block literal
bar: >
  this is a
  folded block


The fold_pos attribute expects a reversable iterable, representing positions
of spaces indicating where to fold. 

If you never have pipe characters ('|') in your strings you
could have done something like:

import re

s = 'this is a|folded block\n'
sf = folded(s.replace('|', ' '))  # need to have a space!
sf.fold_pos = [x.start() for x in re.finditer('\|', s)]  # | is special in re, needs escaping


data = dict(
    foo=literal('this is a\nblock literal\n'), 
    bar=sf,  # need to have a space
)

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)


which also gives exactly the output you expect
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  半阙折子戏        
                
              
                            
                2020-12-01 06:19
              
            
            
                                                                       
import yaml

class folded_unicode(unicode): pass
class literal_unicode(unicode): pass

def folded_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(folded_unicode, folded_unicode_representer)
yaml.add_representer(literal_unicode, literal_unicode_representer)

data = {
    'literal':literal_unicode(
        u'by hjw              ___\n'
         '   __              /.-.\\\n'
         '  /  )_____________\\\\  Y\n'
         ' /_ /=== == === === =\\ _\\_\n'
         '( /)=== == === === == Y   \\\n'
         ' `-------------------(  o  )\n'
         '                      \\___/\n'),
    'folded': folded_unicode(
        u'It removes all ordinary curses from all equipped items. '
        'Heavy or permanent curses are unaffected.\n')}

print yaml.dump(data)


The result:

folded: >
  It removes all ordinary curses from all equipped items. Heavy or permanent curses
  are unaffected.
literal: |
  by hjw              ___
     __              /.-.\
    /  )_____________\\  Y
   /_ /=== == === === =\ _\_
  ( /)=== == === === == Y   \
   `-------------------(  o  )
                        \___/


For completeness, one should also have str implementations, but I'm going to be lazy :-)
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复

Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?

Using `Representer.add_representer`

Using `default_style`

Caveats:

1) non-printable characters

2) leading and trailing white spaces