yaml.dump adding unwanted newlines in multiline strings

邮差的信 提交于 2020-03-18 12:33:41

问题


I have a multiline string:

>>> import credstash
>>> d = credstash.getSecret('alex_test_key', region='ap-southeast-2')

To see the raw data (first 162 characters):

>>> credstash.getSecret('alex_test_key', region='ap-southeast-2')[0:162]
u'-----BEGIN RSA PRIVATE KEY-----\nMIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx\nxk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45\n'

And:

>>> print d[0:162]                                                                                                                                                                                          
-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45

I write to a YAML file:

>>> import yaml
>>> with open('foo.yaml', 'w') as f:                                                                                                                                                                        
...     yaml.dump(d, f, default_flow_style=False, explicit_start=True)
... 

Now it looks like this:

$ head -5 foo.yaml 
--- !!python/unicode '-----BEGIN RSA PRIVATE KEY-----

  MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx

  xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45

i.e. each line has two newlines.

Now if I read it back into a string I see that all is okay in the round-trip:

>>> with open('foo.yaml', 'r') as f:
...     d = yaml.load(f)
... 
>>> print d[0:162]
-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45

(I don't understand why however.)

My real problem is that if humans read this YAML file they will probably assume, as I did, that my program has broken the formatting of the private key file.

Is there a way to use yaml.dump so as to output something without the additional newline characters?


回答1:


If that is the only thing going into your YAML file then you can dump with the option default_style='|' which gives you block style literal for all of your scalars (probably not what you want).

Your string, contains no special characters (that need \ escaping and double quotes), because of the newlines PyYAML decides to represented single quoted. In single quoted style a double newline is the way to represent a single newline that occurred in string that is represented. This gets "undone" on loading, but is indeed not very readable.

If you want to get the block style literals on an individual basis, you can do multiple things:

  • adapt the Representer to output all strings with embedded newlines using the literal scalar block style (assuming they don't need \ escaping of special characters, which will force double quotes)

    import sys
    import yaml
    
    x = u"""\
    -----BEGIN RSA PRIVATE KEY-----
    MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
    xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
    ...
    """
    
    yaml.SafeDumper.org_represent_str = yaml.SafeDumper.represent_str
    
    def repr_str(dumper, data):
        if '\n' in data:
            return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')
        return dumper.org_represent_str(data)
    
    yaml.add_representer(str, repr_str, Dumper=yaml.SafeDumper)
    
    yaml.safe_dump(dict(a=1, b='hello world', c=x), sys.stdout)
    
  • make a subclass of string, that has its special representer. You should be able to take the code for that from here, here and here:

    import sys
    import yaml
    
    class PSS(str):
        pass
    
    x = PSS("""\
    -----BEGIN RSA PRIVATE KEY-----
    MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
    xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
    ...
    """)
    
    def pss_representer(dumper, data):
            style = '|'
            # if sys.versioninfo < (3,) and not isinstance(data, unicode):
            #     data = unicode(data, 'ascii')
            tag = u'tag:yaml.org,2002:str'
            return dumper.represent_scalar(tag, data, style=style)
    
    yaml.add_representer(PSS, pss_representer, Dumper=yaml.SafeDumper)
    
    yaml.safe_dump(dict(a=1, b='hello world', c=x), sys.stdout)
    
  • use ruamel.yaml:

    import sys
    from ruamel.yaml import YAML
    from ruamel.yaml.scalarstring import PreservedScalarString as pss
    
    x = pss("""\
    -----BEGIN RSA PRIVATE KEY-----
    MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
    xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
    ...
    """)
    
    yaml = YAML()
    
    yaml.dump(dict(a=1, b='hello world', c=x), sys.stdout)
    

All of these give:

a: 1
b: hello world
c: |
  -----BEGIN RSA PRIVATE KEY-----
  MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
  xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
  ...

Please note that it is not necessary to specify default_flow_style=False as the literal scalars can only appear in block style.




回答2:


Adding onto Anthon's answer, I found some documentation for other options I can pass to default_style at this link here.

The best compromise I could find for representing all of my data was:

with open('foo.yaml', 'w') as f:
  yaml.safe_dump(secrets, f, explicit_start=True, default_style='\"', width=4096)

And this led to a YAML file that looks like:

---
"alex_test": "yyyyyyyy"
"alex_test_key": "-----BEGIN RSA PRIVATE KEY-----\nMIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx\nxk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45\ni6LLqO6aixyhvabSy7r1bP8QBrHWUIEZRerrw0TlhuKHDoFpmRAjAAIZ/5q9PSxg\n1yCwTVMlMvBiRksPsKi0fcA/v8G+yqFBL7IeaNCPSoa/3ZdgPWbh9P69DyOlB97a\nh1+0Jmh1gtAhyiz1/hmiN7LAclKHyOOnTEEyIMJioZqJURshKdF85RKILgw2X8Lp\n78mO5VyvvGxo3BNjVr0BOrSJ3t17ugijROx3HwIBIwKCAQAUGq1uvLxGnUErgvQg\ncbk/3kcVJutAJNVXM45eNd05ygpg30JwFUWJMMwBnMch8rjz+NtMvDTpcMT2oRDM\nYn9K4u/VxfXj55kLRsuhgesYJ1vFfu79VxjFVkfCx/CbOi9TSooQqCXx8fxtTOTo\nvF1Z4VWAlxLj/HbD+hGg6jy+Iwq/8HWsHN/VFPqhNqdKvzXGOtyynSZBOUf7upMX\nPh4REE4hYMZwdDnl+NRNmm8XA9TOE+Uf8WLDooKcXjp70CES0ehiC+VD0wG5JEVQ\nbZmDTdBxPcQsO31sNwRwUIX0J4K4Z9npa3dJdRqXJuof48RLzSGwM42eJzmTRNSw\n6I77AoGBAP9LO2A2ZAD7LJBKe48GE8wzkgaQd9vc3RImwrMAXPMVP9wdKR4m/X73\ngWxQ1QbueTtBRaNwkF8l9+Iham3H3kAbBONsbvJIO9Co0n1k+S9mutO1ZWfTMWZp\nIfMz2lncLonxXCXnDndzXtTjcqHeZFmSmDZZZugPXYWtC5N2ic3pAoGBAOsypk5z\na9FG3H46TIjYKyV0Z/R0Hvrp8w+AXdogKyHh0nj9Sevr+JMgOR4ayqYUKGG3sRtM\nzyoWCJ+Wb7Rd0olc2SeouQYSzk2wFKvnnq5o0Q8YZIYkiQN82FXoN2jcELdcVdW6\n1VJuUk9K3nDe+Gz6dkHZnthFC6usL15pHs/HAoGBAIqWjfJmq1D9YVWkxrtbEg/E\nOVQFSGFpRM9W3rjxkYtGDLlRqJtW/qQCs/j4rihVkkS9CIvspiUF+5gDgusjW2Sg\n9AZuEFejji9xltZbYrNVBlWrnXLgXKVPA80qxv2UyM6KVpg7miOWZq4VElffIIhl\nhdRcaxBC2v9skUFsPC3zAoGBAN3CCoR7dEj5q1Jxe1x0C2x1EY62oN3yhhXuD1ih\n/MgssIC0TQMDDvEeYb1Mde0LsQutMfUrKbn3hHk2EYzNfVzxJIR6gpCypUHvKW7h\nst71HOJY087vP1sPT6F0jAPILQSnhCFJwdFgtAGeXLOQZpKjAckPA3t0TNUQD2ek\n8SpNAoGAfQrNfepCTbc/9BCv/sJLLMEdlB/PyzenucBeXKfsfSU6+hYM14+gLp7+\nmOgoaM7F4UkqzJTRDQJnYo1NowRHjs0xHJoQoXzlV43ZkCmTwKtZ/9APLi060Md1\n+fDJX+yvxnZsY5hw6cYwC3C/axS9jq63oQ7i8FXwG/a0breCGu8=\n-----END RSA PRIVATE KEY-----"

I would have preferred to use ruamel.yaml, but this code must run in an environment where I can only use the default Python packages that I have available.



来源:https://stackoverflow.com/questions/45004464/yaml-dump-adding-unwanted-newlines-in-multiline-strings

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!