I've got an object with a short string attribute, and a long multi-line string attribute. I want to write the short string as a YAML quoted scalar, and the multi-line string as a literal scalar:
my_obj.short = "Hello"
my_obj.long = "Line1\nLine2\nLine3"
I'd like the YAML to look like this:
short: "Hello"
long: |
  Line1
  Line2
  Line3
How can I instruct PyYAML to do this? If I call yaml.dump(my_obj), it produces a dict-like output:
{long: 'line1

    line2

    line3

    ', short: Hello}
(Not sure why long is double-spaced like that...)
Can I dictate to PyYAML how to treat my attributes? I'd like to affect both the order and style.
import yaml
from collections import OrderedDict

# Marker subclass: strings of this type are emitted double-quoted.
class quoted(str):
    pass

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')

yaml.add_representer(quoted, quoted_presenter)

# Marker subclass: strings of this type are emitted as literal block scalars.
class literal(str):
    pass

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(literal, literal_presenter)

# Emit an OrderedDict as a plain mapping, keeping its insertion order.
def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())

yaml.add_representer(OrderedDict, ordered_dict_presenter)

d = OrderedDict(short=quoted("Hello"), long=literal("Line1\nLine2\nLine3\n"))
print(yaml.dump(d))
Output
short: "Hello"
long: |
  Line1
  Line2
  Line3
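To tie this back to the object in the question, here is a minimal sketch that wraps the attributes at dump time and registers the representers on a private Dumper subclass instead of globally. The names StyledDumper, MyObj and to_yaml are hypothetical, and sort_keys=False assumes PyYAML 5.1 or newer.

```python
import yaml

class quoted(str):
    """Marker type: emit as a double-quoted scalar."""

class literal(str):
    """Marker type: emit as a literal block scalar."""

class StyledDumper(yaml.SafeDumper):
    """Hypothetical private dumper, so other yaml.dump calls are unaffected."""

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

StyledDumper.add_representer(quoted, quoted_presenter)
StyledDumper.add_representer(literal, literal_presenter)

class MyObj:
    """Hypothetical stand-in for the object in the question."""
    def __init__(self, short, long):
        self.short = short
        self.long = long

def to_yaml(obj):
    # Wrap each attribute in the marker type that selects its style.
    # A plain dict keeps insertion order; sort_keys=False stops PyYAML
    # from reordering the keys alphabetically (assumes PyYAML >= 5.1).
    mapping = {"short": quoted(obj.short), "long": literal(obj.long)}
    return yaml.dump(mapping, Dumper=StyledDumper, sort_keys=False)

out = to_yaml(MyObj("Hello", "Line1\nLine2\nLine3\n"))
print(out)
```

Keeping the representers on a subclass is a design choice: yaml.add_representer changes the behaviour of every later yaml.dump call in the process, which may surprise other code.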
Falling in love with @lbt's approach, I got this code:
import yaml

def str_presenter(dumper, data):
    if len(data.splitlines()) > 1:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)
It makes every multiline string be a block literal.
I was trying to avoid the monkey patching part. Full credit to @lbt and @J.F.Sebastian.
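One caveat, as far as I can tell from PyYAML's emitter: asking for style='|' is only a request. If a line ends in whitespace, the emitter falls back to a double-quoted scalar. The sketch below assumes trailing whitespace is not significant in your data and strips it before dumping. BlockDumper and RawDumper are hypothetical names.

```python
import yaml

STR_TAG = 'tag:yaml.org,2002:str'

def raw_presenter(dumper, data):
    # Same idea as str_presenter above, without any cleanup.
    if "\n" in data:
        return dumper.represent_scalar(STR_TAG, data, style='|')
    return dumper.represent_scalar(STR_TAG, data)

def stripping_presenter(dumper, data):
    if "\n" in data:
        # Assumption: trailing whitespace on each line can be dropped.
        cleaned = "\n".join(line.rstrip() for line in data.split("\n"))
        return dumper.represent_scalar(STR_TAG, cleaned, style='|')
    return dumper.represent_scalar(STR_TAG, data)

class RawDumper(yaml.SafeDumper):
    """Hypothetical dumper using the unmodified presenter."""

class BlockDumper(yaml.SafeDumper):
    """Hypothetical dumper that strips trailing whitespace first."""

RawDumper.add_representer(str, raw_presenter)
BlockDumper.add_representer(str, stripping_presenter)

text = "first line   \nsecond line\n"
raw = yaml.dump({"msg": text}, Dumper=RawDumper)
out = yaml.dump({"msg": text}, Dumper=BlockDumper)
print(raw)  # falls back to a double-quoted scalar
print(out)  # literal block scalar
```

Note that the stripped version no longer round-trips to the exact original string, so only use it where that loss is acceptable.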
I wanted any input with a \n in it to be a block literal. Using the code in yaml/representer.py as a base I got:
# -*- coding: utf-8 -*-
import yaml

def should_use_block(value):
    # True if the value contains any character YAML treats as a line break.
    for c in u"\u000a\u000d\u001c\u001d\u001e\u0085\u2028\u2029":
        if c in value:
            return True
    return False

def my_represent_scalar(self, tag, value, style=None):
    # Copy of BaseRepresenter.represent_scalar with the block-style check added.
    if style is None:
        if should_use_block(value):
            style = '|'
        else:
            style = self.default_style

    node = yaml.representer.ScalarNode(tag, value, style=style)
    if self.alias_key is not None:
        self.represented_objects[self.alias_key] = node
    return node

a={'short': "Hello", 'multiline': """Line1
Line2
Line3
""", 'multiline-unicode': u"""Lêne1
Lêne2
Lêne3
"""}
print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
yaml.representer.BaseRepresenter.represent_scalar = my_represent_scalar
print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
Output
{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n", short: Hello}

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: 'Lêne1

    Lêne2

    Lêne3

    ', short: Hello}

After override
multiline: |
  Line1
  Line2
  Line3
multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n"
short: Hello

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: |
  Lêne1
  Lêne2
  Lêne3
short: Hello
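If replacing BaseRepresenter.represent_scalar globally feels too invasive, the same check can probably live in a Dumper subclass instead. This is only a sketch of that variant, not what the code above does: BlockStyleDumper is a hypothetical name, and it relies on the representers calling self.represent_scalar, which they appear to do in current PyYAML.

```python
import yaml

# Same set of line-break characters as should_use_block above.
LINE_BREAKS = "\n\r\x1c\x1d\x1e\x85\u2028\u2029"

class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical dumper: scalars containing a line break default to '|'."""

    def represent_scalar(self, tag, value, style=None):
        # Only pick a style when the caller did not ask for one explicitly.
        if style is None and any(c in value for c in LINE_BREAKS):
            style = '|'
        return super().represent_scalar(tag, value, style=style)

doc = {"short": "Hello", "multiline": "Line1\nLine2\nLine3\n"}
out = yaml.dump(doc, Dumper=BlockStyleDumper)
print(out)

# Other dumpers are untouched, so a plain dump keeps the default styles.
plain = yaml.safe_dump(doc)
```

As in the output above, non-ASCII text still needs allow_unicode=True to be written as a block literal.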
You can use ruamel.yaml and its RoundTripLoader/Dumper (disclaimer: I am the author of that package). Apart from doing what you want, it supports the YAML 1.2 specification (from 2009) and has several other improvements:
import sys
from ruamel.yaml import YAML
yaml_str = """\
short: "Hello"  # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability
"""

yaml = YAML()
yaml.preserve_quotes = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
gives:
short: "Hello"  # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability
(including the comment, starting in the same column as before)
You can also create this output starting from scratch, but then you need to provide the extra information yourself, e.g. the explicit positions at which to fold.
Source: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data