How can I control what scalar form PyYAML uses for my data?

Posted by 流过昼夜 on 2019-11-27 01:27:41
jfs

Based on the answers to "Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?":

import yaml
from collections import OrderedDict

class quoted(str):
    pass

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
yaml.add_representer(quoted, quoted_presenter)

class literal(str):
    pass

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(literal, literal_presenter)

def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())
yaml.add_representer(OrderedDict, ordered_dict_presenter)

d = OrderedDict(short=quoted("Hello"), long=literal("Line1\nLine2\nLine3\n"))

print(yaml.dump(d))

Output

short: "Hello"
long: |
  Line1
  Line2
  Line3
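
The same marker-class trick should extend to folded blocks, which the linked question also asks about. The following is a minimal sketch, not part of the original answer: the names folded, folded_presenter and FoldedDumper are made up for illustration, and the representer is registered on a local SafeDumper subclass so that the global yaml.dump is left alone.

```python
import yaml

class folded(str):
    """Marker type: dump this string in folded block style ('>')."""

def folded_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='>')

class FoldedDumper(yaml.SafeDumper):
    """Hypothetical local dumper, so the registration below stays private to it."""

FoldedDumper.add_representer(folded, folded_presenter)

# One long paragraph; the emitter decides where to break the lines.
text = ("some like explicit folding of scalars for readability, "
        "especially when a single paragraph runs well past the default line width\n")

dumped = yaml.dump({"folded": folded(text)}, Dumper=FoldedDumper,
                   default_flow_style=False)
print(dumped)

# Loading the result gives back the original single-line string.
print(yaml.safe_load(dumped)["folded"] == text)
```

Note that PyYAML picks the fold positions itself based on the line width; you only choose the style.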

Falling in love with @lbt's approach, I got this code:

import yaml

def str_presenter(dumper, data):
  if len(data.splitlines()) > 1:  # check for multiline string
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
  return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)

This makes every multiline string dump as a block literal.
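
One caveat worth knowing: the style passed to represent_scalar is only a request. As far as I can tell from PyYAML's emitter, it falls back to a double-quoted scalar when the string cannot be written as a block, for example when a line ends in a space. The sketch below illustrates this; DemoDumper is a made-up name, used only so that the presenter is not registered globally.

```python
import yaml

def str_presenter(dumper, data):
    if len(data.splitlines()) > 1:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

class DemoDumper(yaml.SafeDumper):
    """Hypothetical local dumper; keeps the global representers untouched."""

DemoDumper.add_representer(str, str_presenter)

# A clean multiline string is written as a block literal.
clean = yaml.dump({"text": "Line1\nLine2\n"}, Dumper=DemoDumper,
                  default_flow_style=False)

# A space right before a line break makes the emitter refuse block style.
trailing = yaml.dump({"text": "Line1 \nLine2\n"}, Dumper=DemoDumper,
                     default_flow_style=False)

print(clean)
print(trailing)
```

If you depend on block output, stripping trailing whitespace from each line before dumping is one way around this, at the cost of changing the data.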

I was trying to avoid the monkey patching part. Full credit to @lbt and @J.F.Sebastian.

I wanted any input with a \n in it to be a block literal. Using the code in yaml/representer.py as a base I got:

# -*- coding: utf-8 -*-
import yaml

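def should_use_block(value):
    # True if the value contains a line break or one of a few separator
    # control characters.
    return any(c in value for c in u"\u000a\u000d\u001c\u001d\u001e\u0085\u2028\u2029")

def my_represent_scalar(self, tag, value, style=None):
    # Mirrors BaseRepresenter.represent_scalar, except that a block literal
    # is chosen when no style was requested and the value has line breaks.
    if style is None:
        if should_use_block(value):
            style = '|'
        else:
            style = self.default_style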

    node = yaml.representer.ScalarNode(tag, value, style=style)
    if self.alias_key is not None:
        self.represented_objects[self.alias_key] = node
    return node


a={'short': "Hello", 'multiline': """Line1
Line2
Line3
""", 'multiline-unicode': u"""Lêne1
Lêne2
Lêne3
"""}

print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
yaml.representer.BaseRepresenter.represent_scalar = my_represent_scalar
print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))

Output

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n", short: Hello}

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: 'Lêne1

    Lêne2

    Lêne3

    ', short: Hello}

After override

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n"
short: Hello

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: |
  Lêne1
  Lêne2
  Lêne3
short: Hello
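
If replacing BaseRepresenter.represent_scalar for the whole process feels too invasive, the same idea can probably be kept local by overriding the method in a Dumper subclass. This is a sketch under a few assumptions: Python 3, a plain "\n" check instead of the full character list above, and BlockStyleDumper as an invented name.

```python
import yaml

class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical dumper: multiline strings default to block literal style."""

    def represent_scalar(self, tag, value, style=None):
        # Only step in when the caller did not request a style explicitly.
        if style is None and isinstance(value, str) and "\n" in value:
            style = "|"
        return super().represent_scalar(tag, value, style=style)

data = {"short": "Hello", "multiline": "Line1\nLine2\nLine3\n"}

dumped = yaml.dump(data, Dumper=BlockStyleDumper, default_flow_style=False)
print(dumped)

# The stock dumpers are not modified, so this call behaves as it always did.
print(yaml.safe_dump(data, default_flow_style=False))
```

Only calls that pass Dumper=BlockStyleDumper get the block style; other code in the same process that uses yaml.dump is unaffected.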

You can use ruamel.yaml and its RoundTripLoader/Dumper (disclaimer: I am the author of that package). Apart from doing what you want, it supports the YAML 1.2 specification (from 2009) and has several other improvements:

import sys
from ruamel.yaml import YAML

yaml_str = """\
short: "Hello"  # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability
"""

yaml = YAML()
yaml.preserve_quotes = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

gives:

short: "Hello"  # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability

(including the comment, starting in the same column as before)

You can also create this output from scratch, but then you need to provide the extra information yourself, e.g. the explicit positions at which to fold.
