PyYAML interprets string as timestamp

血红的双手。 submitted on 2019-12-11 03:11:41

Question


It looks as though PyYAML interprets the string 10:01 as a duration in seconds:

>>> import yaml
>>> yaml.load("time: 10:01")
{'time': 601}

The official documentation does not reflect that: PyYAML documentation

Any suggestion how to read 10:01 as a string?


Answer 1:


Put it in quotes:

>>> import yaml
>>> yaml.load('time: "10:01"')
{'time': '10:01'}

This tells YAML that it is a literal string, and inhibits attempts to treat it as a numeric value.




Answer 2:


Since you are using a parser for YAML 1.1, you should expect what is indicated in the specification (example 2.19) to be implemented:

sexagesimal: 3:25:45

The sexagesimals are further explained here:

Using “:” allows expressing integers in base 60, which is convenient for time and angle values.
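To make the arithmetic concrete, here is a minimal sketch of how a colon-separated base-60 value turns into an integer. The helper name sexagesimal_to_int is hypothetical (it is not part of PyYAML), and it only illustrates the conversion, not how PyYAML implements it internally:

```python
def sexagesimal_to_int(text):
    """Convert a colon-separated base-60 string such as '10:01' to an int.

    Hypothetical helper for illustration only; not a PyYAML function.
    """
    sign = -1 if text.startswith("-") else 1
    # YAML 1.1 integers may contain "_" as a visual separator.
    digits = text.lstrip("+-").replace("_", "")
    value = 0
    for part in digits.split(":"):
        # Each further ":" shifts the running total by one base-60 place.
        value = value * 60 + int(part)
    return sign * value


print(sexagesimal_to_int("10:01"))    # 10*60 + 1           -> 601
print(sexagesimal_to_int("3:25:45"))  # 3*3600 + 25*60 + 45 -> 12345
```

So 10:01 is read as 10 "minutes" and 1 "second", which is where the 601 in the question comes from.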

Not every detail that is implemented in PyYAML is covered by the documentation you refer to; you should see that only as an introduction.


You are not the only one that found this interpretation confusing, and in YAML 1.2 sexagesimals were dropped from the specification. Although that specification has been out for about eight years, the changes have never been implemented in PyYAML.

The easiest way to solve this is to upgrade to ruamel.yaml (disclaimer: I am the author of that package). Unless you explicitly specify that you want YAML 1.1, you get the YAML 1.2 behaviour, which interprets 10:01 as a string:

from ruamel import yaml

import warnings
warnings.simplefilter('ignore', yaml.error.UnsafeLoaderWarning)

data = yaml.load("time: 10:01")
print(data)

which gives:

{'time': '10:01'}

The warnings.simplefilter call is only necessary because you use .load() instead of .safe_load(). The former is unsafe and can lead to a wiped disk, or worse, when used on uncontrolled YAML input. There is seldom a reason not to use .safe_load().




Answer 3:


If you wish to monkeypatch the PyYAML library so that it does not have this behavior for a resolver of your choice (there is no neat way to do this), the code below works. The problem is that the regex used for int also matches colon-separated strings such as 30:00 or 40:11:11:11:11 and treats them as integers. The PyYAML documentation does not describe this; it comes from the base-60 integer notation of YAML 1.1.

import yaml
import re

def partition_list(somelist, predicate):
    truelist = []
    falselist = []
    for item in somelist:
        if predicate(item):
            truelist.append(item)
        else:
            falselist.append(item)
    return truelist, falselist

@classmethod
def init_implicit_resolvers(cls):
    """ 
    creates own copy of yaml_implicit_resolvers from superclass
    code taken from add_implicit_resolvers; this should be refactored elsewhere
    """
    if 'yaml_implicit_resolvers' not in cls.__dict__:
        implicit_resolvers = {}
        for key in cls.yaml_implicit_resolvers:
            implicit_resolvers[key] = cls.yaml_implicit_resolvers[key][:]
        cls.yaml_implicit_resolvers = implicit_resolvers

@classmethod
def remove_implicit_resolver(cls, tag, verbose=False):
    cls.init_implicit_resolvers()
    removed = {}
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vremoved, v2 = partition_list(v, lambda x: x[0] == tag)
        if vremoved:
            cls.yaml_implicit_resolvers[key] = v2
            removed[key] = vremoved
    return removed

@classmethod
def _monkeypatch_fix_int_no_timestamp(cls):
    bad = '|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+'
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vcopy = v[:]
        n = 0
        for k in range(len(v)):
            if v[k][0] == 'tag:yaml.org,2002:int' and bad in v[k][1].pattern:
                n += 1
                p = v[k][1]
                p2 = re.compile(p.pattern.replace(bad,''), p.flags)
                vcopy[k] = (v[k][0], p2)    
        if n > 0:
            cls.yaml_implicit_resolvers[key] = vcopy

yaml.resolver.Resolver.init_implicit_resolvers = init_implicit_resolvers
yaml.resolver.Resolver.remove_implicit_resolver = remove_implicit_resolver
yaml.resolver.Resolver._monkeypatch_fix_int_no_timestamp = _monkeypatch_fix_int_no_timestamp

Then if you do this:

class MyResolver(yaml.resolver.Resolver):
    pass

t1 = MyResolver.remove_implicit_resolver('tag:yaml.org,2002:timestamp')
MyResolver._monkeypatch_fix_int_no_timestamp()

class MyLoader(yaml.SafeLoader, MyResolver):
    pass

text = '''
a: 3
b: 30:00
c: 30z
d: 40:11:11:11
'''

print(yaml.safe_load(text))
print(yaml.load(text, Loader=MyLoader))

then it prints

{'a': 3, 'b': 1800, 'c': '30z', 'd': 8680271}
{'a': 3, 'b': '30:00', 'c': '30z', 'd': '40:11:11:11'}

showing that the default yaml behavior has been left unchanged but your private loader class handles these strings sanely.
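If you want to see which strings are affected without touching PyYAML at all, the regex alternative that the patch strips can be tried in isolation with the standard re module. This is only a sketch: the pattern is copied from the bad string above (minus the leading "|"), and the name looks_sexagesimal is made up for this example.

```python
import re

# Alternative copied from the `bad` string in the patch, without the leading "|".
SEXAGESIMAL = re.compile(r'[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+')


def looks_sexagesimal(text):
    """Return True if the whole string matches the base-60 integer alternative.

    Hypothetical helper for illustration; PyYAML does not expose this function.
    """
    return SEXAGESIMAL.fullmatch(text) is not None


for sample in ["3", "30:00", "30z", "40:11:11:11", "10:75", "0:30"]:
    print(sample, looks_sexagesimal(sample))
```

According to this pattern, 30:00 and 40:11:11:11 match, while 3, 30z, 10:75 (a group above 59) and 0:30 (leading zero) do not, so only the matching ones are candidates for the integer conversion.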



Source: https://stackoverflow.com/questions/28835322/pyyaml-interprets-string-as-timestamp
