Question
It looks as though PyYAML interprets the string 10:01 as a duration in seconds:
>>> import yaml
>>> yaml.load("time: 10:01")
{'time': 601}
The official PyYAML documentation does not mention this behavior.
How can I get 10:01 read as a string?
Answer 1:
Put it in quotes:
>>> import yaml
>>> yaml.load('time: "10:01"')
{'time': '10:01'}
This tells YAML that it is a literal string, and inhibits attempts to treat it as a numeric value.
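As a small check of the quoting advice, here is a sketch that assumes PyYAML is installed. It uses yaml.safe_load rather than the bare yaml.load call shown above, because recent PyYAML releases require an explicit Loader argument for yaml.load. Either quote style should be enough to keep the value a string.

```python
import yaml

# Unquoted, PyYAML's YAML 1.1 resolver reads 10:01 as a base 60 integer.
unquoted = yaml.safe_load("time: 10:01")

# Single or double quotes both mark the scalar as a plain string.
double_quoted = yaml.safe_load('time: "10:01"')
single_quoted = yaml.safe_load("time: '10:01'")

# Dumping a Python str and loading it back should return the same str.
round_trip = yaml.safe_load(yaml.safe_dump({"time": "10:01"}))

print(unquoted, double_quoted, single_quoted, round_trip)
```

If you generate the YAML yourself from Python strings with PyYAML, the round trip above suggests you do not need to add the quotes by hand.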
Answer 2:
Since you are using a parser for YAML 1.1, you should expect what is indicated in the specification (example 2.19) to be implemented:
sexagesimal: 3:25:45
The specification explains sexagesimals further:
Using “:” allows expressing integers in base 60, which is convenient for time and angle values.
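To make the base 60 reading concrete, here is a minimal sketch of the arithmetic in plain Python. The helper name sexagesimal_to_int is made up for this illustration and is not part of PyYAML or any YAML library.

```python
def sexagesimal_to_int(text):
    """Convert a colon-separated base 60 string such as '3:25:45' to an int.

    Hypothetical helper for illustration only; not a PyYAML function.
    """
    sign = -1 if text.startswith("-") else 1
    parts = text.lstrip("+-").split(":")
    value = 0
    for part in parts:
        # Each step shifts the running total one base 60 place to the left.
        value = value * 60 + int(part)
    return sign * value


print(sexagesimal_to_int("10:01"))    # 10 * 60 + 1 = 601
print(sexagesimal_to_int("3:25:45"))  # 3 * 3600 + 25 * 60 + 45 = 12345
```

This is why the 10:01 in the question comes back as 601.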
Not every detail implemented in PyYAML is covered by the documentation you refer to; treat that documentation as an introduction only.
You are not the only one who found this interpretation confusing, and sexagesimals were dropped from the specification in YAML 1.2. Although that specification has been out for about eight years, the changes have never been implemented in PyYAML.
The easiest way to solve this is to upgrade to ruamel.yaml (disclaimer: I am the author of that package). You get the YAML 1.2 behaviour (unless you explicitly specify that you want YAML 1.1), which interprets 10:01 as a string:
from ruamel import yaml
import warnings
warnings.simplefilter('ignore', yaml.error.UnsafeLoaderWarning)
data = yaml.load("time: 10:01")
print(data)
which gives:
{'time': '10:01'}
The warnings.simplefilter call is only necessary because you use .load() instead of .safe_load(). The former is unsafe and can lead to a wiped disk, or worse, when used on uncontrolled YAML input. There is seldom a reason not to use .safe_load().
Answer 3:
If you want to monkeypatch PyYAML so that it no longer behaves this way for a resolver of your choice (there is no neat way to do this), the code below works. The problem is that the regex used for int includes a branch that matches colon-separated values. This is the YAML 1.1 sexagesimal form described in the previous answer, so strings like 30:00 or 40:11:11:11:11 are treated as integers.
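Before the patch itself, the pattern branch in question can be checked in isolation. This sketch uses only the standard re module; the pattern is copied from the bad string in the patch that follows (without its leading '|'), and the name SEXAGESIMAL is just a label chosen for this example.

```python
import re

# Branch of the int pattern that accepts colon-separated base 60 values.
SEXAGESIMAL = re.compile(r'[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+')

candidates = ["10:01", "30:00", "40:11:11:11:11", "3", "30z", "10:61"]

# Map each candidate to whether the whole string matches the branch.
results = {c: SEXAGESIMAL.fullmatch(c) is not None for c in candidates}

for candidate, matched in results.items():
    print(candidate, matched)
```

The first three candidates match and the last three do not. Note that 10:61 is rejected because each group after a colon must be in the range 0 to 59.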
import re
import yaml

def partition_list(somelist, predicate):
    """Split somelist into (items matching predicate, items not matching)."""
    truelist = []
    falselist = []
    for item in somelist:
        if predicate(item):
            truelist.append(item)
        else:
            falselist.append(item)
    return truelist, falselist

@classmethod
def init_implicit_resolvers(cls):
    """
    creates own copy of yaml_implicit_resolvers from superclass
    code taken from add_implicit_resolver; this should be refactored elsewhere
    """
    if 'yaml_implicit_resolvers' not in cls.__dict__:
        implicit_resolvers = {}
        for key in cls.yaml_implicit_resolvers:
            implicit_resolvers[key] = cls.yaml_implicit_resolvers[key][:]
        cls.yaml_implicit_resolvers = implicit_resolvers

@classmethod
def remove_implicit_resolver(cls, tag, verbose=False):
    """Remove every resolver entry for tag; return the removed entries by key."""
    cls.init_implicit_resolvers()
    removed = {}
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vremoved, v2 = partition_list(v, lambda x: x[0] == tag)
        if vremoved:
            cls.yaml_implicit_resolvers[key] = v2
            removed[key] = vremoved
    return removed

@classmethod
def _monkeypatch_fix_int_no_timestamp(cls):
    """Strip the colon-separated (sexagesimal) branch from the int regex."""
    # Make sure cls has its own table so the base class stays untouched.
    cls.init_implicit_resolvers()
    bad = '|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+'
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vcopy = v[:]
        n = 0
        for k in range(len(v)):  # range, not xrange, so this runs on Python 3
            if v[k][0] == 'tag:yaml.org,2002:int' and bad in v[k][1].pattern:
                n += 1
                p = v[k][1]
                p2 = re.compile(p.pattern.replace(bad, ''), p.flags)
                vcopy[k] = (v[k][0], p2)
        if n > 0:
            cls.yaml_implicit_resolvers[key] = vcopy

# Attach the helpers to the base Resolver so that subclasses inherit them.
yaml.resolver.Resolver.init_implicit_resolvers = init_implicit_resolvers
yaml.resolver.Resolver.remove_implicit_resolver = remove_implicit_resolver
yaml.resolver.Resolver._monkeypatch_fix_int_no_timestamp = _monkeypatch_fix_int_no_timestamp
Then if you do this:
class MyResolver(yaml.resolver.Resolver):
    pass

t1 = MyResolver.remove_implicit_resolver('tag:yaml.org,2002:timestamp')
MyResolver._monkeypatch_fix_int_no_timestamp()

class MyLoader(yaml.SafeLoader, MyResolver):
    pass

text = '''
a: 3
b: 30:00
c: 30z
d: 40:11:11:11
'''
print(yaml.safe_load(text))
print(yaml.load(text, Loader=MyLoader))
then it prints (on Python 3.7 or later, where dicts keep insertion order)
{'a': 3, 'b': 1800, 'c': '30z', 'd': 8680271}
{'a': 3, 'b': '30:00', 'c': '30z', 'd': '40:11:11:11'}
showing that the default yaml behavior has been left unchanged but your private loader class handles these strings sanely.
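A shorter variant of the same idea, offered only as a sketch: instead of patching Resolver, give a SafeLoader subclass its own resolver table and re-register int without the colon branch. This assumes PyYAML is installed and that its resolver table keeps the (tag, regexp) layout the code above relies on. The class name NoSexagesimalLoader is made up for this example.

```python
import re
import yaml

INT_TAG = 'tag:yaml.org,2002:int'


class NoSexagesimalLoader(yaml.SafeLoader):
    """Hypothetical loader name: SafeLoader without base 60 integers."""


# Give the subclass its own resolver table, leaving out the stock int entry.
NoSexagesimalLoader.yaml_implicit_resolvers = {
    first: [(tag, regexp) for tag, regexp in resolvers if tag != INT_TAG]
    for first, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

# Re-register int with the binary, octal, decimal and hex forms only.
NoSexagesimalLoader.add_implicit_resolver(
    INT_TAG,
    re.compile(r'''^(?:[-+]?0b[0-1_]+
                |[-+]?0[0-7_]+
                |[-+]?(?:0|[1-9][0-9_]*)
                |[-+]?0x[0-9a-fA-F_]+)$''', re.X),
    list('-+0123456789'))

text = "a: 3\nb: 30:00\nd: 40:11:11:11\n"
patched = yaml.load(text, Loader=NoSexagesimalLoader)
default = yaml.safe_load(text)
print(patched)
print(default)
```

Unlike the patch above, this sketch leaves the timestamp resolver in place, and it does not touch the float pattern, which has its own colon-separated form for values containing a decimal point.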
Source: https://stackoverflow.com/questions/28835322/pyyaml-interprets-string-as-timestamp