I have a YAML file which looks like this. The user can define any number of keys like xyz_flovor_id, where the _flovor_id part of the key is common. The aim is to grab *_flavor
You get that error because the value for the key server is not a string, but a dict (or a subclass of dict). That is what the YAML mapping in your input, which includes the key abc_flavor_id, is loaded as.
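To make that concrete, here is a minimal sketch. It assumes the error in question is the TypeError that the re module raises when handed something other than a string, and it uses a hand-written dict (a hypothetical stand-in, not the output of an actual loader run) in place of the parsed document:

```python
import re

# Hypothetical stand-in for what a YAML loader hands back for the input:
# the value under 'server' is a mapping, not a string.
data = {"server": {"tenant": "admin", "xyz_flovor_id": 1, "abc_flavor_id": 2}}

pattern = re.compile(r"\w+_flavor_id")

try:
    pattern.search(data["server"])  # passing the dict itself to the regex
    error = None
except TypeError as exc:
    error = exc

print(type(data["server"]).__name__)
print(type(error).__name__)

# Matching against the individual keys (which are strings) works fine.
matching = [key for key in data["server"] if pattern.search(key)]
print(matching)
```

The regex functions only accept str or bytes-like objects, so the match has to be applied to each key rather than to the mapping.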
Apart from that, it is always a bad idea to use regular expressions to parse YAML (or any other structured text format like HTML, XML or CSV), as it is difficult, if not impossible, to capture all the nuances of the grammar. If it weren't, you would not need a parser.
E.g. a minor change to the file, such as adding a comment that tells a user editing the file which value needs updating, breaks the simplistic regular expression approaches:
server:
  tenant: "admin"
  availability_zone: "nova"
  cpu_overcommit_ratio: 1:1
  memory_overcommit_ratio: 1:1
  xyz_flovor_id: 1
  abc_flavor_id:     # extract the value for this key
    2
The YAML document above is semantically identical to yours, but the other answers posted so far will no longer work on it.
If some YAML load/save operation transforms your input into (again semantically equivalent):
server: {abc_flavor_id: 2, availability_zone: nova,
  cpu_overcommit_ratio: 61, memory_overcommit_ratio: 61,
  tenant: admin, xyz_flovor_id: 1}
then tweaking a dumb regular expression will not begin to suffice. This is not a contrived example: it is the default way to dump your data structure in PyYAML and in ruamel.yaml using 'safe'-mode.
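As a rough illustration of how fragile the text-based approach is, the sketch below runs one line-oriented pattern over three spellings of the same data. The pattern and the shortened documents are chosen here for illustration (they are assumptions, not taken from any particular answer), and only the standard library is used:

```python
import re

# A line-oriented pattern of the kind a purely regex-based approach would use.
LINE_PATTERN = re.compile(
    r"^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)", re.MULTILINE
)

documents = {
    # Plain block style: key and value on one line.
    "block": (
        "server:\n"
        "  tenant: \"admin\"\n"
        "  xyz_flovor_id: 1\n"
        "  abc_flavor_id: 2\n"
    ),
    # Same data, but with a comment and the value on the next line.
    "commented": (
        "server:\n"
        "  tenant: \"admin\"\n"
        "  xyz_flovor_id: 1\n"
        "  abc_flavor_id:     # extract the value for this key\n"
        "    2\n"
    ),
    # Same data again, dumped in flow style.
    "flow": (
        "server: {abc_flavor_id: 2, availability_zone: nova,\n"
        "  cpu_overcommit_ratio: 61, memory_overcommit_ratio: 61,\n"
        "  tenant: admin, xyz_flovor_id: 1}\n"
    ),
}

# Count how many key/value pairs the pattern finds in each variant.
counts = {name: len(LINE_PATTERN.findall(text)) for name, text in documents.items()}
print(counts)
```

The pattern only finds the pair in the first variant, even though a YAML parser gives abc_flavor_id the same value in all three.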
What you need to do is match the regular expression against the keys of the value associated with server, not against the whole document:
import re
from ruamel.yaml import YAML

yaml_str = """\
server:
  tenant: "admin"
  availability_zone: "nova"
  cpu_overcommit_ratio: 1:1
  memory_overcommit_ratio: 1:1
  xyz_flovor_id: 1
  abc_flavor_id:     # extract the value for this key
    2
"""


def get_flavor_keys(params):
    # Match only against the keys of the mapping stored under 'server'.
    pattern = re.compile(r'(?P<key>.*)_flavor_id')
    ret_val = {}
    for key in params['server']:
        m = pattern.match(key)
        if m is not None:
            # Store the value under the key's prefix, e.g. 'abc'.
            ret_val[m.group('key')] = params['server'][key]
    return ret_val


yaml = YAML(typ='safe')
data = yaml.load(yaml_str)
keys = get_flavor_keys(data)
print(keys)
This gives you:
{'abc': 2}
(The xyz_flovor_id of course doesn't match, but maybe that is a typo in your post.)
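Once you are working on the loaded keys rather than on raw text, a regular expression is not strictly necessary. A minimal sketch of the same idea with a plain suffix check follows; the function name flavor_ids and the hand-written dict are hypothetical, and unlike the pattern above this only accepts keys that end in the suffix:

```python
SUFFIX = "_flavor_id"


def flavor_ids(server, suffix=SUFFIX):
    """Return {prefix: value} for every key in `server` ending with `suffix`."""
    return {
        key[: -len(suffix)]: value
        for key, value in server.items()
        if key.endswith(suffix)
    }


# Hypothetical stand-in for the mapping a YAML loader would return.
data = {
    "server": {
        "tenant": "admin",
        "availability_zone": "nova",
        "xyz_flovor_id": 1,
        "abc_flavor_id": 2,
    }
}

result = flavor_ids(data["server"])
print(result)
```

The misspelled xyz_flovor_id key is skipped here as well, for the same reason as before.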
You can use this regex:
\b[^_\n]+_flavor_id:\s*(\d+)
Regex Explanation:
\b - word boundary
[^_\n]+ - 1+ occurrences of any character which is neither an _ nor a newline character
_flavor_id: - matches _flavor_id: literally
\s* - matches 0+ occurrences of a whitespace character
(\d+) - matches and captures 1+ digits. This is the value that you need.
I am not well versed in Python, but regex101 can generate the code, so I am pasting it here for you to use.
import re

regex = r"\b[^_\n]+_flavor_id:\s*(\d+)"

test_str = ("server:\n"
            " tenant: \"admin\"\n"
            " availability_zone: \"nova\"\n"
            " cpu_overcommit_ratio: 1:1\n"
            " memory_overcommit_ratio: 1:1\n"
            " xyz_flavor_id: 1\n"
            " abc_flavor_id: 2")

matches = re.finditer(regex, test_str)

# Report every match and every capture group, numbering both from 1.
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))
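For readability, the same pattern can also be written with re.VERBOSE, so that the pieces from the explanation sit next to the regex as comments. This is only a sketch: the name FLAVOR_ID and the shortened test string are made up for illustration.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part is commented.
FLAVOR_ID = re.compile(
    r"""
    \b             # word boundary
    [^_\n]+        # 1+ characters that are neither an underscore nor a newline
    _flavor_id:    # the literal text _flavor_id:
    \s*            # 0+ whitespace characters
    (\d+)          # capture 1+ digits (the value)
    """,
    re.VERBOSE,
)

test_str = (
    "server:\n"
    " tenant: \"admin\"\n"
    " xyz_flavor_id: 1\n"
    " abc_flavor_id: 2"
)

# findall returns the contents of the single capture group for each match.
values = FLAVOR_ID.findall(test_str)
print(values)
```

Note that the values come back as strings; convert them with int() if you need numbers.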
You can use this regex. It captures the key and the value as named groups:
^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)
Python demo: https://repl.it/Lk5W/0
import re

regex = r"^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)"

test_str = (" server:\n"
            " tenant: \"admin\"\n"
            " availability_zone: \"nova\"\n"
            " cpu_overcommit_ratio: 1:1\n"
            " memory_overcommit_ratio: 1:1\n"
            " xyz_flavor_id: 1\n"
            " abc_flavor_id: 2\n")

# re.MULTILINE lets ^ match at the start of every line.
matches = re.finditer(regex, test_str, re.MULTILINE)

for match in matches:
    print("{key}:{value}".format(key=match.group('key'), value=match.group('value')))
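If you would rather have the pairs in a dict than printed, something along these lines should work. It is a sketch using the same pattern on a shortened test string; the names PATTERN and flavor_ids are made up here:

```python
import re

PATTERN = re.compile(
    r"^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)", re.MULTILINE
)

test_str = (
    "server:\n"
    "  tenant: \"admin\"\n"
    "  xyz_flavor_id: 1\n"
    "  abc_flavor_id: 2\n"
)

# Build {key: value} from the named groups, converting the digits to int.
flavor_ids = {
    m.group("key"): int(m.group("value")) for m in PATTERN.finditer(test_str)
}
print(flavor_ids)
```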