Regex to match key in YAML

太阳男子 2020-12-21 17:40

I have a YAML file that looks like this. Users can define any number of xyz_flovor_id keys, where the _flovor_id suffix is common to all of them. The aim is to grab *_flavor

3 answers
  • 2020-12-21 18:23

    You get that error, because the value for the key server is not a string, but a dict (or a subclass of dict). That is what the YAML mapping in your input, which includes the key abc_flavor_id, is loaded as.
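    The exact traceback from the question is not shown here, so the following is only a sketch of the likely cause, assuming the loaded value was passed straight to a function from `re`. The `params` dict is a hypothetical stand-in for what a YAML loader would return.

```python
import re

# Hypothetical stand-in for the loaded document: a plain dict shaped like
# what a YAML loader would return for the question's input.
params = {"server": {"tenant": "admin", "abc_flavor_id": 2}}

server = params["server"]
print(type(server).__name__)  # the value under "server" is a mapping

error = None
try:
    # Functions in re expect str or bytes, so a dict is rejected.
    re.search(r"_flavor_id", server)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Iterating over the mapping yields its keys, which are strings
# and can therefore be matched one by one.
string_keys = [key for key in server if isinstance(key, str)]
print(string_keys)
```

    The keys, not the mapping itself, are what a pattern can be applied to.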

    Apart from that, it is always a bad idea to use regular expressions to parse YAML (or any other structured text format like HTML, XML, or CSV), as it is difficult, if not impossible, to capture every nuance of the grammar. If it weren't, you would not need a parser.

    E.g., a minor change to the file, such as adding a comment that tells a user editing the file which value needs updating, breaks the simplistic regular-expression approaches:

    server:
      tenant: "admin"
      availability_zone: "nova"
      cpu_overcommit_ratio: 1:1
      memory_overcommit_ratio: 1:1
      xyz_flovor_id: 1
      abc_flavor_id:  # extract the value for this key
        2
    

    The YAML document above is semantically identical to yours, but it will no longer work with the other answers as currently posted.
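    As a rough check of that claim, the sketch below runs the two patterns from the other answers against a shortened document with and without the comment. The shortened inputs and the helper name `patterns_that_match` are assumptions made for illustration.

```python
import re

# The two patterns proposed in the other answers.
PATTERNS = [
    r"\b[^_\n]+_flavor_id:\s*(\d+)",
    r"^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)",
]

# Shortened version of the question's layout: value on the same line.
plain = """\
server:
  xyz_flovor_id: 1
  abc_flavor_id: 2
"""

# Same data, but with a comment after the key and the value on the next line.
commented = """\
server:
  xyz_flovor_id: 1
  abc_flavor_id:  # extract the value for this key
    2
"""

def patterns_that_match(text):
    """Return one boolean per pattern: does it find a match in text?"""
    return [bool(re.search(p, text, re.MULTILINE)) for p in PATTERNS]

print(patterns_that_match(plain))
print(patterns_that_match(commented))
```

    Both patterns require digits right after the colon and optional whitespace, so the `#` of the comment stops them.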

    If some YAML load/save operation transforms your input into (again semantically equivalent):

    server: {abc_flavor_id: 2, availability_zone: nova,
      cpu_overcommit_ratio: 61, memory_overcommit_ratio: 61,
      tenant: admin, xyz_flovor_id: 1}

    then tweaking a simplistic regular expression will not begin to suffice. (This is not a contrived example: it is the default way to dump your data structure in PyYAML and in ruamel.yaml using 'safe' mode.)
    

    What you need to do is match the regular expression against the keys of the value associated with server, not against the whole document:

    import re
    from ruamel.yaml import YAML
    
    yaml_str = """\
    server:
      tenant: "admin"
      availability_zone: "nova"
      cpu_overcommit_ratio: 1:1
      memory_overcommit_ratio: 1:1
      xyz_flovor_id: 1
      abc_flavor_id:  # extract the value for this key
        2
    """
    
    def get_flavor_keys(params):
        # Capture whatever precedes "_flavor_id" in a key name.
        pattern = re.compile(r'(?P<key>.*)_flavor_id')
        ret_val = {}
        # Only the keys of the mapping under 'server' are inspected.
        for key in params['server']:
            m = pattern.match(key)
            if m is not None:
                ret_val[m.group('key')] = params['server'][key]
        return ret_val
    
    yaml = YAML(typ='safe')
    data = yaml.load(yaml_str)
    keys = get_flavor_keys(data)
    print(keys)
    

    This gives you:

    {'abc': 2}
    

    (The xyz_flovor_id key of course doesn't match, but maybe that is a typo in your post.)
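    Because the suffix is fixed, the key filtering can also be sketched without a regular expression once the data is loaded. This is an alternative illustration, not the code from this answer. `SUFFIX`, `get_flavor_ids` and the `server` dict are hypothetical names, and the dict stands in for the loaded `data['server']`.

```python
SUFFIX = "_flavor_id"

def get_flavor_ids(server):
    """Map each key prefix to its value for keys ending in SUFFIX."""
    return {
        key[: -len(SUFFIX)]: value
        for key, value in server.items()
        if isinstance(key, str) and key.endswith(SUFFIX)
    }

# Hypothetical stand-in for data['server'] after loading the YAML.
server = {"tenant": "admin", "xyz_flovor_id": 1, "abc_flavor_id": 2}
result = get_flavor_ids(server)
print(result)
```

    Unlike `pattern.match`, which only anchors at the start of the key, `endswith` requires the suffix to be at the very end of the key name.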

  • 2020-12-21 18:32

    You can use this regex:

    \b[^_\n]+_flavor_id:\s*(\d+)


    Regex Explanation:

    • \b - word boundary
    • [^_\n]+ - 1+ occurrences of any character that is neither an _ nor a newline character
    • _flavor_id: - matches _flavor_id: literally
    • \s* - matches 0+ occurrences of a whitespace character
    • (\d+) - matches and captures 1+ digits. This is the value that you needed.
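    The same breakdown can be written as an annotated pattern using `re.VERBOSE`, which lets each piece carry its own comment. This is only a sketch. The name `FLAVOR_RE` and the sample strings are made up for illustration.

```python
import re

FLAVOR_RE = re.compile(
    r"""
    \b            # word boundary
    [^_\n]+       # one or more characters that are neither "_" nor a newline
    _flavor_id:   # the literal text "_flavor_id:"
    \s*           # optional whitespace
    (\d+)         # capture group 1: one or more digits
    """,
    re.VERBOSE,
)

match = FLAVOR_RE.search("    abc_flavor_id: 2")
value = match.group(1)
print(value)

# A prefix that itself contains an underscore is not matched, because
# [^_\n]+ cannot cross "_" and \b only allows a start at a word boundary.
multi = FLAVOR_RE.search("    my_big_flavor_id: 3")
print(multi)
```

    If key prefixes may contain underscores, the character class would need to be relaxed.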

    I am not well versed in Python, but regex101 can generate the code, so I am pasting the generated code here for you to use.

    import re
    
    regex = r"\b[^_\n]+_flavor_id:\s*(\d+)"
    
    test_str = ("server:\n"
        "    tenant: \"admin\"\n"
        "    availability_zone: \"nova\"\n"
        "    cpu_overcommit_ratio: 1:1\n"
        "    memory_overcommit_ratio: 1:1\n"
        "    xyz_flavor_id: 1\n"
        "    abc_flavor_id: 2")
    
    matches = re.finditer(regex, test_str)
    
    for match_num, match in enumerate(matches, start=1):
        print("Match {match_num} was found at {start}-{end}: {match}".format(
            match_num=match_num, start=match.start(), end=match.end(), match=match.group()))

        # Group 0 is the whole match, so capture groups are numbered from 1.
        for group_num in range(1, len(match.groups()) + 1):
            print("Group {group_num} found at {start}-{end}: {group}".format(
                group_num=group_num, start=match.start(group_num),
                end=match.end(group_num), group=match.group(group_num)))
    

    This is the output I got:

  • 2020-12-21 18:34

    You need this regex. I grouped it into a key/value pair:

    ^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)
    

    Python demo: https://repl.it/Lk5W/0

    import re
    
    regex = r"^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)"
    
    test_str = ("  server:\n"
        "    tenant: \"admin\"\n"
        "    availability_zone: \"nova\"\n"
        "    cpu_overcommit_ratio: 1:1\n"
        "    memory_overcommit_ratio: 1:1\n"
        "    xyz_flavor_id: 1\n"
        "    abc_flavor_id: 2\n")
    
    matches = re.finditer(regex, test_str, re.MULTILINE)
    
    for match in matches:
        print("{key}:{value}".format(key=match.group('key'), value=match.group('value')))
    
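    If the matches are wanted as data rather than printed lines, the named groups can be collected into a dict. This is a minimal sketch that assumes every key/value pair sits on a single line, as in the sample above. `FLAVOR_LINE`, `flavor_ids` and the shortened `sample` text are hypothetical.

```python
import re

# Same pattern as above, compiled once with MULTILINE so ^ matches per line.
FLAVOR_LINE = re.compile(
    r"^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)", re.MULTILINE
)

def flavor_ids(text):
    """Return {key: int(value)} for every matching line in text."""
    return {
        m.group("key"): int(m.group("value"))
        for m in FLAVOR_LINE.finditer(text)
    }

sample = (
    "server:\n"
    "  xyz_flavor_id: 1\n"
    "  abc_flavor_id: 2\n"
)
result = flavor_ids(sample)
print(result)
```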