Python extract pattern matches

小蘑菇 2020-11-22 06:36

I am using Python 2.7.1 and trying to use a regular expression to extract the words that appear inside a pattern.

I have a string that looks like this:

someline abc
s         


        
9 Answers
  • 2020-11-22 07:07

    You can use groups (indicated with '(' and ')') to capture parts of the string. The match object's group() method then gives you the group's contents:

    >>> import re
    >>> s = 'name my_user_name is valid'
    >>> match = re.search('name (.*) is valid', s)
    >>> match.group(0)  # the entire match
    'name my_user_name is valid'
    >>> match.group(1)  # the first parenthesized subgroup
    'my_user_name'
    

    In Python 3.6+ you can also index into a match object instead of using group():

    >>> match[0]  # the entire match 
    'name my_user_name is valid'
    >>> match[1]  # the first parenthesized subgroup
    'my_user_name'
    
  • You can also use a named capture group, (?P<user>pattern), and access it by name as if the match were a dictionary: match['user'] (Python 3.6+; on older versions use match.group('user')).

    import re
    
    string = '''someline abc
    someother line
    name my_user_name is valid
    some more lines'''
    
    # (?P<user>...) names the group; without re.DOTALL, .* stops at the end of the line
    pattern = r'name (?P<user>.*) is valid'
    matches = re.search(pattern, string)
    print(matches['user'])
    
    # my_user_name
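
When a pattern has several named groups, groupdict() returns all of them at once. A small sketch, assuming a second group named status that is added here purely for illustration:

```python
import re

# 'status' is a hypothetical second group, not part of the answer above.
pattern = re.compile(r'name (?P<user>\S+) is (?P<status>\w+)')
match = pattern.search('name my_user_name is valid')

fields = match.groupdict()  # maps each group name to its captured text
print(fields['user'])       # my_user_name
print(fields['status'])     # valid
```

Both groupdict() and match.group('user') also work on Python 2, where match['user'] is not available.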
    
  • 2020-11-22 07:12

    You can use matching groups:

    p = re.compile('name (.*) is valid')
    

    e.g.

    >>> import re
    >>> p = re.compile('name (.*) is valid')
    >>> s = """
    ... someline abc
    ... someother line
    ... name my_user_name is valid
    ... some more lines"""
    >>> p.findall(s)
    ['my_user_name']
    

    Here I use re.findall rather than re.search to get all instances of my_user_name. Using re.search, you'd need to get the data from the group on the match object:

    >>> p.search(s)   # returns a match object, or None if no match is found
    <_sre.SRE_Match object at 0xf5c60>
    >>> p.search(s).group()   # the entire matched text
    'name my_user_name is valid'
    >>> p.search(s).group(1)  # the first captured group within the match
    'my_user_name'
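
The difference between findall and search only shows up when the input contains more than one match. A short sketch with invented sample input (the names alice and bob are placeholders):

```python
import re

p = re.compile(r'name (.*) is valid')

# Sample input invented for illustration: two matching lines.
text = """name alice is valid
name bob is valid"""

print(p.findall(text))           # ['alice', 'bob']
print(p.search(text).group(1))   # alice (search stops at the first match)

# finditer yields match objects, which is useful when positions are needed too
names = [m.group(1) for m in p.finditer(text)]
```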
    

    As mentioned in the comments, you might want to make your regex non-greedy:

    p = re.compile('name (.*?) is valid')
    

    so that it picks up only the text between 'name ' and the next ' is valid' (rather than allowing a greedy group to swallow a later ' is valid' as well).
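
The effect of greedy versus non-greedy matching is easiest to see on a line that contains ' is valid' twice. The input below is made up for this purpose:

```python
import re

# Invented input with two occurrences of ' is valid' on one line.
line = 'name my_user_name is valid and the rest is valid'

greedy = re.search(r'name (.*) is valid', line).group(1)
lazy = re.search(r'name (.*?) is valid', line).group(1)

print(greedy)  # my_user_name is valid and the rest
print(lazy)    # my_user_name
```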

  • 2020-11-22 07:13

    You need a capture group in the regex: search for the pattern and, if it is found, retrieve the captured string using group(index). Assuming the checks for a valid match have been performed:

    >>> p = re.compile("name (.*) is valid")
    >>> result = p.search(s)
    >>> result
    <_sre.SRE_Match object at 0x10555e738>
    >>> result.group(1)     # group(1) returns the first capture (the part within the parentheses);
                            # group(0) returns the entire matched text.
    'my_user_name'
    
  • 2020-11-22 07:14

    You could use something like this:

    import re
    
    s = 'name my_user_name is valid'  # stand-in for that big string
    # the parentheses create a group with what was matched,
    # and '\w' matches only alphanumeric characters and underscores
    p = re.compile(r"name +(\w+) +is valid")
    # use search(), so the match doesn't have to happen
    # at the beginning of "big string"
    m = p.search(s)
    # search() returns a Match object with information about what was matched,
    # or None if nothing matched
    if m:
        name = m.group(1)
    else:
        raise ValueError('name not found')
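
Using \w+ instead of .* makes the pattern stricter about what counts as a name. The comparison below is a sketch; the helper capture and the sample inputs are hypothetical:

```python
import re

strict = re.compile(r'name +(\w+) +is valid')  # letters, digits, underscore only
loose = re.compile(r'name (.*?) is valid')     # anything up to the next ' is valid'

# Hypothetical helper: first captured group, or None if there is no match.
def capture(pattern, text):
    m = pattern.search(text)
    return m.group(1) if m else None

for candidate in ['name my_user_name is valid',
                  'name my user is valid',
                  'name a-b is valid']:
    print(candidate, '->', capture(strict, candidate), '|', capture(loose, candidate))
```

The strict pattern rejects names that contain spaces or hyphens, while the loose one accepts them; which behavior is right depends on what a valid name looks like in your data.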
    
  • 2020-11-22 07:15

    Here's a way to do it without using groups (Python 3.6 or above):

    >>> import re
    >>> re.search(r'2\d\d\d[01]\d[0-3]\d', 'report_20191207.xml')[0]
    '20191207'
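
Note that this pattern only checks the shape of the digits, so it also accepts impossible dates such as month 19. One possible way to tighten it, sketched here with a hypothetical helper extract_date, is to hand the matched text to datetime.strptime:

```python
import re
from datetime import datetime

# Hypothetical helper: returns a date object, or None if no plausible
# date is found in the filename.
def extract_date(filename):
    m = re.search(r'2\d\d\d[01]\d[0-3]\d', filename)
    if m is None:
        return None
    try:
        # strptime raises ValueError for strings that are not real dates
        return datetime.strptime(m[0], '%Y%m%d').date()
    except ValueError:
        return None

print(extract_date('report_20191207.xml'))  # 2019-12-07
print(extract_date('report_20191939.xml'))  # None
```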
    