Python regular expression examples: the difference between re.match and re.findall

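Differences between re.match and re.findall:

match returns at most one match, and only when the pattern matches at the very beginning of the string; findall returns every non-overlapping match found anywhere in the string.

match returns a Match object, whose group() and groups() methods give access to the captured groups; findall returns a plain list.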

The parameters of the two methods:

re.match(pattern, string, flags=0)

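Parameters: pattern: the regular expression to match; string: the string to match against; flags: optional flags that control how matching is done, such as case-insensitive or multi-line matching.

Usage: checks whether the beginning of the string matches the pattern. On success it returns a Match object; otherwise it returns None.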
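Because match is anchored at the start of the string, it behaves differently from search on the same input. The following is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

text = 'tel: 0731-81112222'          # hypothetical sample text
pattern = re.compile(r'\d{4}-\d{8}')

at_start = pattern.match(text)       # None: the text does not start with digits
anywhere = pattern.search(text)      # first match anywhere in the text
from_pos = pattern.match(text, 5)    # match() anchored at index 5 instead of 0

print(at_start)
print(anywhere.group())
print(from_pos.group())
```

If the pattern has to be found anywhere in the text, search is usually the method to reach for; match fits cases where the text must begin with the pattern.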

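findall(string[, pos[, endpos]])

This is the signature of the findall method on a compiled pattern object (the result of re.compile); the module-level function is re.findall(pattern, string, flags=0).

Parameters: string: the string to search; pos: optional, the index at which the search starts, default 0; endpos: optional, the index at which the search stops, default the length of the string.

Usage: finds all substrings of the string that the regular expression matches and returns them as a list. If nothing matches, it returns an empty list.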
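The two forms of findall can be compared side by side. This is a small sketch with a made-up sample string; only the compiled-pattern method accepts pos and endpos.

```python
import re

sample = 'a1 b22 c333'                     # hypothetical sample text

# Module-level function: the pattern is passed on every call.
all_numbers = re.findall(r'\d+', sample)

# Compiled-pattern method: pos and endpos limit the part of the string scanned.
digits = re.compile(r'\d+')
from_index_3 = digits.findall(sample, 3)   # scan from index 3 to the end
window = digits.findall(sample, 3, 8)      # scan only indices 3..7
no_match = digits.findall('no digits here')

print(all_numbers, from_index_3, window, no_match)
```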

Examples:

1. Write a program that extracts the phone numbers and email addresses from a string.

Start of the program:

import re
txt='''
Alice's phone number is 13300001234,telphone number is 0731-833334444,and her email is alice_love@gmail.com.
Bob's phone number is (086)13300001234,telphone number is (0731)-81112222,and her email is bob.a@gmail.com.
    '''

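2. Write the regular expression for phone numbers. The parentheses () create groups, so that the Match object returned by re.search (or re.match) can return them through group().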

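phonenum_regex=re.compile(r'''          # r marks a raw string
\(?                                     # optional opening parenthesis
(\d{4})                                 # 4 digits: area code
\)?                                     # optional closing parenthesis
(\s|-|\.)?                              # optional separator: whitespace, - or .
(\d{8})                                 # 8 digits: phone number
''',re.VERBOSE)                         # re.VERBOSE ignores whitespace and # comments inside the pattern, for readability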
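To see what re.VERBOSE changes, the same pattern can be written on one line without the flag. The sketch below assumes a made-up sample string and checks that both spellings find the same matches.

```python
import re

# Verbose form: whitespace and # comments inside the pattern are ignored.
verbose = re.compile(r'''
    \(?          # optional opening parenthesis
    (\d{4})      # area code
    \)?          # optional closing parenthesis
    (\s|-|\.)?   # optional separator
    (\d{8})      # phone number
''', re.VERBOSE)

# Compact form of the same pattern, without the flag.
compact = re.compile(r'\(?(\d{4})\)?(\s|-|\.)?(\d{8})')

sample = 'call (0731)-81112222 or 0731 83333444'   # hypothetical sample text
verbose_result = verbose.findall(sample)
compact_result = compact.findall(sample)
print(verbose_result)
```

One consequence of the flag is that a literal space in a verbose pattern has to be written as \s, as an escaped space, or inside a character class, since plain whitespace is skipped.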

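3. Compare re.search and re.findall

The code below uses search rather than match: match only tries the pattern at the start of the string, and txt does not begin with a phone number, so match would return None here. search returns the same kind of Match object, for the first match found anywhere in the string.

re.findall finds all matches and returns a list. When the pattern contains more than one group, the items of the list are tuples of the groups; when it contains no groups, the items are the matched strings (see the email extraction below).

On the Match object returned by re.search, group() defaults to group(0) and returns the whole match as a string; group(n) returns group n as a string, and passing several group numbers returns a tuple of those groups.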

phonenum_find=phonenum_regex.findall(txt)        # findall
print('findall result:',phonenum_find,'type:',type(phonenum_find))
phonenum_search=phonenum_regex.search(txt)       # search
print('search:',phonenum_search,'type:',type(phonenum_search))
print('search.groups():',phonenum_search.groups(),'type:',type(phonenum_search.groups()))
print('search.group():',phonenum_search.group(),'type:',type(phonenum_search.group()))
print('search.group(0):',phonenum_search.group(0),'type:',type(phonenum_search.group(0)))
print('search.group(1,2):',phonenum_search.group(1,2),'type:',type(phonenum_search.group(1,2)))

# Result:

findall result: [('0731', '-', '83333444'), ('0731', '-', '81112222')] type: <class 'list'>

search: <re.Match object; span=(56, 69), match='0731-83333444'> type: <class 're.Match'>

search.groups(): ('0731', '-', '83333444') type: <class 'tuple'>

search.group(): 0731-83333444 type: <class 'str'>

search.group(0): 0731-83333444 type: <class 'str'>

search.group(1,2): ('0731', '-') type: <class 'tuple'>

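The results show that findall returns a list and search.groups() returns a tuple; search.group() is equivalent to search.group(0) and returns a string; search.group() called with several group numbers returns a tuple of strings.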
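The shape of the list that findall returns depends on how many capturing groups the pattern has. The following sketch uses simplified patterns (not the ones from this article) on a single made-up number to show the three cases.

```python
import re

sample = '0731-83333444'   # hypothetical sample text

# No group: each item is the whole matched string.
no_group = re.findall(r'\d{4}-\d{8}', sample)

# Exactly one group: each item is the string captured by that group.
one_group = re.findall(r'(\d{4})-\d{8}', sample)

# Two or more groups: each item is a tuple of the captured strings.
two_groups = re.findall(r'(\d{4})-(\d{8})', sample)

print(no_group, one_group, two_groups)
```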

4. Join each findall tuple into a single string

phonenum_list=[]
for groups in phonenum_find:
    phonenum_list.append(''.join(groups))   # join the groups of one match into one string
print('list after joining findall groups:',phonenum_list)

# Result:

list after joining findall groups: ['0731-83333444', '0731-81112222']
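Joining the groups only reassembles what was captured, so the optional parentheses around the area code are lost. As an alternative sketch, re.finditer yields a Match object per match, and group(0) then gives the full matched text. The sample string here is a shortened, made-up version of txt.

```python
import re

phone = re.compile(r'\(?(\d{4})\)?(\s|-|\.)?(\d{8})')
sample = 'a: 0731-83333444, b: (0731)-81112222'   # hypothetical sample text

# Joining the captured groups drops anything matched outside the groups.
joined = [''.join(groups) for groups in phone.findall(sample)]

# group(0) is the whole match, including the literal parentheses.
full = [m.group(0) for m in phone.finditer(sample)]

print(joined)
print(full)
```

Which of the two is preferable depends on whether the normalized digits or the original spelling of the number is wanted.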

5. The regular expression for email extraction, with no capturing groups

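email_regex=re.compile(r'''             # r marks a raw string
[\w\d.-]+                               # local part: letters, digits, underscore, dot or hyphen, any length
@                                       # the @ character
[\w\d.-]+                               # domain name, e.g. hotmail
(?: \.[\w]{2,4} | \.[\w]{2,4}\.[\w]{2,4} )   # suffix, e.g. .com or .com.cn; the non-capturing (?:...) limits | to the suffix
''',re.VERBOSE)                         # re.VERBOSE ignores whitespace and # comments inside the pattern, for readability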
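One thing to watch when a pattern contains |: alternation binds more loosely than concatenation, so unless the alternatives are wrapped in a group, each side of | is treated as a complete pattern of its own. The sketch below uses deliberately simplified patterns and a made-up sample string to show the effect; (?:...) groups without capturing, so findall still returns plain strings.

```python
import re

# Without a group, the pattern means: (address ending in .com) OR (a bare ".com.cn").
ungrouped = re.compile(r'\w+@\w+\.com|\.com\.cn')

# With a non-capturing group, | only chooses between the two suffixes.
grouped = re.compile(r'\w+@\w+(?:\.com\.cn|\.com)')

sample = 'see .com.cn or mail bob@example.com.cn'   # hypothetical sample text
ungrouped_result = ungrouped.findall(sample)
grouped_result = grouped.findall(sample)

print(ungrouped_result)
print(grouped_result)
```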

6. Compare re.findall and re.search on the email pattern

email_find=email_regex.findall(txt)
email_search=email_regex.search(txt)
print('email findall:',email_find)
print('email search.group():',email_search.group())
print('email search.group(0):',email_search.group(0))
print('email search.groups():',email_search.groups())
print('email search.group(1):',email_search.group(1))   # raises IndexError

# Result:

email findall: ['alice_love@gmail.com', 'bob.a@gmail.com']

email search.group(): alice_love@gmail.com

email search.group(0): alice_love@gmail.com

email search.groups(): ()

The last line, search.group(1), raises IndexError: no such group, because the pattern has no capturing groups. search.groups() does not fail; it simply returns an empty tuple.
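To avoid the exception, the number of capturing groups can be checked before asking for one. A compiled pattern exposes it as the groups attribute. The following is a minimal sketch with a simplified email pattern and a made-up sample string.

```python
import re

email = re.compile(r'[\w.-]+@[\w.-]+\.\w{2,4}')          # simplified pattern, no groups
m = email.search('mail alice_love@gmail.com now')        # hypothetical sample text

group_count = email.groups        # number of capturing groups in the pattern
captured = m.groups()             # empty tuple when there are no groups

# Asking for a group that does not exist raises IndexError.
raised = False
try:
    m.group(1)
except IndexError:
    raised = True

# Fall back to the whole match when the pattern has no group 1.
value = m.group(1) if group_count >= 1 else m.group(0)
print(group_count, captured, raised, value)
```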

The complete program:

import re
# text to search
txt='''
Alice's phone number is 13300001234,telphone number is 0731-833334444,and her email is alice_love@gmail.com.
Bob's phone number is (086)13300001234,telphone number is (0731)-81112222,and her email is bob.a@gmail.com.
    '''
# phone number regular expression
phonenum_regex=re.compile(r'''          # r marks a raw string
\(?                                     # optional opening parenthesis
(\d{4})                                 # 4 digits: area code
\)?                                     # optional closing parenthesis
(\s|-|\.)?                              # optional separator: whitespace, - or .
(\d{8})                                 # 8 digits: phone number
''',re.VERBOSE)                         # re.VERBOSE ignores whitespace and # comments inside the pattern, for readability
# findall and search
phonenum_find=phonenum_regex.findall(txt)        # findall
print('findall result:',phonenum_find,'type:',type(phonenum_find))
phonenum_search=phonenum_regex.search(txt)       # search
print('search:',phonenum_search,'type:',type(phonenum_search))
print('search.groups():',phonenum_search.groups(),'type:',type(phonenum_search.groups()))
print('search.group():',phonenum_search.group(),'type:',type(phonenum_search.group()))
print('search.group(0):',phonenum_search.group(0),'type:',type(phonenum_search.group(0)))
print('search.group(1,2):',phonenum_search.group(1,2),'type:',type(phonenum_search.group(1,2)))
# join each findall tuple into a single string
phonenum_list=[]
for groups in phonenum_find:
    phonenum_list.append(''.join(groups))
print('list after joining findall groups:',phonenum_list)
# email extraction regular expression
email_regex=re.compile(r'''             # r marks a raw string
[\w\d.-]+                               # local part: letters, digits, underscore, dot or hyphen, any length
@                                       # the @ character
[\w\d.-]+                               # domain name, e.g. hotmail
(?: \.[\w]{2,4} | \.[\w]{2,4}\.[\w]{2,4} )   # suffix, e.g. .com or .com.cn; the non-capturing (?:...) limits | to the suffix
''',re.VERBOSE)                         # re.VERBOSE ignores whitespace and # comments inside the pattern, for readability
# findall and search
email_find=email_regex.findall(txt)
email_search=email_regex.search(txt)
print('email findall:',email_find)
print('email search.group():',email_search.group())
print('email search.group(0):',email_search.group(0))
print('email search.groups():',email_search.groups())
print('email search.group(1):',email_search.group(1))   # raises IndexError: no such group

Source: oschina

Link: https://my.oschina.net/u/4282343/blog/4694997
