Unable to extract date of birth from a given format

大城市里の小女人 submitted on 2019-12-23 03:17:41

Question


I have a set of text files from which I need to extract the date of birth. The code below extracts it from most of the files but fails when the data is in the format shown below. How can I extract the DOB? The data is far from uniform.

Data:

data="""
Thomas, John - DOB/Sex:    12/23/1955                                     11/15/2014   11:53 AM"
Jacob's Date of birth is 9/15/1963
Name:Annie; DOB:10/30/1970
"""

Code:

import re
pattern = re.compile(r'.*DOB.*((?:\d{1,2})(?:(?:\/|-)\d{1,2})(?:(?:\/|-)\d{2,4})).*',re.I)

matches=pattern.findall(data)

for match in matches:
    print(match)

Expected output:

12/23/1955

Answer 1:


import re    

data="""
Thomas, John - DOB/Sex:    12/23/1955                                     11/15/2014   11:53 AM"
Jacob's Date of birth is 9/15/1963
Name:Annie; DOB:10/30/1970
"""

pattern = re.compile(r'.*?\b(?:DOB|Date of birth)\b.*?(\d{1,2}[/-]\d{1,2}[/-](?:\d\d){1,2})',re.I)

matches=pattern.findall(data)

for match in matches:
    print(match)    

Output:

12/23/1955
9/15/1963
10/30/1970

Explanation:

.*?             : 0 or more of any character except newline, as few as possible (lazy)
\b              : word boundary
(?:             : start non-capturing group
  DOB           : literally
 |              : OR
  Date of birth : literally
)               : end group
\b              : word boundary
.*?             : 0 or more of any character except newline, as few as possible (lazy)
(               : start capture group 1
    \d{1,2}     : 1 or 2 digits
    [/-]        : slash or dash
    \d{1,2}     : 1 or 2 digits
    [/-]        : slash or dash
    (?:         : start non-capturing group
        \d\d    : 2 digits
    ){1,2}      : end group; may appear once or twice (i.e. 2 or 4 digits)
)               : end capture group 1
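
The lazy `.*?` is what makes the pattern pick the first date after the keyword. As a minimal sketch of the difference (the variable names `line`, `date`, `greedy` and `lazy` are only for illustration, and the sample line is shortened from the question's data), compare a greedy `.*` with a lazy `.*?` in front of the same date group:

```python
import re

# Sample line modelled on the question's data: the DOB is followed by a second, unrelated date.
line = "Thomas, John - DOB/Sex:    12/23/1955      11/15/2014   11:53 AM"

# Same date sub-pattern as in the answer above.
date = r"(\d{1,2}[/-]\d{1,2}[/-](?:\d\d){1,2})"

# Greedy: .* first runs to the end of the line, then backtracks only as far as needed,
# so it settles on the right-most position where the date group can still match.
greedy = re.search(r"DOB.*" + date, line, re.I)

# Lazy: .*? expands one character at a time, so the first date after "DOB" wins.
lazy = re.search(r"DOB.*?" + date, line, re.I)

print(greedy.group(1))  # prints 1/15/2014 (the later date, with its first digit swallowed by .*)
print(lazy.group(1))    # prints 12/23/1955
```

The greedy version is essentially what the pattern in the question does, which is why it does not return the date of birth on lines that contain more than one date.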



Answer 2:


import re
string = "DOB/Sex:    12/23/1955            11/15/2014   11:53 AM"
re.findall(r'.*?DOB.*?:\s+([\d/]+)', string)

Output:

['12/23/1955']
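
Note that this pattern needs a colon followed by whitespace before the digits, so it covers the first sample line but not the other two. Neither answer checks that the captured text is a real calendar date either. A possible sketch that handles one line at a time and validates the result is shown here; the helper name `extract_dob` and the assumption that dates are written month/day/year are mine, not something stated in the question:

```python
import re
from datetime import date, datetime
from typing import Optional

# Keyword followed (lazily) by the first date-like token, as in Answer 1.
DOB_RE = re.compile(
    r"\b(?:DOB|Date of birth)\b.*?(\d{1,2}[/-]\d{1,2}[/-](?:\d\d){1,2})",
    re.I,
)

def extract_dob(line: str) -> Optional[date]:
    """Hypothetical helper: return the DOB found in `line`, or None."""
    m = DOB_RE.search(line)
    if m is None:
        return None
    # Normalise dashes to slashes so only two formats need to be tried.
    text = m.group(1).replace("-", "/")
    # Assumption: month/day/year order, with a 4-digit or 2-digit year.
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None  # looked like a date but is not a valid one, e.g. 13/45/1999

data = """
Thomas, John - DOB/Sex:    12/23/1955                                     11/15/2014   11:53 AM"
Jacob's Date of birth is 9/15/1963
Name:Annie; DOB:10/30/1970
"""

for line in data.splitlines():
    dob = extract_dob(line)
    if dob is not None:
        print(dob.isoformat())
```

With two-digit years, `%y` follows Python's own century rule, so such values should be reviewed before being trusted as birth dates.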


Source: https://stackoverflow.com/questions/51887141/unable-to-extract-date-of-birth-from-a-given-format
