Python/Regex - How to extract date from filename using regular expression?

滥情空心 2020-11-27 08:05

I need to use Python to extract the date from filenames. The date is in the following format:

month-day-year.somefileextension

Examples:

5 Answers
  • 2020-11-27 08:47

    I think you can extract the date using re.split, as follows:

    $ ipython
    
    In [1]: import re
    
    In [2]: input_file = '10-12-2011.zip'
    
    In [3]: file_split = re.split(r'(\d{2}-\d{2}-\d{4})', input_file, 1)
    
    In [4]: file_split
    Out[4]: ['', '10-12-2011', '.zip']
    
    In [5]: file_split[1]
    Out[5]: '10-12-2011'
    
    In [6]: input_file = 'somedatabase-10-04-2011.sql.tar.gz'
    
    In [7]: file_split = re.split(r'(\d{2}-\d{2}-\d{4})', input_file, 1)
    
    In [8]: file_split
    Out[8]: ['somedatabase-', '10-04-2011', '.sql.tar.gz']
    
    In [9]: file_split[1]
    Out[9]: '10-04-2011'
    

    I ran the tests with Python 3.6.6, IPython 5.3.0

  • 2020-11-27 08:55
    **This is a simple method to find dates in a text file in Python**
    import os
    import re
    file = 'rain.txt'  # name of the file
    if os.path.isfile(file):  # check whether the file exists
        with open(file, 'r') as i:
            for j in i:  # traverse the file line by line
                match = re.search(r'\d{2}-\d{2}-\d{4}', j)  # regular expression for the date
                if match:  # re.search returns None when the line has no date
                    print(match.group())  # print the date if a match is found
    else:
        print("file does not exist")
    
  • 2020-11-27 08:57

    You want to use a capture group.

    import re
    # raw string, so \b is a word boundary rather than a backspace character
    m = re.search(r'\b(\d{2}-\d{2}-\d{4})\.', 'derer-10-12-2001.zip')
    print(m.group(1))
    

    Should print 10-12-2001.

    You could get away with a more terse regex, but ensuring that it is preceded by a - and followed by a . provides some minimal protection against double-matches with funky filenames, or malformed filenames that shouldn't match at all.

    EDIT: I replaced the initial - with a \b, which matches any boundary between a word character (letter, digit, or underscore) and a non-word character, including the start of the string. That way it will match whether the date is preceded by a hyphen or sits at the beginning of the string.
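
A minimal sketch of how \b behaves here, assuming the same sample filename as in this answer (the variable names and the underscore filename are only illustrative). It shows that the pattern needs a raw string for \b to act as a word boundary, and that an underscore before the date prevents a match:

```python
import re

filename = 'derer-10-12-2001.zip'

# In a plain string literal, '\b' is the backspace character (0x08),
# so this pattern looks for a literal backspace and finds nothing.
plain = re.search('\b(\\d{2}-\\d{2}-\\d{4})\\.', filename)

# In a raw string, r'\b' reaches the regex engine as a word boundary.
raw = re.search(r'\b(\d{2}-\d{2}-\d{4})\.', filename)

# Caveat: '_' counts as a word character, so there is no boundary
# between '_' and '1' and this filename does not match.
underscore = re.search(r'\b(\d{2}-\d{2}-\d{4})\.', 'derer_10-12-2001.zip')

print(plain)         # None
print(raw.group(1))  # 10-12-2001
print(underscore)    # None
```

If filenames may use underscores as separators, a digit lookbehind such as (?<!\d) in place of \b is one possible alternative.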

  • 2020-11-27 08:57

    Well, the \w+ you put in matches one or more word characters following a hyphen, so that's the expected result. What you want to do is use a lookaround on either side, matching digits and hyphens that occur between the first hyphen and a period:

    re.search(r'(?<=-)[\d-]+(?=\.)', name).group(0)
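
A short sketch of how the lookaround pattern behaves on a few inputs; the helper name date_after_hyphen and the sample filenames are hypothetical, chosen only to illustrate. Note that the lookbehind requires a hyphen before the date, so a filename that starts with the date yields a truncated result:

```python
import re

# Same pattern as above: digits and hyphens preceded by '-' and followed by '.'
pattern = re.compile(r'(?<=-)[\d-]+(?=\.)')

def date_after_hyphen(name):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(name)
    return m.group(0) if m else None

print(date_after_hyphen('derer-10-12-2001.zip'))  # 10-12-2001
print(date_after_hyphen('10-12-2011.zip'))        # 12-2011 (month is lost)
print(date_after_hyphen('notes.txt'))             # None
```

Checking for None before calling .group(0) avoids the AttributeError that the one-liner raises on filenames without a date.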

  • 2020-11-27 08:59

    Assuming the date is always in the format: [MM]-[DD]-[YYYY].

    match = re.search(r"([0-9]{2}-[0-9]{2}-[0-9]{4})", fileName)
    date = match.group(1) if match else None
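
If the MM-DD-YYYY assumption needs to be checked rather than trusted, one option is to pass each candidate through datetime.strptime. This is only a sketch: the function name extract_date, the digit lookarounds, and the sample filenames are assumptions, not part of the answers above.

```python
import re
from datetime import datetime

# Two digits, two digits, four digits, not embedded in a longer digit run.
DATE_RE = re.compile(r'(?<!\d)(\d{2}-\d{2}-\d{4})(?!\d)')

def extract_date(filename):
    """Return the first valid MM-DD-YYYY date in filename as a date, or None."""
    for candidate in DATE_RE.findall(filename):
        try:
            return datetime.strptime(candidate, '%m-%d-%Y').date()
        except ValueError:
            continue  # looks like a date but is not one, e.g. 99-99-2011
    return None

print(extract_date('somedatabase-10-04-2011.sql.tar.gz'))  # 2011-10-04
print(extract_date('10-12-2011.zip'))                       # 2011-10-12
print(extract_date('report-99-99-2011.zip'))                # None
```

Returning a date object instead of a string makes later sorting and comparison straightforward.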
    