Regular expression to extract the year from a string

盖世英雄少女心 2020-12-19 22:54

This should be simple, but I can't seem to get it working. The purpose is to extract v3 tags from mp3 file names in Mp3tag.

I have these strings I want to extract

5 Answers
  • 2020-12-19 23:08

    Here is a suggestion for a more generic solution, though I'm not sure it is what you need. It assumes that valid years always have the form 19xx or 20xx and that each year sits between word boundaries (the neighboring characters are something other than a digit or a letter):

    \b(19|20)\d{2}\b
    

    This doesn't care where in the tag the year appears. A simpler version that assumes nothing beyond a four-digit year would be:

    \b\d{4}\b
    

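    The key here is the \b escape sequence. It is a zero-width assertion: it matches the position between a word character (letters, digits and underscores) and a non-word character, or the start or end of the string, without consuming anything. That is why it also works right next to parentheses.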
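
    To see how the two expressions differ, here is a minimal Python sketch. The sample strings and the helper name find_years are made up for illustration and are not from the question. The group is written as non-capturing, (?:19|20), only so that findall returns the whole year rather than just the "19" or "20" prefix.

```python
import re

# Years restricted to 1900-2099, delimited by word boundaries.
strict = re.compile(r"\b(?:19|20)\d{2}\b")
# Any standalone run of exactly four digits.
loose = re.compile(r"\b\d{4}\b")

def find_years(pattern, text):
    """Return every match of the compiled pattern in text."""
    return pattern.findall(text)

# Hypothetical sample strings, not taken from the original question.
for text in ["Test String 1 (1994)", "Founded 1850", "Track 12345"]:
    print(text, find_years(strict, text), find_years(loose, text))
```

    In "Track 12345" neither pattern matches, because no four-digit run there has a word boundary on both sides.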

    I would also recommend this site: http://www.regular-expressions.info/

  • 2020-12-19 23:12

    You're almost there with your regular expression.

    What you really need is:

    \s\((\d{4})\)$
    

    Where:

    • \s is a single whitespace character
    • \( is a literal '('
    • ( is the start of the match group
    • \d is a digit
    • {4} means four of the previous atom (i.e. four digits)
    • ) is the end of the match group
    • \) is a literal ')'
    • $ is the end of the string

    For best results, put into a function:

    >>> import re
    >>> def get_year(name):
    ...     # Capture the four digits inside a trailing " (YYYY)"
    ...     return re.search(r'\s\((\d{4})\)$', name).group(1)
    ... 
    >>> for name in "Test String 1 (1994)", "34 Test String 2 (1995)", "Test (String) 3 (1996)":
    ...     print(get_year(name))
    ... 
    1994
    1995
    1996
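
    One caveat: re.search returns None when nothing matches, so get_year as written raises AttributeError on a name without a trailing year. A possible variant, sketched here under the assumption that returning None is acceptable for such names (the name get_year_or_none is hypothetical):

```python
import re

# Same expression as above, compiled once for reuse.
YEAR_AT_END = re.compile(r"\s\((\d{4})\)$")

def get_year_or_none(name):
    """Return the trailing '(YYYY)' year as a string, or None if absent."""
    match = YEAR_AT_END.search(name)
    return match.group(1) if match else None

print(get_year_or_none("Test (String) 3 (1996)"))
print(get_year_or_none("No year here"))
```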
    
  • 2020-12-19 23:16

    I'd go with

    ^(.*)\s\(([0-9]{4})\)$
    

    This assumes every year has exactly 4 digits. Use [0-9]+ if the number of digits is unknown but at least one, or [0-9]* if the parentheses could be empty.
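
    The effect of the three quantifiers can be checked with a small Python sketch. The helper year_group and the sample names are assumptions made for illustration; the helper simply swaps the quantifier into the expression above and returns the second group.

```python
import re

def year_group(quantifier, name):
    """Return group 2 for the given digit quantifier, or None on no match."""
    # Builds the expression above with "{4}", "+" or "*" after [0-9].
    pattern = r"^(.*)\s\(([0-9]" + quantifier + r")\)$"
    match = re.search(pattern, name)
    return match.group(2) if match else None

# Hypothetical sample names.
for name in ["Song (1994)", "Song (94)", "Song ()"]:
    print(name, [year_group(q, name) for q in ("{4}", "+", "*")])
```

    Note that [0-9]* matches "Song ()" with an empty string in group 2, which is different from not matching at all.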

  • 2020-12-19 23:23

    You need to escape the parentheses. You can also require that the year has exactly 4 digits:

    ^(.+)\s\(([0-9]{4})\)$
    

    The year is in capture group 2; group 1 holds the text before it.
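
    As a rough illustration in Python (the function name split_title_year is hypothetical, and converting the year to int is my own choice), both groups can be pulled out in one step:

```python
import re

TITLE_AND_YEAR = re.compile(r"^(.+)\s\(([0-9]{4})\)$")

def split_title_year(name):
    """Return (title, year) from 'Title (YYYY)', or None if it doesn't fit."""
    match = TITLE_AND_YEAR.match(name)
    if match is None:
        return None
    title, year = match.groups()  # group 1 is the title, group 2 the year
    return title, int(year)

print(split_title_year("Test (String) 3 (1996)"))
```

    Because group 1 uses .+, at least one character must precede the whitespace, so a string like " (1996)" is rejected.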

  • 2020-12-19 23:25

    You can use something like \((\d{4})\)$. The first capture group will hold the year.

    Explanation

    \(       # Match the character “(” literally
    (        # Match the regular expression below and capture its match into backreference number 1
       \d       # Match a single digit 0..9
          {4}      # Exactly 4 times
    )
    \)       # Match the character “)” literally
    $        # Assert position at the end of a line (at the end of the string or before a line break character)
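
    If you happen to be testing this in Python, the re.VERBOSE flag lets you keep an annotated layout like the one above inside the pattern itself. This is only a sketch; the names YEAR_VERBOSE and year_from_tag are made up for the example.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts a comment.
YEAR_VERBOSE = re.compile(
    r"""
    \(          # a literal opening parenthesis
    (           # start of capture group 1
        \d{4}   #   exactly four digits
    )           # end of capture group 1
    \)          # a literal closing parenthesis
    $           # end of the string (or just before a trailing newline)
    """,
    re.VERBOSE,
)

def year_from_tag(name):
    """Return the year captured in group 1, or None if nothing matches."""
    match = YEAR_VERBOSE.search(name)
    return match.group(1) if match else None

print(year_from_tag("34 Test String 2 (1995)"))
```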
    