This should be simple but i cant seem to get it going. The purpose of it is to extract v3 tags from mp3 file names in Mp3tag.
I have these strings I want to extract
A suggestion for a more generic solution, not sure if that is what you need. Valid years will always have the form 19xx or 20xx, and the years will be separated with a word-break character (something other than a number or a letter):
\b(19|20)\d{2}\b
This doesn't really care where in the tag the year appears. A simpler version that doesn't assume anything more than 4 digits in the year would be this expression:
\b\d{4}\b
The key here is the \b escape sequence, which matches any non-word character (word charaters are letters, digits and underscores), including parenthesis, of course.
Would also like to recommend this site: http://www.regular-expressions.info/
You're almost there with your regular expression.
What you really need is:
\s\((\d{4})\)$
Where:
\s
is some whitespace\(
is a literal '('(
is the start of the match group\d
is a digit{4}
means four of the previous atom (i.e. four digits))
is the end of the match group\)
is a literal ')'$
is the end of the stringFor best results, put into a function:
>>> def get_year(name):
... return re.search('\s\((\d{4})\)$', name).groups()[0]
...
>>> for name in "Test String 1 (1994)", "34 Test String 2 (1995)", "Test (String) 3 (1996)":
... print get_year(name)
...
1994
1995
1996
I'd go with
^(.*)\s\(([0-9]{4})\)$
(assuming all years have 4 digits, use [0-9]+
if you have an unknown number of digits, but at least one, or [0-9]*
if there could be no digits)
You need to escape the parentheses. Also you can restrict that a year has only got 4 numbers:
^(.+)\s\(([0-9]{4})\)$
The year is in matchgroup 2.
You can use something like this \((\d{4})\)$
. The first group will have your match.
Explanation
\( # Match the character “(” literally
( # Match the regular expression below and capture its match into backreference number 1
\d # Match a single digit 0..9
{4} # Exactly 4 times
)
\) # Match the character “)” literally
$ # Assert position at the end of a line (at the end of the string or before a line break character)