I've got a string, which looks like "Blah blah blah, Updated: Aug. 23, 2012", from which I want to use Regex to extract just the date Aug. 23, 2012. I found
With a regex, you may use one of two regexps, depending on which occurrence of the word you want to cut at:
# Remove all up to the first occurrence of the word including it (non-greedy):
^.*?word
# Remove all up to the last occurrence of the word including it (greedy):
^.*word
See the non-greedy regex demo and a greedy regex demo.
The ^ matches the start of string position, .*? matches any zero or more chars (mind the use of the re.DOTALL flag so that . can match newlines) as few as possible (.* matches as many as possible), and then word matches and consumes (i.e. adds to the match and advances the regex index) the word.
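To see why the re.DOTALL flag matters here, the following is a minimal sketch using a made-up two-line string (the sample text is an assumption for illustration only). Without the flag, the dot cannot cross the newline, so the pattern anchored at the start of the string never reaches the word.

```python
import re

# Hypothetical sample input with a newline before the word.
text = "line one\nline two, Updated: Aug. 23, 2012"

# Without re.DOTALL, "." stops at the newline, so "^.*?Updated:" cannot
# reach "Updated:" from the start of the string and nothing is removed.
without_dotall = re.sub(r'^.*?Updated:', '', text)

# With re.DOTALL, "." also matches "\n", so the whole prefix is removed.
with_dotall = re.sub(r'^.*?Updated:', '', text, flags=re.DOTALL)

print(repr(without_dotall))
print(repr(with_dotall))
```

If the input is known to be a single line, the flag makes no difference, but keeping it costs nothing.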
Note the use of re.escape(up_to_word): if your up_to_word does not consist solely of alphanumeric and underscore chars, it is safer to use re.escape so that special chars like (, [, ?, etc. do not prevent the regex from finding a valid match.
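As a rough illustration of what can go wrong without escaping, here is a small sketch with a hypothetical delimiter "(Updated?)" that contains regex metacharacters (both the delimiter and the sample string are assumptions, not taken from the question):

```python
import re

# Hypothetical input and delimiter, chosen to contain regex metacharacters.
text = "Blah blah (Updated?) Aug. 23, 2012"
up_to_word = "(Updated?)"

# Unescaped, the parentheses form a group and "?" makes the final "d"
# optional, so the pattern matches only "Updated" and leaves the literal
# "?)" behind in the result.
unescaped = re.sub(r'^.*?' + up_to_word, '', text, flags=re.DOTALL)

# Escaped, every special character in the delimiter is matched literally.
escaped = re.sub(r'^.*?' + re.escape(up_to_word), '', text, flags=re.DOTALL)

print(repr(unescaped))
print(repr(escaped))
```

With other delimiters, such as one containing an unbalanced "(" or "[", the unescaped pattern would fail to compile at all, which is another reason to escape by default.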
See the Python demo:
import re
date_div = "Blah blah\nblah, Updated: Aug. 23, 2012 Blah blah Updated: Feb. 13, 2019"
up_to_word = "Updated:"
rx_to_first = r'^.*?{}'.format(re.escape(up_to_word))
rx_to_last = r'^.*{}'.format(re.escape(up_to_word))
print("Remove all up to the first occurrence of the word including it:")
print(re.sub(rx_to_first, '', date_div, flags=re.DOTALL).strip())
print("Remove all up to the last occurrence of the word including it:")
print(re.sub(rx_to_last, '', date_div, flags=re.DOTALL).strip())
Output:
Remove all up to the first occurrence of the word including it:
Aug. 23, 2012 Blah blah Updated: Feb. 13, 2019
Remove all up to the last occurrence of the word including it:
Feb. 13, 2019
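For the single-line string from the question, one possible alternative to removing the prefix is to capture what follows the word with re.search. This is only a sketch of a different technique than the re.sub approach above; the variable names are illustrative, and it assumes the date is everything after the first "Updated:".

```python
import re

# The string from the question.
text = "Blah blah blah, Updated: Aug. 23, 2012"

# Capture everything after the first "Updated:" (skipping whitespace)
# up to the end of the string.
match = re.search(r'Updated:\s*(.*)', text, flags=re.DOTALL)

# Guard against strings that do not contain the word at all.
date_text = match.group(1).strip() if match else None
print(date_text)
```

Unlike re.sub, which returns the input unchanged when nothing matches, this makes the "word not found" case explicit through the None result.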