I'm looking to extract the year from a string. The year always comes after an 'X' and before a '.', which is followed by a string of other characters.
Using stringr's str_extract with a capture group, I get more than just the year.
The capture group doesn't help here: str_extract always returns the whole match, including the characters before and after the capture group.
Use a lookbehind and a lookahead instead. They are zero-width, so they check for the X and the . without including them in the match.
library(stringr)
str_extract(string = 'X2015.XML.Outgoing.pounds..millions.',
pattern = '(?<=X)\\d{4}(?=\\.)')
# [1] "2015"
This regex matches four consecutive digits that are preceded by an X and followed by a literal dot (written \\. in the pattern).
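A minimal sketch of the same lookaround pattern applied to a whole character vector, alongside an alternative that keeps a capture group: stringr's str_match() returns a character matrix whose first column is the whole match and whose following columns are the capture groups, so column 2 holds just the year. The sample strings other than the first are made up for illustration, and the third one deliberately contains no year to show that both approaches return NA when nothing matches.

```r
library(stringr)

# Hypothetical inputs: the first is from the question, the others are invented
# examples (the last one has no "X<year>." part at all).
x <- c("X2015.XML.Outgoing.pounds..millions.",
       "X1998.CSV.Incoming",
       "no_year_here")

# Lookaround version: only the four digits are part of the match.
# str_extract() returns NA for elements with no match.
years_lookaround <- str_extract(x, "(?<=X)\\d{4}(?=\\.)")

# Capture-group version: str_match() gives a matrix; column 1 is the full
# match ("X2015."), column 2 is the first capture group ("2015").
m <- str_match(x, "X(\\d{4})\\.")
years_group <- m[, 2]

print(years_lookaround)
print(years_group)
```

Both give the same years here; the lookaround version is convenient when you only need the extracted text, while str_match() is handy if you later want several groups from the same match.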