stringr str_extract capture group capturing everything

无人共我 2021-02-20 09:09

I'm looking to extract the year from a string. The year always comes after an 'X' and before a '.', which is followed by a string of other characters.

Using stringr's

3 Answers
  • 2021-02-20 09:41

    The capture group is irrelevant here. str_extract always returns the whole match, including any characters matched before and after the capture group.

    Use a lookbehind and a lookahead instead. Both are zero-width assertions: they constrain where the match may occur without becoming part of it.

    library(stringr)
    str_extract(string = 'X2015.XML.Outgoing.pounds..millions.',
                pattern = '(?<=X)\\d{4}(?=\\.)')
    # [1] "2015"
    

    This regex matches four consecutive digits that are preceded by an X and followed by a literal dot (.).
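
    Because str_extract is vectorised, the same lookaround pattern can be applied to a whole vector of names at once. The following is a minimal sketch; the vector cols is hypothetical (only its first element comes from the question), and elements that do not match come back as NA:

    ```r
    library(stringr)

    # Hypothetical input: only the first element appears in the question.
    cols <- c("X2015.XML.Outgoing.pounds..millions.",
              "X2016.XML.Outgoing.pounds..millions.",
              "Country")

    # The lookbehind (?<=X) and lookahead (?=\\.) are zero-width,
    # so only the four digits end up in the result.
    years <- str_extract(cols, "(?<=X)\\d{4}(?=\\.)")
    years
    # [1] "2015" "2016" NA

    # Convert to integer if numeric years are needed.
    as.integer(years)
    # [1] 2015 2016   NA
    ```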

  • 2021-02-20 09:51

    Alternatively, you can use gsub:

    string <- 'X2015.XML.Outgoing.pounds..millions.'
    
    gsub("X(\\d{4})\\..*", "\\1", string)
    # [1] "2015"
    

    or str_replace from stringr:

    library(stringr)
    str_replace(string, "X(\\d{4})\\..*", "\\1")
    # [1] "2015"
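
    One caveat with the replacement approach: when the pattern does not match, gsub and str_replace return the input unchanged rather than NA. The sketch below illustrates this with a made-up non-matching input ("Country") and shows one possible way to get NA instead, using str_detect as a guard:

    ```r
    library(stringr)

    # "Country" is a hypothetical input that does not match the pattern.
    x <- c("X2015.XML.Outgoing.pounds..millions.", "Country")
    pattern <- "X(\\d{4})\\..*"

    # Non-matching strings pass through untouched.
    replaced <- gsub(pattern, "\\1", x)
    replaced
    # [1] "2015"    "Country"

    # Keep the replacement only where the pattern actually matched.
    guarded <- ifelse(str_detect(x, pattern),
                      str_replace(x, pattern, "\\1"),
                      NA_character_)
    guarded
    # [1] "2015" NA
    ```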
    
  • 2021-02-20 09:54

    I believe the most idiomatic way is to use str_match:

    library(stringr)
    str_match(string = 'X2015.XML.Outgoing.pounds..millions.',
              pattern = 'X(\\d{4})\\.')
    

    This returns a character matrix with the complete match in the first column and each capture group in the columns that follow:

         [,1]     [,2]  
    [1,] "X2015." "2015"
    

    So selecting the second column (the first capture group) returns just the year:

    str_match(string = 'X2015.XML.Outgoing.pounds..millions.',
              pattern = 'X(\\d{4})\\.')[, 2]
    # [1] "2015"
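
    With more than one input string, str_match returns one row per string, so the way the result is indexed starts to matter. A minimal sketch, assuming a hypothetical vector cols (only its first element is from the question):

    ```r
    library(stringr)

    # Hypothetical input: only the first element appears in the question.
    cols <- c("X2015.XML.Outgoing.pounds..millions.",
              "X2016.XML.Outgoing.pounds..millions.",
              "Country")

    # One row per input string; column 1 is the full match,
    # column 2 is the first capture group.
    m <- str_match(cols, "X(\\d{4})\\.")

    # Taking the whole second column gives one year per input,
    # with NA where the pattern did not match.
    m[, 2]
    # [1] "2015" "2016" NA

    # A bare m[2] indexes the matrix as a flat vector and returns
    # row 2 of the full-match column instead.
    m[2]
    # [1] "X2016."
    ```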
    