How to do str_extract with base R?

忘了有多久 2021-01-02 07:19

I am balancing several versions of R and want to change which R libraries are loaded depending on which R version and which operating system I'm using. As such, I want to stick with base R.

3 Answers
  • 2021-01-02 08:02

    1) strcapture If you want to extract a string of digits and dots from "release 1.2.3" using base R, then:

    x <- "release 1.2.3"
    strcapture("([0-9.]+)", x, data.frame(version = character(0)))
    ##   version
    ## 1   1.2.3
    

    2) regexec/regmatches There is also regmatches and regexec but that has already been covered in another answer.
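
    For completeness, a minimal sketch of that approach on the same input. The pattern with the literal "release " prefix is an illustrative assumption, chosen to show that the capture group can be picked out separately from the full match:

    ```r
    x <- "release 1.2.3"
    # regexec() records the full match followed by each capture group;
    # regmatches() turns those positions into substrings.
    m <- regmatches(x, regexec("release ([0-9.]+)", x))
    m[[1]][2]   # element 1 is the full match, element 2 the first group
    ## [1] "1.2.3"
    ```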

    3) sub Also it is often possible to use sub:

    sub(".* ([0-9.]+).*", "\\1", x)
    ## [1] "1.2.3"
    

    3a) If you know the match is at the beginning or end of the string, then delete everything after or before it:

    sub(".* ", "", x)
    ## [1] "1.2.3"
    

    4) gsub Sometimes we know that the field to be extracted has certain characters and they do not appear elsewhere. In that case simply delete every occurrence of every character that cannot be in the string:

    gsub("[^0-9.]", "", x)
    ## [1] "1.2.3"
    

    5) read.table One can often decompose the input into fields and then pick off the desired one by number or via grep. strsplit, read.table or scan can be used:

    read.table(text = x, as.is = TRUE)[[2]]
    ## [1] "1.2.3"
    

    5a) grep/scan

    grep("^[0-9.]+$", scan(textConnection(x), what = "", quiet = TRUE), value = TRUE)
    ## [1] "1.2.3"
    

    5b) grep/strsplit

    grep("^[0-9.]+$", strsplit(x, " ")[[1]], value = TRUE)
    ## [1] "1.2.3"
    

    6) substring If we know the character position of the field we can use substring like this:

    substring(x, 9)
    ## [1] "1.2.3"
    

    6a) substring/regexpr or we may be able to use regexpr to locate the character position for us:

    substring(x, regexpr("\\d", x))
    ## [1] "1.2.3"
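
    If the field is not at the end of the string, the "match.length" attribute returned by regexpr can supply the end position as well. A small sketch, where x2 is a hypothetical input with trailing text that is not part of the original question:

    ```r
    x2 <- "release 1.2.3 (stable)"   # hypothetical input, assumed for illustration
    pos <- regexpr("[0-9.]+", x2)    # start of the first run of digits and dots
    # end position = start + match length - 1
    ver <- substring(x2, pos, pos + attr(pos, "match.length") - 1)
    ver
    ## [1] "1.2.3"
    ```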
    

    7) read.dcf Sometimes it is possible to convert the input to DCF form, in which case it can be read with read.dcf. Such data is of the form name: value.

    read.dcf(textConnection(sub(" ", ": ", x)))
    ##      release
    ## [1,] "1.2.3"
    
  • 2021-01-02 08:08
    txt <- c("foo release 123", "bar release", "foo release 123 bar release 123")
    pattern <- "release ([0-9]+)"
    

    Extract first match

    sapply(
        X = txt,
        FUN = function(x) {
            tmp <- regexpr(pattern, x)        # start of the first match, or -1
            m <- attr(tmp, "match.length")    # length of that match
            st <- as.integer(tmp)
            if (st == -1) NA_character_ else substr(x, start = st, stop = st + m - 1)
        },
        USE.NAMES = FALSE)
    #[1] "release 123" NA            "release 123"
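
    The same result can probably be had without the explicit loop, since regexpr and regmatches are both vectorized. A sketch of a small wrapper follows; the name str_extract_base is made up for this example and is not a base R function:

    ```r
    txt <- c("foo release 123", "bar release", "foo release 123 bar release 123")
    pattern <- "release ([0-9]+)"

    # Hypothetical helper: first match per element, NA where nothing matches
    str_extract_base <- function(string, pattern) {
        out <- rep(NA_character_, length(string))
        m <- regexpr(pattern, string)
        hit <- !is.na(m) & m != -1
        # regmatches() returns substrings only for the elements that matched
        out[hit] <- regmatches(string, m)
        out
    }

    str_extract_base(txt, pattern)
    #[1] "release 123" NA            "release 123"
    ```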
    

    Extract all matches

    sapply(
        X = txt,
        FUN = function(x) {
            tmp <- gregexpr(pattern, x)            # list with one element per input string
            m <- attr(tmp[[1]], "match.length")    # lengths of all matches
            st <- unlist(tmp)                      # start positions, or -1 if none
            if (st[1] == -1) {
                NA
            } else {
                sapply(seq_along(st), function(i) substr(x, st[i], st[i] + m[i] - 1))
            }
        },
        USE.NAMES = FALSE)
    #[[1]]
    #[1] "release 123"
    
    #[[2]]
    #[1] NA
    
    #[[3]]
    #[1] "release 123" "release 123"
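
    A shorter variant, sketched here as an alternative rather than a drop-in replacement: regmatches accepts the gregexpr result directly. Note the one difference in behavior: an element with no match comes back as character(0) instead of NA.

    ```r
    txt <- c("foo release 123", "bar release", "foo release 123 bar release 123")
    pattern <- "release ([0-9]+)"

    # One list element per input string, holding every match in that string
    all_matches <- regmatches(txt, gregexpr(pattern, txt))
    lengths(all_matches)
    #[1] 1 0 2
    ```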
    
  • 2021-01-02 08:19

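    You could use regmatches with regexec. The stringr::str_extract call is included only to show that both give the same result: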

    txt <- c("foo release 123", "bar release", "foo release 123 bar release 123")
    pattern <- "release ([0-9]+)"
    stringr::str_extract(txt, pattern)
    # [1] "release 123" NA            "release 123"
    sapply(regmatches(txt, regexec(pattern, txt)), "[", 1)
    # [1] "release 123" NA            "release 123"
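
    Because regexec also records the capture groups, the same idiom can return just the parenthesized part by changing the index from 1 to 2. A short sketch using the same inputs (the variable name groups is only for illustration):

    ```r
    txt <- c("foo release 123", "bar release", "foo release 123 bar release 123")
    pattern <- "release ([0-9]+)"

    # Index 1 is the full match, index 2 the first capture group;
    # indexing past the end of an empty match yields NA
    groups <- sapply(regmatches(txt, regexec(pattern, txt)), "[", 2)
    groups
    # [1] "123" NA    "123"
    ```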
    