R grep: is there an AND operator?

有刺的猬 2020-11-30 06:00

Suppose I have the following data frame:

User.Id    Tags
34234      imageUploaded,people.jpg,more,comma,separated,stuff
34234      imageUploaded
12345      p         


        
4 Answers
  • 2020-11-30 06:37

    I love @Chase's answer, and it makes good sense to me, but it can be a bit dangerous to use constructs that one doesn't totally understand.

    This answer is meant to reassure anyone who would rather use @thelatemail's more straightforward approach: it gives identical results and is fully competitive on speed. It's certainly what I'd use in this case. (It's also reassuring that the more sophisticated Perl-compatible regex pays only a small performance cost, about 10% in the benchmark below, for its power and easy extensibility.)

    library(rbenchmark)
    x <- paste0(sample(letters, 1e6, replace = TRUE), ## A longer vector of
                sample(letters, 1e6, replace = TRUE)) ## possible matches
    
    ## Both methods give identical results
    tlm <- grepl("a", x, fixed=TRUE) & grepl("b", x, fixed=TRUE)
    pat <- "(?=.*a)(?=.*b)"
    Chase <- grepl(pat, x, perl=TRUE)
    identical(tlm, Chase)
    # [1] TRUE    
    
    ## Both methods are similarly fast
    benchmark(
        thelatemail = grepl("a", x, fixed=TRUE) & grepl("b", x, fixed=TRUE),
        Chase = grepl(pat, x, perl=TRUE))
    #          test replications elapsed relative user.self sys.self
    # 2       Chase          100    9.89    1.105      9.80     0.10
    # 1 thelatemail          100    8.95    1.000      8.47     0.48
    
  • 2020-11-30 06:45

    Below is an alternative to grep using Hadley's stringr::str_detect(). This avoids the need for perl = TRUE (@jan-stanstrup). Additionally, dplyr::filter() returns the matching rows of the data frame itself, so you never need to leave the data frame.

    library(stringr)
    library(dplyr)
    x <- data.frame(User.Id = c(34234, 34234, 12345),
                    Tags = c("imageUploaded,people.jpg,more,comma,separated,stuff",
                             "imageUploaded",
                             "people.jpg"))
    
    data.people <- x %>% filter(str_detect(Tags, "(?=.*imageUploaded)(?=.*people\\.jpg)"))
    data.people
    
    # returns
    #  User.Id                                                Tags
    # 1   34234 imageUploaded,people.jpg,more,comma,separated,stuff
    

    This is simpler and works if "people.jpg" always follows "imageUploaded":

    # x is a data frame here, so pass the Tags column; non-matching rows give NA
    str_extract(x$Tags, "imageUploaded.*people\\.jpg")
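
    To illustrate the ordering caveat, here is a minimal sketch using base R's grepl() rather than stringr, so that it has no dependencies. The fourth element is a hypothetical row added for illustration, with the two tags in reverse order.

    ```r
    tags <- c("imageUploaded,people.jpg,more,comma,separated,stuff",
              "imageUploaded",
              "people.jpg",
              "people.jpg,imageUploaded")  # hypothetical row, tags reversed

    # Ordered pattern: requires imageUploaded to appear before people.jpg
    ordered <- grepl("imageUploaded.*people\\.jpg", tags)

    # Lookahead pattern: requires both tags, in either order
    either <- grepl("(?=.*imageUploaded)(?=.*people\\.jpg)", tags, perl = TRUE)

    ordered
    # [1]  TRUE FALSE FALSE FALSE
    either
    # [1]  TRUE FALSE FALSE  TRUE
    ```

    So the shorter pattern is only safe when the order of the tags is guaranteed.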
    
  • 2020-11-30 06:48

    For readability's sake, you could just do:

    x <- c(
           "imageUploaded,people.jpg,more,comma,separated,stuff",
           "imageUploaded",
           "people.jpg"
           )
    
    xmatches <- intersect(
                          grep("imageUploaded",x,fixed=TRUE),
                          grep("people.jpg",x,fixed=TRUE)
                         )
    x[xmatches]
    # [1] "imageUploaded,people.jpg,more,comma,separated,stuff"
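
    The same idea extends to more than two terms. The following is a sketch only: the `terms` vector (including the third term, "comma") is made up for illustration. It builds one logical vector per term with grepl() and combines them with an element-wise AND.

    ```r
    x <- c("imageUploaded,people.jpg,more,comma,separated,stuff",
           "imageUploaded",
           "people.jpg")
    terms <- c("imageUploaded", "people.jpg", "comma")  # "comma" added for illustration

    # One logical vector per term, then AND them together element-wise
    hits <- lapply(terms, function(term) grepl(term, x, fixed = TRUE))
    all_present <- Reduce(`&`, hits)

    x[all_present]
    # [1] "imageUploaded,people.jpg,more,comma,separated,stuff"
    ```

    Because fixed = TRUE treats each term literally, the "." in "people.jpg" needs no escaping.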
    
  • 2020-11-30 06:49

    Thanks to this answer, this regex seems to work. You want grepl(), which returns a logical vector you can use to index into your data object. I won't claim to fully understand the inner workings of the regex, but regardless:

    x <- c("imageUploaded,people.jpg,more,comma,separated,stuff", "imageUploaded", "people.jpg")
    
    grepl("(?=.*imageUploaded)(?=.*people\\.jpg)", x, perl = TRUE)
    #-----
    # [1]  TRUE FALSE FALSE
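
    For anyone curious about how the regex works, here is a short sketch. Each `(?=...)` is a zero-width lookahead: it asserts that `.*term` can match from the current position without consuming any text, so chaining two of them requires both terms to be present. The helper `and_pattern()` is a hypothetical name introduced here, not part of any package; it assumes the terms do not themselves contain `\E`.

    ```r
    x <- c("imageUploaded,people.jpg,more,comma,separated,stuff",
           "imageUploaded", "people.jpg")
    pat <- "(?=.*imageUploaded)(?=.*people\\.jpg)"

    # The match starts at position 1 and has length 0: the lookaheads only
    # assert that each term occurs somewhere ahead, they consume no characters.
    m <- regexpr(pat, x[1], perl = TRUE)
    as.integer(m)            # 1
    attr(m, "match.length")  # 0

    # Hypothetical helper: one lookahead per term; \Q...\E quotes the term
    # literally, so the "." in "people.jpg" needs no manual escaping.
    and_pattern <- function(terms) paste0("(?=.*\\Q", terms, "\\E)", collapse = "")

    grepl(and_pattern(c("imageUploaded", "people.jpg")), x, perl = TRUE)
    # [1]  TRUE FALSE FALSE
    ```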
    