Subsetting in R using OR condition with strings

忘掉有多难 2021-02-04 14:04

I have a data frame with about 40 columns. The second column, data[2], contains the name of the company that the rest of the row describes. However, the names of the compani

2 Answers
  • 2021-02-04 14:11

    First of all (as Jonathan did in his comment), to reference the second column you should use either data[[2]] or data[,2]. But if you are using subset, you can use the column name: subset(data, CompanyName == ...).
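
    To make the difference concrete, here is a small sketch on made-up data. The column names value and CompanyName and the company strings are assumptions for illustration, not taken from the question. data[2] keeps the data.frame wrapper, while data[[2]] and data[,2] return the column itself.

```r
# Hypothetical mock-up data; the real data frame has about 40 columns.
data <- data.frame(
  value       = 1:4,
  CompanyName = c("Company Name 09", "Other Co", "Company Name", "Other Co 09"),
  stringsAsFactors = FALSE
)

col_df  <- data[2]      # still a data.frame with a single column
col_vec <- data[[2]]    # the column as a plain character vector
col_mat <- data[, 2]    # the same vector via matrix-style indexing

is.data.frame(col_df)        # TRUE
is.character(col_vec)        # TRUE
identical(col_vec, col_mat)  # TRUE

# Inside subset() the column can be referred to by its name
hits <- subset(data, CompanyName == "Company Name")
```

    Comparing a one-column data.frame such as data[2] against a string is rarely what you want inside subset, which is why the vector forms are the safer choice.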

    As for your question, I would do one of the following:

    subset(data, data[[2]] %in% c("Company Name 09", "Company Name"), drop = TRUE) 
    subset(data, grepl("^Company Name", data[[2]]), drop = TRUE)
    

    In the second, I use grepl (introduced in R version 2.9), which returns a logical vector with TRUE for each match.
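
    As a rough illustration of how the two conditions differ, the sketch below builds both logical vectors directly. The company names are hypothetical and chosen only to show one case where the results diverge.

```r
# Hypothetical company names, assumed for illustration only.
companies <- c("Company Name 09", "Other Co", "Company Name", "Company Names Ltd")

# %in% tests exact membership in a fixed set of strings
exact <- companies %in% c("Company Name 09", "Company Name")

# grepl() tests a regular expression; "^" anchors the match at the start
prefix <- grepl("^Company Name", companies)

print(exact)   # TRUE FALSE TRUE FALSE
print(prefix)  # TRUE FALSE TRUE TRUE
```

    The prefix pattern also accepts "Company Names Ltd", so the %in% form is the stricter of the two when the set of acceptable names is known in advance.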

  • 2021-02-04 14:16

    A couple of things:

    1) Mock-up data is useful as we don't know exactly what you're faced with. Please supply data if possible. Maybe I misunderstood in what follows?

    2) Don't use [[2]] to index your data.frame; I think [,"colname"] is much clearer.

    3) If the only difference is a trailing ' 09' in the name, then simply regexp that out:

    R> x1 <- c("foo 09", "bar", "bar 09", "foo")
    R> x2 <- gsub(" 09$", "", x1)
    R> x2
    [1] "foo" "bar" "bar" "foo"
    R> 
    

    Now you should be able to do your subset on data that is transformed on the fly:

    R> data <- data.frame(value=1:4, name=x1)
    R> subset(data, gsub(" 09$", "", name)=="foo")
      value   name
    1     1 foo 09
    4     4    foo
    R> 
    

    You could also have replaced the name column with the regexp'ed value.
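
    A minimal sketch of that alternative, reusing the mock-up data from above. stringsAsFactors = FALSE is an addition here so that the column is a character vector regardless of R version.

```r
# Same mock-up data as in the example above
x1 <- c("foo 09", "bar", "bar 09", "foo")
data <- data.frame(value = 1:4, name = x1, stringsAsFactors = FALSE)

# Overwrite the name column with the cleaned-up value
data[, "name"] <- gsub(" 09$", "", data[, "name"])

# A plain equality test is now enough
foo_rows <- subset(data, name == "foo")
print(foo_rows)
```

    Note that this discards the original spelling of each name, so keep a copy of the column if the ' 09' distinction matters later.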
