Using multiple criteria in subset function and logical operators

遥遥无期 2020-11-28 09:56

If I want to select a subset of data in R, I can use the subset function. I wanted to base an analysis on data that matched one of a few criteria, e.g. that a cert…

2 Answers
  • 2020-11-28 10:28

    For your example, I believe the following should work:

    myNewDataFrame <- subset(bigfive, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)
    

    See the examples in ?subset for more. Just to demonstrate, a more complicated logical subset would be:

    data(airquality)
    dat <- subset(airquality, subset = (Temp > 80 & Month > 5) | Ozone < 40)
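    One detail worth knowing when combining criteria: Ozone has missing values in airquality, so the condition above evaluates to NA for some rows. The following is my own illustration, not taken from ?subset. It shows that subset() treats an NA condition as FALSE, while plain [ indexing with the same logical vector returns rows full of NAs, and [ combined with which() reproduces subset():

```r
# Base R only: airquality ships with the datasets package
data(airquality)

# Evaluate the same compound condition used above, outside subset()
cond <- with(airquality, (Temp > 80 & Month > 5) | Ozone < 40)

# Rows where Ozone is NA and the Temp/Month part is FALSE give NA
n_na <- sum(is.na(cond))

# subset() keeps only rows where the condition is TRUE (NA rows are dropped)
dat_subset <- subset(airquality, subset = (Temp > 80 & Month > 5) | Ozone < 40)

# Plain logical indexing turns each NA position into an all-NA row
dat_bracket <- airquality[cond, ]

# which() discards NAs, so this matches subset()
dat_which <- airquality[which(cond), ]

cat("NA conditions:", n_na, "\n")
cat("subset():", nrow(dat_subset), "rows; [cond, ]:", nrow(dat_bracket),
    "rows; [which(cond), ]:", nrow(dat_which), "rows\n")
```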
    

    And as Chase points out, %in% would be more efficient in your example:

    myNewDataFrame <- subset(bigfive, subset = bf11 %in% c(1, 2, 3))
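    To see why %in% can stand in for the chain of == comparisons, here is a small sketch on a made-up vector (x is hypothetical). ?match documents %in% as match(x, table, nomatch = 0) > 0. One practical difference: %in% returns FALSE, never NA, for a missing value, whereas the chained == version returns NA. Inside subset() both end up excluding that row, but the difference matters if you index with [ directly:

```r
# Hypothetical values, including a missing one
x <- c(2, 2, 3, 4, 1, NA)

# A series of single comparisons joined with the vectorised |
chained <- x == 1 | x == 2 | x == 3

# Set membership in one call
via_in <- x %in% c(1, 2, 3)

# The definition of %in% given in ?match
via_match <- match(x, c(1, 2, 3), nomatch = 0L) > 0L

print(chained)  # last element is NA
print(via_in)   # last element is FALSE
```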
    

    As Chase also points out, make sure you understand the difference between | and ||. To see help pages for operators, use ?'||', where the operator is quoted.
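    A minimal illustration of that difference, using two made-up logical vectors: | works element by element and always evaluates both sides, whereas || expects single TRUE/FALSE values and stops as soon as the result is known (short-circuiting).

```r
# Two hypothetical logical vectors of the same length
a <- c(TRUE, FALSE, TRUE)
b <- c(FALSE, FALSE, TRUE)

# Element-wise OR: one result per element, which is what subset() needs
elementwise <- a | b
print(elementwise)  # TRUE FALSE TRUE

# Scalar OR: the right-hand side is never evaluated when the left is TRUE
short_circuit <- TRUE || stop("never evaluated")
print(short_circuit)  # TRUE
```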

  • 2020-11-28 10:48

    The correct operator is %in% here. Here is an example with dummy data:

    set.seed(1)
    dat <- data.frame(bf11 = sample(4, 10, replace = TRUE),
                      foo = runif(10))
    

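    giving (this is the output from R versions before 3.6.0; R 3.6.0 changed the default algorithm used by sample(), so in newer versions call set.seed(1, sample.kind = "Rounding") to reproduce these exact values):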

    > head(dat)
      bf11       foo
    1    2 0.2059746
    2    2 0.1765568
    3    3 0.6870228
    4    4 0.3841037
    5    1 0.7698414
    6    4 0.4976992
    

    The subset of dat where bf11 equals any of the set 1,2,3 is taken as follows using %in%:

    > subset(dat, subset = bf11 %in% c(1,2,3))
       bf11       foo
    1     2 0.2059746
    2     2 0.1765568
    3     3 0.6870228
    5     1 0.7698414
    8     3 0.9919061
    9     3 0.3800352
    10    1 0.7774452
    

    As to why your original didn't work, break it down to see the problem. Look at what 1||2||3 evaluates to:

    > 1 || 2 || 3
    [1] TRUE
    

    and you'd get the same using | instead. As a result, the subset() call would only return rows where bf11 == TRUE; because TRUE is coerced to 1 when compared with a number, that means only the rows where bf11 equals 1.
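    To make this concrete, here is a sketch assuming the original condition had the form bf11 == (1 || 2 || 3). The question text is cut off, so that exact form is a guess. The bf11 vector below is a hypothetical copy of the ten values in the dummy data above. The || chain collapses to a single TRUE, and the comparison then keeps only the positions where bf11 is 1:

```r
# Hypothetical copy of the bf11 column from the dummy data above
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# The chain of || collapses to a single logical value
rhs <- 1 || 2 || 3
print(rhs)  # TRUE

# Comparing numbers with TRUE coerces TRUE to 1
kept <- which(bf11 == rhs)
print(kept)  # 5 10: only the positions where bf11 is 1
```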

    What you could have written would have been something like:

    subset(dat, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)
    

    This gives the same result as my earlier subset() call. The point is that you need a series of single comparisons, not a comparison against a series of options. But as you can see, %in% is far more useful and less verbose in such circumstances. Notice also that I have to use | because I want to compare each element of bf11 against 1, 2, and 3 in turn. Compare (the output below comes from an older version of R; since R 4.3.0, || with an operand longer than one is an error instead of silently using only the first element):

    > with(dat, bf11 == 1 || bf11 == 2)
    [1] TRUE
    > with(dat, bf11 == 1 | bf11 == 2)
     [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
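    If the accepted values are held in a vector, the "series of single comparisons" can also be built programmatically. The sketch below uses hypothetical bf11 and vals vectors (bf11 copies the dummy data above). It folds one == comparison per value together with | using base R's Reduce(), then checks that the result agrees with %in% when there are no missing values:

```r
# Hypothetical inputs: the column to test and the accepted values
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)
vals <- c(1, 2, 3)

# One element-wise comparison per accepted value...
comparisons <- lapply(vals, function(v) bf11 == v)

# ...combined with the vectorised |, i.e. bf11 == 1 | bf11 == 2 | bf11 == 3
any_match <- Reduce(`|`, comparisons)

print(any_match)
print(identical(any_match, bf11 %in% vals))  # TRUE (no NAs here)
```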
    