Finding if a boolean is ever true by groups in R

删除回忆录丶 submitted on 2019-12-02 13:10:11

Question


I want a simple way to create a new variable indicating whether a boolean is ever true within a group in an R data frame. Here is an example: suppose the dataset has two variables, 'a' and 'b' (among others that are not relevant). 'a' identifies a group, and 'b' is a boolean with values TRUE (1) or FALSE (0). I want to create a variable 'c', also a boolean, that is 1 for all entries in groups where 'b' is TRUE at least once, and 0 for all entries in groups where 'b' is never TRUE. From entries like below:

a   b
-----
1   1 
2   0
1   0
1   0
1   1
2   0
2   0
3   0
3   1
3   0

I want to get variable 'c' like below:

a   b   c
-----------
1   1   1 
2   0   0
1   0   1
1   0   1
1   1   1
2   0   0
2   0   0
3   0   1
3   1   1
3   0   1
-----------

I know how to do this in Stata, but I haven't done anything similar in R yet, and it is difficult to find information about it online. In fact, I only need 'c' in order to later remove all observations for which 'c' is 0, so any other suggestions would be fine as well. The application is multinomial logit estimation, where alternatives that are never chosen need to be removed from the dataset before estimation.


Answer 1:


A base R option would be

 df1$c <- with(df1, ave(b, a, FUN=any))

Or

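 library(sqldf)
 # The subquery returns one row per group: a, plus c = 1 if any b in that group is 1.
 # Only a and c are selected there, so the join does not duplicate column b.
 sqldf('select * from df1
       left join (select a,
          sum(b) > 0 as c
          from df1
          group by a)
          using (a)')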
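Since the stated goal is to drop the groups where 'b' is never TRUE, the ave() idea can also be used as a row filter directly, without storing 'c' first. The following is a minimal sketch in base R; the sample data is the table from the question, and the names `keep` and `df1_kept` are assumptions made for this illustration.

```r
# Sample data from the question
df1 <- data.frame(a = c(1, 2, 1, 1, 1, 2, 2, 3, 3, 3),
                  b = c(1, 0, 0, 0, 1, 0, 0, 0, 1, 0))

# For every row, TRUE if any b in that row's group equals 1
keep <- ave(df1$b == 1, df1$a, FUN = any)

# Keep only rows that belong to groups chosen at least once
df1_kept <- df1[keep, ]
```

Comparing with `b == 1` first means any() receives a logical vector, which avoids the coercion warning R can raise when any() is given doubles.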



Answer 2:


A simple data.table approach:

library(data.table)
data <- as.data.table(data)       # convert the existing data frame
data[, c := any(b == 1), by = a]  # TRUE for every row of a group where b is ever 1

Logical and numeric (0/1) columns behave the same way in most arithmetic and subsetting contexts, but if you'd like a numeric result you can wrap the call to any() in as.numeric().
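As a hedged sketch of the numeric variant, the snippet below builds the question's sample data as a data.table, stores 'c' as 0/1, and then drops the never-chosen groups. The object names `dt` and `kept` are assumptions made for this example, not part of the original answer.

```r
library(data.table)

# Sample data from the question (the name `dt` is hypothetical)
dt <- data.table(a = c(1, 2, 1, 1, 1, 2, 2, 3, 3, 3),
                 b = c(1, 0, 0, 0, 1, 0, 0, 0, 1, 0))

# Numeric 0/1 flag, computed once per group and recycled to each row
dt[, c := as.numeric(any(b == 1)), by = a]

# Keep only the groups in which b was 1 at least once
kept <- dt[c == 1]
```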




Answer 3:


If X is your data frame:

library(dplyr)
X <- X %>%
  group_by(a) %>%
  mutate(c = any(b == 1))
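If the only purpose of 'c' is to remove the never-chosen groups, a grouped filter() can probably do that in one step. This is a sketch under the assumption that X holds the sample data from the question; `X_kept` is a name introduced here for illustration.

```r
library(dplyr)

# Sample data from the question
X <- data.frame(a = c(1, 2, 1, 1, 1, 2, 2, 3, 3, 3),
                b = c(1, 0, 0, 0, 1, 0, 0, 0, 1, 0))

# Within each group, any(b == 1) is a single TRUE/FALSE,
# so filter() keeps or drops the whole group at once
X_kept <- X %>%
  group_by(a) %>%
  filter(any(b == 1)) %>%
  ungroup()
```

The final ungroup() is optional; it just stops later operations from being applied per group by accident.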



Answer 4:


A base R answer, assuming a and b are columns of the data frame x.

The value of c is determined entirely by a, so first build a mapping from each group to its c value:

cmap <- ifelse(sapply(split(x, x$a), function(g) sum(g[, "b"])) > 0, 1, 0)

Then look up the mapped value for each row by group name and add it to the data frame:

x$c <- unname(cmap[as.character(x$a)])

Final output

> x
   a b c
1  1 1 1
2  2 0 0
3  1 0 1
4  1 0 1
5  1 1 1
6  2 0 0
7  2 0 0
8  3 0 1
9  3 1 1
10 3 0 1
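One caveat with the mapping approach: indexing the mapping by position, as in `cmap[x$a]`, only lines up when the group ids happen to be 1, 2, 3, and so on. The sketch below uses hypothetical ids 10, 20 and 30 (not from the question) in a made-up data frame `y` to show the difference between positional lookup and lookup by name.

```r
# Hypothetical data with non-consecutive group ids
y <- data.frame(a = c(10, 20, 10, 30, 30),
                b = c(1, 0, 0, 0, 1))

# Named vector with one entry per group; the names are "10", "20", "30"
cmap <- sapply(split(y$b, y$a), function(v) as.integer(any(v == 1)))

# Positional lookup asks for elements 10, 20, 30 of a length-3 vector
by_position <- cmap[y$a]

# Lookup by name matches each row to its own group
y$c <- unname(cmap[as.character(y$a)])
```

Here `by_position` contains only NA values, while `y$c` holds the intended flag for each row.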




Source: https://stackoverflow.com/questions/31576879/finding-if-boolean-is-ever-true-by-groups-in-r
