Conditional count and group by in R

Submitted by 天大地大妈咪最大 on 2019-12-22 18:09:05

Question


I would like to count how many rows there are per type where the condition x == 0 holds, similar to a GROUP BY in SQL.

Here is an example of the data:

  type    x    
search    0 
NULL      0 
public    0
search    1
home      0
home      1
search    0

Answer 1:


I assume you want to count the rows that meet a particular condition, that is, the rows where a variable takes a given value.

If so, "x" is a column in your data that can take several values. To count the rows where x is 0, you could use:

nrow(subset(data, x == "0"))

Here 'data' is the name of your data frame in R.

EDIT:

Now that I have seen your edited data frame, you could use this to solve your problem:

table(data$type, data$x)
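As a minimal sketch of how the cross-tabulation could be narrowed to the x == 0 case only: the data frame below rebuilds the sample from the question under the assumed name 'df', and the column of the table is picked by its label "0" rather than by position.

```r
# Sample data from the question ('df' is an assumed name)
df <- data.frame(
  type = c("search", "NULL", "public", "search", "home", "home", "search"),
  x    = c(0, 0, 0, 1, 0, 1, 0)
)

# Cross-tabulate type (rows) against x (columns)
counts <- table(df$type, df$x)

# Keep only the column labelled "0", i.e. the rows where x == 0
zero_counts <- counts[, "0"]
print(zero_counts)
```

Selecting the column by name keeps the result correct even if x later contains values other than 0 and 1.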



Answer 2:


You could also use the sqldf package:

library(sqldf)
df <- data.frame(type=c('search','NULL','public','search','home','home','search'),x=c(0,0,0,1,0,1,0))
sqldf("SELECT type, COUNT(*) FROM df WHERE x=0 GROUP BY type")

which gives the following result:

    type COUNT(*)
1   NULL        1
2   home        1
3 public        1
4 search        2



Answer 3:


Given the data frame:

df <- data.frame(type=c('search','NULL','public','search','home','home','search'),x=c(0,0,0,1,0,1,0))

If you want to know how many rows of each type have x equal to zero, you can use:

table(df)[,1]

This works as long as x only contains 1's and 0's, since the first column of the table is then the one for x == 0. It gives:

  home   NULL public search 
     1      1      1      2



Answer 4:


You could also do this with the dplyr package:

library(dplyr)

df2 <- df %>% group_by(x,type) %>% tally()

which gives:

  x   type n
1 0   home 1
2 0   NULL 1
3 0 public 1
4 0 search 2
5 1   home 1
6 1 search 1
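The output above counts every combination of x and type. If only the x == 0 rows are wanted, one possible variant (a sketch, assuming the dplyr package is installed and using the sample data from the question under the assumed name 'df') filters first and then counts per type:

```r
library(dplyr)

# Sample data from the question ('df' is an assumed name)
df <- data.frame(
  type = c("search", "NULL", "public", "search", "home", "home", "search"),
  x    = c(0, 0, 0, 1, 0, 1, 0)
)

# Keep rows where x == 0, then count rows per type
zero_by_type <- df %>%
  filter(x == 0) %>%
  count(type)

print(zero_by_type)
```

The result has one row per type and a column n holding the count, which mirrors the SQL query with WHERE x = 0 and GROUP BY type.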



Answer 5:


Given that your data is structured as a data frame, the following code runs faster than the subset() approach given above:

nrow(data[data$x == "0", ])

You can compare the run times using:

ptm <- proc.time()
nrow(subset(data, x == "0"))
proc.time() - ptm

ptm <- proc.time()
nrow(data[data$x == "0", ])
proc.time() - ptm

In my case, the second version was about 15 times faster with 1 million rows.
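To try the comparison without your own data, here is a rough sketch using system.time() on a synthetic data frame. The column values, the seed, and the row count are assumptions chosen for illustration, and the timings will differ between machines and R versions.

```r
# Synthetic data: 1 million rows (values and seed are illustrative assumptions)
set.seed(1)
n <- 1e6
data <- data.frame(
  type = sample(c("search", "home", "public"), n, replace = TRUE),
  x    = sample(0:1, n, replace = TRUE)
)

# Time the subset() approach
t_subset <- system.time(n_subset <- nrow(subset(data, x == 0)))

# Time logical row indexing (note the comma: rows, then all columns)
t_index <- system.time(n_index <- nrow(data[data$x == 0, ]))

# Both approaches should report the same number of rows
print(c(subset = n_subset, index = n_index))
print(rbind(subset = t_subset, index = t_index))
```

Both calls return the total number of rows with x == 0, not a count per type, so this only compares the filtering step.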



Source: https://stackoverflow.com/questions/26042409/conditional-count-and-group-by-in-r
