Conditional count and group by in R

Question

I would like to count how many rows there are per type that meet the condition x == 0, similar to a GROUP BY in SQL.

Here is an example of the data:

  type    x    
search    0 
NULL      0 
public    0
search    1
home      0
home      1
search    0

Answer 1:

I am assuming that you want to count the rows that meet a particular condition (a variable taking some value).

If so, I suppose "x" is a variable stored in a column, and it can take multiple values. To find how many rows in your data have x equal to 0, you could use:

nrow(subset(data, x == "0"))

where 'data' is the name of your dataset object in R.

EDIT:

Now that I have seen your edited data frame, you could use this to solve your problem:

table(data$type, data$x)
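As a minimal sketch of what that cross-tabulation looks like on the question's sample data, and how the x == 0 counts could be pulled out of it, consider the following. The object names counts and zero_counts are illustrative only, and the sample data frame is rebuilt here under the assumption that x is numeric and that "NULL" is a literal string.

```r
# Sample data from the question ("NULL" is the literal string, not R's NULL)
data <- data.frame(
  type = c("search", "NULL", "public", "search", "home", "home", "search"),
  x    = c(0, 0, 0, 1, 0, 1, 0)
)

# Full cross-tabulation: one row per type, one column per value of x
counts <- table(data$type, data$x)

# Keep only the column for x == 0, selected by name rather than by position
zero_counts <- counts[, "0"]
print(zero_counts)

# Alternative: filter the rows first, then tabulate a single column
zero_counts2 <- table(data$type[data$x == 0])
print(zero_counts2)
```

Selecting the column by its name ("0") avoids relying on 0 happening to be the first value of x.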

Answer 2:

You could also use the sqldf package:

library(sqldf)
df <- data.frame(type=c('search','NULL','public','search','home','home','search'),x=c(0,0,0,1,0,1,0))
sqldf("SELECT type, COUNT(*) FROM df WHERE x=0 GROUP BY type")

which gives the following result:

    type COUNT(*)
1   NULL        1
2   home        1
3 public        1
4 search        2

Answer 3:

Given the data frame

df=data.frame(type=c('search','NULL','public','search','home','home','search'),x=c(0,0,0,1,0,1,0))

if you want to know how many of each value in column 1 have a value of zero in column 2, you can use:

table(df)[,1]

This works as long as you are only working with 1's and 0's, since the first column of the table is then the one for x = 0. It gives the answer:

  home   NULL public search 
     1      1      1      2

Answer 4:

You could also do this with the dplyr package:

library(dplyr)

df2 <- df %>% group_by(x,type) %>% tally()

which gives:

  x   type n
1 0   home 1
2 0   NULL 1
3 0 public 1
4 0 search 2
5 1   home 1
6 1 search 1
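The result above contains counts for every value of x. If only the x == 0 rows are wanted, as in the question, one possible variation is to filter before counting. This is a sketch rather than part of the original answer; zero_counts is an illustrative name, and the data frame is rebuilt from the question's sample.

```r
library(dplyr)

# Sample data from the question ("NULL" is the literal string, not R's NULL)
df <- data.frame(
  type = c("search", "NULL", "public", "search", "home", "home", "search"),
  x    = c(0, 0, 0, 1, 0, 1, 0)
)

# Keep only the rows where x is 0, then count rows per type.
# count(type) is shorthand for group_by(type) followed by tally().
zero_counts <- df %>%
  filter(x == 0) %>%
  count(type)

print(zero_counts)
```

The result is a data frame with one row per type and the count in a column named n.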

Answer 5:

Given that your data is structured as a data frame, the following code has a better running time than the answers given above (note the comma, which selects rows rather than columns):

nrow(data[data$x=="0", ])

You can test your run time using:

ptm <- proc.time()
nrow(subset(data, x == "0"))
proc.time() - ptm

ptm <- proc.time()
nrow(data[data$x=="0", ])
proc.time() - ptm

In my case, with 1 million rows, the second version ran about 15 times faster.
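For anyone who wants to repeat this comparison, here is a rough sketch using system.time() on synthetic data. The data frame, its size, and the variable names are assumptions made for illustration, not the original poster's dataset, and timings will vary by machine and R version. It also times sum(data$x == 0), which counts matching rows without building a subset at all.

```r
# Synthetic data (an assumption for illustration, not the poster's dataset)
set.seed(1)
n <- 1e6
data <- data.frame(
  type = sample(c("search", "home", "public"), n, replace = TRUE),
  x    = sample(c(0, 1), n, replace = TRUE)
)

# Time each approach; the assignment inside system.time() keeps the result
t_subset <- system.time(n1 <- nrow(subset(data, x == 0)))
t_index  <- system.time(n2 <- nrow(data[data$x == 0, ]))
t_sum    <- system.time(n3 <- sum(data$x == 0))

# All three approaches must agree on the count
stopifnot(n1 == n2, n2 == n3)

# Elapsed wall-clock seconds for each approach
elapsed <- c(
  subset = t_subset[["elapsed"]],
  index  = t_index[["elapsed"]],
  sum    = t_sum[["elapsed"]]
)
print(elapsed)
```

Note that all three expressions count every row with x == 0; they do not break the count down by type.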

Source: https://stackoverflow.com/questions/26042409/conditional-count-and-group-by-in-r

Tags

count

conditional

aggregation