Let's say I have the following data frame:
> myvec
  name order_no
1  Amy       12
2 Jack       14
3 Jack       16
4 Dave       11
5  Amy       12
This should do the trick:
library(plyr)
ddply(myvec, ~name, summarise, number_of_distinct_orders = length(unique(order_no)))
This requires the plyr package.
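With the five-row example data above, the result should look something like this:
#   name number_of_distinct_orders
# 1  Amy                          1
# 2 Dave                          1
# 3 Jack                          2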
This would also work, but is less elegant than the plyr solution:
x <- sapply(split(myvec, myvec$name), function(d) length(unique(d$order_no)))
data.frame(name = names(x), number_of_distinct_orders = x, row.names = NULL)
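To see how this works: split() breaks myvec into a named list with one data frame per name, and sapply() then counts the distinct order numbers in each piece. A quick illustration, assuming the same myvec as above:
groups <- split(myvec, myvec$name)
names(groups)
# [1] "Amy"  "Dave" "Jack"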
This is a simple solution with the function aggregate:
aggregate(order_no ~ name, myvec, function(x) length(unique(x)))
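Note that the counted column in aggregate's result keeps the name order_no. If you want a more descriptive column name, as in the other answers (a small variation, not part of the original answer), you can rename it afterwards:
res <- aggregate(order_no ~ name, myvec, function(x) length(unique(x)))
names(res)[2] <- "number_of_distinct_orders"
res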
In dplyr you may use n_distinct to "count the number of unique values":
library(dplyr)
myvec %>%
  group_by(name) %>%
  summarise(n_distinct(order_no))
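By default the summary column is named n_distinct(order_no); if you prefer an explicit column name (a minor variation on the code above), pass a name to summarise():
myvec %>%
  group_by(name) %>%
  summarise(number_of_distinct_orders = n_distinct(order_no))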
Here is a solution with sqldf:
library("sqldf")
myvec <- read.table(header=TRUE, text=
" name order_no
1 Amy 12
2 Jack 14
3 Jack 16
4 Dave 11
5 Amy 12
6 Jack 16
7 Tom 19
8 Larry 22
9 Tom 19
10 Dave 11
11 Jack 17
12 Tom 20
13 Amy 23
14 Jack 16")
sqldf("SELECT name,COUNT(distinct(order_no)) as number_of_distinct_orders FROM myvec GROUP BY name")
# > sqldf("SELECT name,COUNT(distinct(order_no)) as number_of_distinct_orders FROM myvec GROUP BY name")
# name number_of_distinct_orders
# 1 Amy 2
# 2 Dave 1
# 3 Jack 3
# 4 Larry 1
# 5 Tom 2
You can just use the built-in R functions tapply and length:
tapply(myvec$order_no, myvec$name, FUN = function(x) length(unique(x)))
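tapply() returns a named array rather than a data frame; if you would rather have a data frame like the other answers produce (a minimal sketch, assuming the same myvec as above), you can convert the result:
res <- tapply(myvec$order_no, myvec$name, FUN = function(x) length(unique(x)))
data.frame(name = names(res), number_of_distinct_orders = as.vector(res))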