I need to calculate the majority vote for an item in R and I don't have a clue how to approach this.
I have a data frame with items and assigned categories. What I need is the category assigned most often to each item.
tdat <- tapply(dat$category, dat$item,
               function(vec) names(sort(table(vec), decreasing=TRUE))[1])
data.frame(item=rownames(tdat), plurality_vote=tdat)
  item plurality_vote
1    1              2
2    2              1
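Note that this returns the plurality winner (the most frequent category) even when it has no more than half of the votes. A more complex function would be needed to distinguish a plurality (possibly with ties) from a true majority.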
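One possible version of such a function is sketched below. `vote_result` is a hypothetical helper, not part of any package, and it assumes two rules you may want to change: a tie for first place counts as "no winner", and a majority means strictly more than half of the votes.

```r
# Hypothetical helper (not from any package): decide the vote for one item.
# Returns the winning category as a string, or NA when there is a tie for
# first place or (with require_majority = TRUE) no category has > 50% of votes.
vote_result <- function(categories, require_majority = TRUE) {
  counts <- table(categories)
  top <- counts[counts == max(counts)]         # every category tied for first
  if (length(top) > 1) return(NA_character_)   # tie: no single winner
  if (require_majority && top[[1]] <= length(categories) / 2) {
    return(NA_character_)                      # plurality, but not a majority
  }
  names(top)
}

dat <- data.frame(item     = c(1, 1, 1, 1, 2, 2, 2, 2),
                  category = c(2, 3, 2, 2, 2, 3, 1, 1))

# Strict majority: item 1 -> "2" (3 of 4 votes), item 2 -> NA (only 2 of 4 for "1")
tapply(dat$category, dat$item, vote_result)
# Plurality only: item 2 now returns "1"
tapply(dat$category, dat$item, vote_result, require_majority = FALSE)
```

Extra arguments such as `require_majority` are passed through `tapply()`'s `...` to the function.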
If you have a function to calculate the mode, as in package prettyR, you can use aggregate:
library(prettyR)
aggregate(d$category, by=list(item=d$item), FUN=Mode)
#   item x
# 1    1 2
# 2    2 1
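If prettyR isn't available, a small base-R helper can play the role of `Mode`. In this sketch `most_frequent` is a hypothetical name, not a package function, and `d` is rebuilt from the example data in this thread; on ties it keeps the first maximum in `table()` order rather than reporting several modes.

```r
# Hypothetical base-R stand-in for prettyR::Mode: most frequent value as a string.
# which.max() returns the first maximum, so ties go to the first table() name.
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

d <- data.frame(item     = c(1, 1, 1, 1, 2, 2, 2, 2),
                category = c(2, 3, 2, 2, 2, 3, 1, 1))

# Formula interface of aggregate(): one row per item,
# category "2" for item 1 and "1" for item 2.
aggregate(category ~ item, data = d, FUN = most_frequent)
```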
One-liner (using plyr):
library(plyr)
ddply(dt, .(item), function(x) which.max(tabulate(x$category)))
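The `tabulate()` trick relies on the categories being positive integers: `tabulate()` counts the values 1, 2, 3 and so on, so the position of the maximum is the category itself. Zero and negative values are ignored, and character categories are rejected with an error. The sketch below, using a hypothetical helper `mode_any`, first maps each value to its position in `unique(x)` with `match()`, so it works for any atomic vector; on ties it returns the value that appears first in the data.

```r
# Most frequent value of x for any atomic type (numbers, strings, ...).
# match() turns values into positive integer codes, which tabulate() can count.
mode_any <- function(x) {
  ux <- unique(x)                      # candidate values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]
}

mode_any(c(2, 3, 2, 2))            # 2
mode_any(c("b", "a", "a", "c"))    # "a"
mode_any(c(0, 0, -1))              # 0 (tabulate() alone would drop these values)
```

Inside the plyr call it could be used as `function(x) mode_any(x$category)`.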
You can combine two things here. First, this is how you get the most frequent value in a vector:
> v = c(1,1,1,2,2)
> names(which.max(table(v)))
[1] "1"
This is a character value, but we can easily apply as.numeric to it if necessary.
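Both points can be seen in a short sketch, assuming numeric categories: the conversion back to a number, and what happens on a tie. `tied` is a made-up vector used only for illustration.

```r
v <- c(1, 1, 1, 2, 2)
as.numeric(names(which.max(table(v))))      # 1, now numeric rather than "1"

# On a tie, which.max() returns the first maximum, and table() sorts its
# categories, so the tie goes to the smallest category (here 1, not 3).
tied <- c(3, 3, 1, 1)
as.numeric(names(which.max(table(tied))))   # 1
```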
Once we know how to do that, we can use the grouping functionality of the data.table package to find the most frequent category for each item. Here is the code for your example above:
> library(data.table)
> dt = data.table(item=c(1,1,1,1,2,2,2,2), category=c(2,3,2,2,2,3,1,1))
> dt
   item category
1:    1        2
2:    1        3
3:    1        2
4:    1        2
5:    2        2
6:    2        3
7:    2        1
8:    2        1
> dt[,as.numeric(names(which.max(table(category)))),by=item]
   item V1
1:    1  2
2:    2  1
The new V1 column contains the numeric version of the most frequent category for each item. If you want to give it a proper name, the syntax is a little uglier:
> dt[,list(mostFreqCat=as.numeric(names(which.max(table(category))))),by=item]
   item mostFreqCat
1:    1           2
2:    2           1