The Wilson confidence interval (WCI) takes binary votes as input: each vote is either TRUE or FALSE, i.e. an "upvote" or a "downvote". From these votes it generates a rating.
For the purpose of my project, I think WCI is perfect. However, a binary upvote/downvote is not enough to describe the thing I am rating.
That's where a 5-star rating comes in, and this is where I need someone to disprove my logic. I'm thinking that if I were to implement a 5-star rating with WCI, the following should work without hacking the internals of the confidence interval.
For each star in the rating widget we assign a unique integer value. Each value counts either as positive (upvotes) or negative (downvotes). The values would be:
1/5 stars: -2
2/5 stars: -1
3/5 stars: 1
4/5 stars: 2
5/5 stars: 3
To summarise: the minimum vote of 1 star counts as 2 downvotes. A vote of 2 stars counts as 1 downvote. The middle vote of 3 stars gives 1 upvote, 4 stars give 2 upvotes, and the maximum of 5 stars gives 3 upvotes.
Please disprove this logic: why won't this work? Maybe it goes against the "average person's understanding" of a star rating system?
It's easy to think of the following 'workaround' which converts a multi-ranking system to the binary 'upvote/downvote'-style ranking (that can then be scored using the lower bound of Wilson score confidence interval):
Let's say you have the popular 5 star rating system. So we have a number of votes, each having a value of: 1, 2, 3, 4 or 5.
To 'convert' these ratings to up/down votes, use the following rule:
For star rating -- Add
* - 0.00 to up votes and 1.00 to down votes (i.e. a full down vote)
** - 0.25 to up votes and 0.75 to down votes
*** - 0.50 to up votes and 0.50 to down votes
**** - 0.75 to up votes and 0.25 to down votes
***** - 1.00 to up votes and 0.00 to down votes (i.e. a full up vote)
After we reduce the 5 star ratings to up/down ratings, we can proceed with the usual score calculations described in Evan Miller's article.
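The conversion above can be sketched in a few lines of Python. This is only a minimal illustration, not a vetted method: the names `UP_FRACTION`, `stars_to_votes`, `wilson_lower_bound` and `star_score` are made up for this example, z = 1.96 (roughly 95% confidence) is an assumed default, and the formula is the standard lower bound of the Wilson score interval.

```python
import math

# Fraction of an upvote contributed by each star value (taken from the table above).
UP_FRACTION = {1: 0.00, 2: 0.25, 3: 0.50, 4: 0.75, 5: 1.00}


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for up/(up+down)."""
    n = up + down
    if n == 0:
        return 0.0  # assumption: unrated items score 0
    phat = up / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


def stars_to_votes(ratings):
    """Convert a list of 1..5 star ratings to (fractional) up and down votes."""
    up = sum(UP_FRACTION[r] for r in ratings)
    down = len(ratings) - up  # each rating contributes exactly one vote in total
    return up, down


def star_score(ratings, z=1.96):
    up, down = stars_to_votes(ratings)
    return wilson_lower_bound(up, down, z)


if __name__ == "__main__":
    print(stars_to_votes([5, 5, 1]))   # (2.0, 1.0)
    print(star_score([5] * 10))        # few perfect ratings
    print(star_score([5] * 100))       # many perfect ratings score higher
```

Note that every rating still adds exactly one vote to the total, so the sample size n stays equal to the number of people who voted.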
I am not a statistician or mathematician, so I would love to hear from other people whether this makes sense and what the issues with this approach might be.
First, try to understand the intuition behind WCI. Or, even simpler, the Normal approximation interval ( http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval ).
The intuition behind all this interval calculation is simple. You calculate the sample mean and the standard deviation of that mean (the standard error). The interval is mean ± z*std.
In your case calculating the mean is simple: it is the mean of the ratings themselves. Let p1 be the fraction of 1-star ratings, p2 the fraction of 2-star ratings, and so on up to p5, so that p1+p2+...+p5 = 1. Assume you are calculating these stats from n samples. The mean of your data is 1*p1+2*p2+...+5*p5.
The variance of that sample mean is ( E(x^2)-(E(x))^2 )/n = ( (p1*1^2 + p2*2^2 + ... + p5*5^2) - (1*p1 + 2*p2 + ... + 5*p5)^2 )/n
Since std = sqrt(var), it is pretty straightforward to calculate Normal approximation interval. I will let you work on extending this to WCI.
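As a rough sketch of the Normal approximation step only (not the WCI extension), the calculation could look like this in Python. The function name `mean_rating_interval` and the default z = 1.96 are assumptions made for this example.

```python
import math
from collections import Counter


def mean_rating_interval(ratings, z=1.96):
    """Normal approximation interval for the mean of 1..5 star ratings."""
    n = len(ratings)
    if n == 0:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    # p[s] is the fraction of ratings with s stars (p1..p5 in the text).
    p = {s: counts.get(s, 0) / n for s in range(1, 6)}
    mean = sum(s * p[s] for s in p)             # E(x)
    second_moment = sum(s * s * p[s] for s in p)  # E(x^2)
    # Variance of the sample mean; clamp tiny negative rounding errors to 0.
    var_of_mean = max(second_moment - mean ** 2, 0.0) / n
    half_width = z * math.sqrt(var_of_mean)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # 50 one-star and 50 five-star ratings: mean 3, wide interval.
    print(mean_rating_interval([1, 5] * 50))
    # 100 three-star ratings: mean 3, zero-width interval.
    print(mean_rating_interval([3] * 100))
```

The second call shows a known weakness of the Normal approximation: when all ratings are identical the interval collapses to a single point no matter how few ratings there are, which is one reason to prefer a Wilson-style interval.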
The biggest problem with the integer scheme from the question is that a single 5-star rating (3 upvotes) will weigh as much as three 2-star ratings (1 downvote each). Also, an item with 300 3-star ratings (which should be a mediocre score) will have the same score as an item with 100 5-star ratings (which should be a perfect score), because both reduce to 300 upvotes and 0 downvotes.
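A quick Python check makes the collision concrete. This is a sketch under the question's mapping (1 star = 2 downvotes, ..., 5 stars = 3 upvotes); the names `VOTE_VALUE`, `wilson_lower_bound` and `integer_mapped_score` are made up for this example, and z = 1.96 is an assumed default.

```python
import math

# Integer vote value per star rating, as proposed in the question.
VOTE_VALUE = {1: -2, 2: -1, 3: 1, 4: 2, 5: 3}


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for up/(up+down)."""
    n = up + down
    if n == 0:
        return 0.0  # assumption: unrated items score 0
    phat = up / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


def integer_mapped_score(ratings, z=1.96):
    """Score star ratings by expanding each into several up or down votes."""
    up = sum(VOTE_VALUE[r] for r in ratings if VOTE_VALUE[r] > 0)
    down = sum(-VOTE_VALUE[r] for r in ratings if VOTE_VALUE[r] < 0)
    return wilson_lower_bound(up, down, z)


mediocre = integer_mapped_score([3] * 300)  # 300 upvotes, 0 downvotes
perfect = integer_mapped_score([5] * 100)   # 300 upvotes, 0 downvotes
print(mediocre == perfect)  # True: both reduce to the same vote counts
```

The scheme also inflates the sample size: 100 voters giving 5 stars are treated as 300 independent votes, so the interval looks narrower than the data justifies.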
What you could do is calculate a Wilson confidence interval for each possible score. The lower bound of each interval is then the weight of that score towards the (weighted) average.
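One possible reading of this suggestion, sketched in Python: for each star value, treat "voted this value" as a success and every other vote as a failure, take the Wilson lower bound of that proportion, and use it as the weight of that star value in a weighted average. This interpretation, the names `wilson_lower_bound` and `weighted_rating`, and the default z = 1.96 are all assumptions made for this example.

```python
import math
from collections import Counter


def wilson_lower_bound(successes, n, z=1.96):
    """Lower bound of the Wilson score interval for successes/n."""
    if n == 0:
        return 0.0
    phat = successes / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    # Clamp tiny negative rounding errors that can appear when phat is 0.
    return max((centre - margin) / (1 + z * z / n), 0.0)


def weighted_rating(ratings, z=1.96):
    """Weighted average of star values, weighted by per-value Wilson lower bounds."""
    n = len(ratings)
    if n == 0:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    weights = {s: wilson_lower_bound(counts.get(s, 0), n, z) for s in range(1, 6)}
    total = sum(weights.values())  # positive whenever at least one rating exists
    return sum(s * w for s, w in weights.items()) / total


if __name__ == "__main__":
    print(weighted_rating([5] * 100))
    print(weighted_rating([1] * 50 + [5] * 50))
    print(weighted_rating([4] * 80 + [1] * 2))
```

Because the result is normalised by the sum of the weights, it stays on the 1 to 5 scale, but star values with only a handful of votes are discounted relative to a plain average.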