I've always been curious as to how these systems work. For example, how do Netflix or Amazon determine what recommendations to make based on past purchases and/or ratings?
At its most basic, most recommendation systems work by saying one of two things.
User-based recommendations:
If User A likes Items 1,2,3,4, and 5,
And User B likes Items 1,2,3, and 4
Then User B is quite likely to also like Item 5
Item-based recommendations:
If Users who purchase item 1 are also disproportionately likely to purchase item 2
And User A purchased item 1
Then User A will probably be interested in item 2
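To make the item-based rule a bit more concrete, here is a minimal sketch in Python. It treats "disproportionately likely" as lift: how much more often buyers of item 1 also buy item 2, compared with all users. The purchase data and the names item_lift and recommend are made up for illustration; this is one plausible reading of the rule, not how any particular site does it.

```python
from collections import Counter
from itertools import combinations

# Made-up purchase histories (an assumption for illustration): user -> item ids.
purchases = {
    "A": {1},
    "B": {1, 2},
    "C": {1, 2, 3},
    "D": {2, 3},
    "E": {3},
    "F": {1, 2},
}


def item_lift(purchases):
    """Return {(i, j): lift}; lift > 1 means buyers of i buy j more often than average."""
    n_users = len(purchases)
    item_counts = Counter()
    pair_counts = Counter()
    for items in purchases.values():
        item_counts.update(items)
        for i, j in combinations(sorted(items), 2):
            pair_counts[(i, j)] += 1
            pair_counts[(j, i)] += 1
    lift = {}
    for (i, j), both in pair_counts.items():
        p_j_given_i = both / item_counts[i]  # share of i's buyers who also bought j
        p_j = item_counts[j] / n_users       # share of all users who bought j
        lift[(i, j)] = p_j_given_i / p_j
    return lift


def recommend(user, purchases, min_lift=1.0):
    """Suggest items the user does not own, ranked by lift from items they do own."""
    owned = purchases[user]
    scores = {}
    for (i, j), value in item_lift(purchases).items():
        if i in owned and j not in owned and value > min_lift:
            scores[j] = max(scores.get(j, 0.0), value)
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("A", purchases))
```

In this toy data, 3 of the 4 buyers of item 1 also bought item 2, against 4 of 6 users overall, so item 2 is suggested to User A while item 3 is not.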
And here's a brain dump of algorithms you ought to know:
- Set similarity (Jaccard index & Tanimoto coefficient)
- n-Dimensional Euclidean distance
- k-means algorithm
- Support Vector Machines
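As a rough illustration of the first two entries on that list, here is a small Python sketch of the Jaccard index (for sets, the Tanimoto coefficient gives the same value) and n-dimensional Euclidean distance, applied to the User A / User B example from above. The likes data, User C, and the helper recommend_from_nearest are assumptions made up for the example.

```python
import math


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def euclidean(p, q):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Hypothetical data mirroring the User A / User B example; User C is made up.
likes = {
    "A": {1, 2, 3, 4, 5},
    "B": {1, 2, 3, 4},
    "C": {6, 7},
}


def recommend_from_nearest(user, likes):
    """Find the most similar other user and return what they like that `user` lacks."""
    others = [u for u in likes if u != user]
    nearest = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return sorted(likes[nearest] - likes[user])


print(jaccard(likes["A"], likes["B"]))
print(recommend_from_nearest("B", likes))
```

Users A and B share 4 of 5 distinct items, so A is B's nearest neighbour and Item 5 comes back as the suggestion. Euclidean distance plays the same role when users are described by numeric ratings rather than sets.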
There are two main types of recommender systems, which work differently:
1. Content-based. These systems make recommendations based on characteristic information. This is information about the items (keywords, categories, etc.) and users (preferences, profiles, etc.).
2. Collaborative filtering. These systems are based on user-item interactions. This is information such as ratings, number of purchases, likes, etc.
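A content-based system can be sketched in a few lines of Python. This example assumes each item is described by a handful of keywords, builds a user profile from the keywords of items the user liked, and ranks the remaining items by cosine similarity to that profile. The catalogue, the keywords, and the function names are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical catalogue: title -> keywords (assumed data, not from a real system).
items = {
    "Star Wars": ["sci-fi", "space", "adventure"],
    "Star Trek": ["sci-fi", "space", "exploration"],
    "Pride and Prejudice": ["romance", "period", "drama"],
}


def cosine(a, b):
    """Cosine similarity between two keyword-count mappings."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def content_based(liked, items):
    """Rank items the user has not liked yet by similarity to their keyword profile."""
    profile = Counter()
    for title in liked:
        profile.update(items[title])
    candidates = [t for t in items if t not in liked]
    return sorted(
        candidates,
        key=lambda t: cosine(profile, Counter(items[t])),
        reverse=True,
    )


print(content_based(["Star Wars"], items))
```

Note that no other users appear anywhere in this sketch. That is the practical difference from collaborative filtering, which needs interaction data from many users but no description of the items.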
This article (published by the company I work at) gives an overview of the two systems, walks through some practical examples, and suggests when it makes sense to implement each.
The O'Reilly book "Programming Collective Intelligence" has a nice chapter showing how it works. Very readable.
The code examples are all written in Python, but that's not a big problem.
This is such a commercially important application that Netflix introduced a $1 million prize for improving their recommendations by 10%.
After a couple of years people are getting close (I think they're up around 9% now), but it's hard for many, many reasons. Probably the biggest initial improvement in the Netflix Prize came from a matrix factorization technique called singular value decomposition.
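For a feel of what singular value decomposition does with ratings, here is a small sketch using NumPy's np.linalg.svd. It assumes a tiny, fully filled ratings matrix that I made up. Real rating data is mostly missing entries, which plain SVD does not handle, so treat this as an illustration of the idea rather than what the Prize entrants actually ran.

```python
import numpy as np

# Made-up ratings (assumption): rows are users, columns are movies, 1-5 scale.
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])


def low_rank_approximation(matrix, k):
    """Keep only the k largest singular values and rebuild the matrix from them."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


# Two latent "taste" factors are enough to capture most of this toy matrix.
approx = low_rank_approximation(ratings, k=2)
print(np.round(approx, 1))
```

The rebuilt matrix is a smoothed version of the original: each user and each movie is summarized by k numbers, and a predicted rating is the weighted product of those summaries.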
I highly recommend you read "If You Liked This, You're Sure to Love That" for an in-depth discussion of the Netflix Prize in particular and recommendation systems in general.
Basically, though, the principle behind Amazon and the rest is the same: they look for patterns. If someone bought the Star Wars Trilogy, well, there's a better than even chance they like Buffy the Vampire Slayer more than the average customer does (purely made-up example).
GroupLens Research at the University of Minnesota studies recommender systems and generously shares their research and datasets.
Their research expands a bit each year and now considers specifics like online communities, social collaborative filtering, and the UI challenges in presenting complex data.
The Netflix recommendation algorithm is actually the subject of an ongoing competition, in which programmers compete to improve the accuracy of the system.
But in the most basic terms, a recommendation system would examine the choices of users who closely match another user's demographic/interest information.
So if you are a white male, 25 years old, from New York City, the recommendation system might try to show you products purchased by other white males in the northeast United States in the age range of 21-30.
Edit: It should also be noted that the more information you have about your users, the more closely you can refine your algorithms to match what other people are doing to what may interest the user in question.
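The demographic matching described in this answer could look roughly like the Python sketch below. It assumes each user record carries an age, a region, and a set of purchases, and it suggests the items most often bought by other users in the same region and age range. The records, field names, and the function demographic_recommendations are all hypothetical.

```python
from collections import Counter

# Made-up user records (assumption for illustration only).
users = [
    {"name": "u1", "age": 25, "region": "northeast", "bought": {"headphones", "sneakers"}},
    {"name": "u2", "age": 28, "region": "northeast", "bought": {"headphones", "backpack"}},
    {"name": "u3", "age": 23, "region": "northeast", "bought": {"backpack"}},
    {"name": "u4", "age": 52, "region": "south", "bought": {"gardening gloves"}},
]


def demographic_recommendations(target, users, age_range=(21, 30)):
    """Rank items bought by users in the target's region and the given age range."""
    low, high = age_range
    counts = Counter()
    for other in users:
        if other["name"] == target["name"]:
            continue
        if other["region"] == target["region"] and low <= other["age"] <= high:
            # Only count items the target does not already own.
            counts.update(other["bought"] - target["bought"])
    return [item for item, _ in counts.most_common()]


print(demographic_recommendations(users[0], users))
```

Adding more fields to the match (interests, past ratings, and so on) narrows the comparison group, which is the refinement the edit above is pointing at.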