What's the difference between item-based and content-based collaborative filtering?

北海茫月 2021-01-29 18:52

I am puzzled about what item-based recommendation is, as described in the book "Mahout in Action". The book gives this algorithm:

for every item i tha         


        
2 Answers
  • 2021-01-29 19:37

    Item-Based Collaborative Filtering

    The original item-based recommendation relies entirely on user-item ratings (e.g., a user rated a movie 3 stars, or a user "likes" a video). When you compute the similarity between items, you are not supposed to know anything other than the users' rating history. So the similarity between items is computed from the ratings, not from the metadata of the item content.

    Let me give you an example. Suppose you only have access to rating data like the following:

    user 1 likes: movie, cooking
    user 2 likes: movie, biking, hiking
    user 3 likes: biking, cooking
    user 4 likes: hiking
    

    Suppose now you want to make recommendations for user 4.

    First, create an inverted index from each item to the users who like it:

    movie:     user 1, user 2
    cooking:   user 1, user 3
    biking:    user 2, user 3
    hiking:    user 2, user 4
    

    Since this is a binary rating (like or not), we can use a similarity measure like Jaccard Similarity to compute item similarity.

                                     |user1|
    similarity(movie, cooking) = --------------- = 1/3
                                   |user1,2,3|
    

    In the numerator, user 1 is the only user that movie and cooking have in common. In the denominator, the union of movie and cooking contains 3 distinct users (users 1, 2, and 3). Here |.| denotes the size of a set. So the similarity between movie and cooking is 1/3 in our case. You do the same thing for all possible item pairs (i, j).
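
    The two steps above (inverted index, then Jaccard similarity for every item pair) can be sketched in Python as follows. This is only an illustration on the toy data from this answer; the names `likes`, `build_inverted_index`, `jaccard`, and `similarity` are made up for the example and do not come from Mahout.

```python
from itertools import combinations

# Toy data copied from the example above: user -> set of liked items.
likes = {
    "user 1": {"movie", "cooking"},
    "user 2": {"movie", "biking", "hiking"},
    "user 3": {"biking", "cooking"},
    "user 4": {"hiking"},
}


def build_inverted_index(likes):
    """Map each item to the set of users who like it (hypothetical helper)."""
    index = {}
    for user, items in likes.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index


def jaccard(a, b):
    """Return |a & b| / |a | b|; treated as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


index = build_inverted_index(likes)

# Similarity for every unordered item pair, keyed by a frozenset of the pair.
similarity = {
    frozenset((i, j)): jaccard(index[i], index[j])
    for i, j in combinations(sorted(index), 2)
}

print(similarity[frozenset(("movie", "cooking"))])
```

    Keying the result by `frozenset` is just one way to express that similarity(i, j) and similarity(j, i) are the same value.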

    Once the similarities for all pairs are computed, suppose you need to make a recommendation for user 4.

    • Look at similarity(hiking, x), where x is any other item you have, and recommend the items with the highest scores.

    If you need to make a recommendation for user 3, who likes more than one item, you can aggregate the similarity scores over each item in that user's list. For example,

    score(movie)  = similarity(biking, movie) + similarity(cooking, movie)
    score(hiking) = similarity(biking, hiking) + similarity(cooking, hiking)
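
    As a rough sketch of this aggregation, the Python below sums Jaccard similarities over the items a user already likes and scores every item the user has not liked yet. The function names (`item_users`, `jaccard`, `score_candidates`) are hypothetical and chosen for this example only; a real recommender would usually normalize or weight these sums.

```python
# Toy data copied from the example above: user -> set of liked items.
likes = {
    "user 1": {"movie", "cooking"},
    "user 2": {"movie", "biking", "hiking"},
    "user 3": {"biking", "cooking"},
    "user 4": {"hiking"},
}


def item_users(likes):
    """Inverted index: item -> set of users who like it (hypothetical helper)."""
    index = {}
    for user, items in likes.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index


def jaccard(a, b):
    """Return |a & b| / |a | b|; treated as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_candidates(user, likes):
    """Score each item the user has not liked by summing its similarity
    to every item the user already likes."""
    index = item_users(likes)
    liked = likes[user]
    return {
        candidate: sum(jaccard(index[candidate], index[item]) for item in liked)
        for candidate in index
        if candidate not in liked
    }


scores = score_candidates("user 3", likes)
best = max(scores, key=scores.get)
print(scores, best)
```

    On this toy data, user 3 gets score(movie) = 2/3 and score(hiking) = 1/3, so movie ranks first. For user 4, movie and biking tie at 1/3, so some tie-breaking rule would be needed.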
    

    Content-Based Recommendation

    The point of content-based recommendation is that we have to know the content of both the user and the item. Usually you construct a user profile and an item profile in a shared attribute space. For example, you can represent a movie by the movie stars in it and its genres (using a binary encoding, for example). You can build the user profile the same way, based on which movie stars and genres the user likes. The similarity between a user and an item can then be computed with, e.g., cosine similarity.

    Here is a concrete example:

    Suppose this is our user profile (using binary encoding, where 0 means "does not like" and 1 means "likes"), which contains each user's preferences for 5 movie stars and 5 movie genres:

             Movie stars 0 - 4    Movie Genres
    user 1:    0 0 0 1 1          1 1 1 0 0
    user 2:    1 1 0 0 0          0 0 0 1 1
    user 3:    0 0 0 1 1          1 1 1 1 0
    

    Suppose this is our movie profile, encoded over the same 10 attributes:

             Movie stars 0 - 4    Movie Genres
    movie1:    0 0 0 0 1          1 1 0 0 0
    movie2:    1 1 1 0 0          0 0 1 0 1
    movie3:    0 0 1 0 1          1 0 1 0 1
    

    To calculate how well a movie matches a user, we use cosine similarity:

                                     dot-product(user1, movie1)
    similarity(user 1, movie1) = --------------------------------- 
                                       ||user1|| x ||movie1||
    
                                  0x0+0x0+0x0+1x0+1x1+1x1+1x1+1x0+0x0+0x0
                               = -----------------------------------------
                                             sqrt(5) x sqrt(3)
    
                               = 3 / (sqrt(5) x sqrt(3)) = 0.77460
    

    Similarly:

    similarity(user 2, movie2) = 3 / (sqrt(4) x sqrt(5)) = 0.67082 
    similarity(user 3, movie3) = 3 / (sqrt(6) x sqrt(5)) = 0.54772
    

    If you want to give one recommendation to user i, compute similarity(i, j) for every movie j and pick the movie with the highest value.
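
    A minimal Python sketch of this content-based procedure, using the profile vectors from the tables above, might look like the following. The names `users`, `movies`, `cosine`, and `recommend` are assumptions made for this example, and the code uses plain lists rather than any particular library.

```python
import math

# Profiles copied from the tables above: 5 movie-star flags, then 5 genre flags.
users = {
    "user 1": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    "user 2": [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    "user 3": [0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
}
movies = {
    "movie1": [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "movie2": [1, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    "movie3": [0, 0, 1, 0, 1, 1, 0, 1, 0, 1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(user):
    """Return the movie whose profile is most similar to the user's profile."""
    return max(movies, key=lambda m: cosine(users[user], movies[m]))


for user in users:
    print(user, recommend(user))
```

    Note that the three similarities worked out above each pair a user with one particular movie. Comparing every user against every movie, as `recommend` does, gives movie1 for user 1, movie2 for user 2, and movie1 (not movie3) for user 3.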

    Hope this helps.

  • 2021-01-29 19:38

    "Item-based" really means "item-similarity-based". You can put whatever similarity metric you like in here. Yes, if it's based on content, like a cosine similarity over term vectors, you could also call this "content-based".
