Generating test set for recommendation engine

我与影子孤独终老i submitted on 2019-12-10 18:26:39

Question


I am working on a recommendation engine based on implicit feedback. I have been following this post: http://insightdatascience.com/blog/explicit_matrix_factorization.html#movielens

It uses ALS (alternating least squares) to compute the user and item vectors. Since my data set cannot be partitioned by time, I randomly take 'x' ratings from each user and put them into the test set. Here is a reproducible example of my training user-item matrix.


   col1  col2  col3  col4  col5  col6  col7  col8  col9 col10 col11 col12 col13
+-------------------------------------------------------------------------------+
|     1     0     0     3    10     0     0     3     0     0     1     0     0 |
|     0     0     0     5     0     0     1     8     0     0     1     0     0 |
|     0     0     0     6     7     1     0     2     0     0     1     0     0 |
+-------------------------------------------------------------------------------+
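I then create a test set using this piece of code:

    import numpy as np

    # Assumes train starts as counts.copy() and test as zeros of the same shape.
    for user in range(counts.shape[0]):
        test_ratings = np.random.choice(counts[user, :].nonzero()[0], size=1, replace=False)
        train[user, test_ratings] = 0
        test[user, test_ratings] = counts[user, test_ratings]
    assert np.all((train * test) == 0)  # train and test must not overlap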

Which gives me:

   col1  col2  col3  col4  col5  col6  col7  col8  col9 col10 col11 col12 col13
+-------------------------------------------------------------------------------+
|     0     0     0     0     0     0     0     3     0     0     0     0     0 |
|     0     0     0     0     0     0     1     0     0     0     0     0     0 |
|     0     0     0     6     0     0     0     0     0     0     0     0     0 |
+-------------------------------------------------------------------------------+

Here the rows are users and columns are items.

Now, I was wondering whether this is a correct representation of my test set. I picked one nonzero value per user and set everything else to zero, so my algorithm should rank that nonzero value as the recommended item.

Is this the correct way of going about things?

Any help would be really appreciated


Answer 1:


Updated:

Yes, you should create a test set from some of your original counts and check whether your system identifies those user-item pairs as good matches.
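One way to check whether the held-out pairs are identified as good matches is a hit rate at k. The sketch below is only an illustration, not part of the original answer: `hit_rate_at_k` is a hypothetical helper, and `scores` is assumed to be a dense users-by-items matrix of predicted preferences (for example, the product of the ALS user and item vectors).

```python
import numpy as np

def hit_rate_at_k(scores, train, test, k=10):
    """Fraction of held-out user-item pairs that land in the user's top-k list."""
    scores = np.asarray(scores, dtype=float).copy()
    # Items already seen in training are not candidates for recommendation.
    scores[train > 0] = -np.inf
    hits, total = 0, 0
    for user in range(test.shape[0]):
        held_out = np.flatnonzero(test[user])
        if held_out.size == 0:
            continue  # nothing was held out for this user
        top_k = np.argsort(-scores[user])[:k]
        hits += np.intersect1d(held_out, top_k).size
        total += held_out.size
    return hits / total if total else 0.0

# Toy data: 2 users, 3 items, one held-out item per user.
toy_train = np.array([[1, 0, 0], [0, 2, 0]])
toy_test = np.array([[0, 3, 0], [0, 0, 1]])
toy_scores = np.array([[0.9, 0.8, 0.1], [0.5, 0.9, 0.2]])
print(hit_rate_at_k(toy_scores, toy_train, toy_test, k=1))
```

With a single held-out item per user, as in the question, this is simply the share of users whose hidden item appears among their top k recommendations.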

You should be careful with a few things:

  • only put values in your test set for users or items for which you have enough data;
  • hide those test set values from the training data;
  • train your model only on the user-item pairs for which you have data, not on the 0's. The assumption is that your 0's represent pairs for which you have no data, not real ratings.
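The points above can be sketched as a small split function. This is a minimal sketch under stated assumptions rather than a definitive implementation: `train_test_split`, `n_test`, `min_interactions`, and `seed` are illustrative names, and the threshold of 3 interactions is arbitrary. The `counts` matrix is the example from the question.

```python
import numpy as np

def train_test_split(counts, n_test=1, min_interactions=3, seed=0):
    """Hold out n_test observed entries per user, only for users with enough data."""
    rng = np.random.default_rng(seed)
    train = counts.copy()
    test = np.zeros_like(counts)
    for user in range(counts.shape[0]):
        observed = np.flatnonzero(counts[user])
        # Skip users with too little data; always leave at least one entry to train on.
        if observed.size < max(min_interactions, n_test + 1):
            continue
        held_out = rng.choice(observed, size=n_test, replace=False)
        train[user, held_out] = 0  # hide the test values from training
        test[user, held_out] = counts[user, held_out]
    # No user-item pair may appear in both sets.
    assert np.all((train * test) == 0)
    return train, test

counts = np.array([
    [1, 0, 0, 3, 10, 0, 0, 3, 0, 0, 1, 0, 0],
    [0, 0, 0, 5, 0, 0, 1, 8, 0, 0, 1, 0, 0],
    [0, 0, 0, 6, 7, 1, 0, 2, 0, 0, 1, 0, 0],
])
train, test = train_test_split(counts)
# Boolean mask of the pairs that a model trained only on observed data would use.
observed_mask = train > 0
```

Sampling with `replace=False` matters once `n_test` is larger than 1, since sampling with replacement could pick the same item twice and hold out fewer entries than intended.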

Note: This paper, Collaborative Filtering for Implicit Feedback Datasets, should help you with these and other questions.



Source: https://stackoverflow.com/questions/36650664/generating-test-set-for-recommendation-engine
