Compare similarities between two result sets

广开言路 2021-01-20 11:49

I am creating a music website where I would like users to be able to find other users who like approximately the same artists as they do.

I have a 'like' table that has 2 c

2 answers
  • 2021-01-20 12:04

    It is possible to join a table to itself. (You need to specify an alias for at least one of the two "copies" of the table, so that your query is not ambiguous.)

    So given two users, you can find the "likes" they have in common by doing a join of the like table to itself. You can also find what proportion of User 1's likes are shared by User 2 by doing a left join and counting both how many results there are and how many are null. Note that this is not a symmetric operation, and you will need to tackle the case where one or both of the numbers is 0.
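
The left-join idea above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes a `likes` table with `id_user` and `id_artist` columns (the names used in the other answer), uses SQLite through Python's `sqlite3` as a stand-in for whatever database the site runs on, and the function name `shared_fraction` and the sample rows are hypothetical.

```python
import sqlite3

def shared_fraction(conn, user_a, user_b):
    """Fraction of user_a's distinct likes that user_b also likes.

    Returns None when user_a has no likes (the zero case mentioned above).
    """
    total, shared = conn.execute(
        """
        SELECT COUNT(a.id_artist) AS total_likes,   -- every like of user_a
               COUNT(b.id_artist) AS shared_likes   -- NULLs (no match) are not counted
        FROM (SELECT DISTINCT id_artist FROM likes WHERE id_user = ?) AS a
        LEFT JOIN (SELECT DISTINCT id_artist FROM likes WHERE id_user = ?) AS b
               ON b.id_artist = a.id_artist
        """,
        (user_a, user_b),
    ).fetchone()
    if total == 0:
        return None
    return shared / total

# Hypothetical sample data: u1 likes four artists, u2 likes two of them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (id_user TEXT, id_artist TEXT)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u1", "d"),
     ("u2", "a"), ("u2", "b")],
)

print(shared_fraction(conn, "u1", "u2"))  # 0.5
print(shared_fraction(conn, "u2", "u1"))  # 1.0
print(shared_fraction(conn, "u3", "u1"))  # None
```

The first two calls show the asymmetry: half of u1's likes are shared by u2, while all of u2's likes are shared by u1.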

    When you say you want to "find the most similar people in the database": you could do this for every pair of users, but note that if you have n users then this involves doing n*(n-1)/2 comparisons, which is on the order of n squared. This might be quite a lot of work for your database to do if you have a lot of users.
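
To get a feel for how quickly n*(n-1)/2 grows, here is a small sketch; the helper name `pair_count` and the user counts are only illustrative.

```python
from math import comb

def pair_count(n):
    """Number of unordered user pairs among n users, i.e. n * (n - 1) / 2."""
    return comb(n, 2)

for n in (100, 10_000, 1_000_000):
    print(n, pair_count(n))
# 100 4950
# 10000 49995000
# 1000000 499999500000
```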

  • 2021-01-20 12:19

    Something like this:

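    -- Self-join likes on id_artist to pair up users who like the same artist.
    -- COUNT counts joined rows, so duplicate (id_user, id_artist) rows inflate the total,
    -- and each pair of users appears twice, once in each order.
    SELECT first_user.id_user, second_user.id_user, COUNT(first_user.id_user) AS total_matches
    FROM likes AS first_user
    JOIN likes AS second_user
      ON second_user.id_artist = first_user.id_artist
     AND second_user.id_user != first_user.id_user
    GROUP BY first_user.id_user, second_user.id_user
    ORDER BY total_matches DESC
    LIMIT 1;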
    

    Note that this isn't very efficient, since the self-join compares every user's likes against every other user's each time it runs. One way to work around this is to build a 'cache table' containing the output of this query with the LIMIT 1 portion removed. Add some relevant indexes and query the cache table instead. You could set up a cron job to refresh the table periodically.

    Example:

    CREATE TABLE IF NOT EXISTS `likes` (
      `id_user` varchar(50) DEFAULT NULL,
      `id_artist` varchar(50) DEFAULT NULL
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
    
    INSERT INTO `likes` (`id_user`, `id_artist`) VALUES
      ('8', '39'), ('8', '37'), ('4', '37'), ('8', '24'), ('8', '7'),
      ('4', '28'), ('8', '28'), ('4', '27'), ('4', '11'), ('8', '49'),
      ('4', '7'), ('4', '40'), ('4', '29'), ('8', '22'), ('4', '29'),
      ('8', '11'), ('8', '28'), ('4', '7'), ('4', '31'), ('8', '42'),
      ('8', '25'), ('4', '25'), ('4', '17'), ('4', '32'), ('4', '46'),
      ('4', '19'), ('8', '34'), ('3', '32'), ('4', '21');
    
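    Result of the query on this data. The 7 counts joined rows: users 8 and 4 share 5 distinct artists (7, 11, 25, 28, 37), and the duplicate rows for artists 7 and 28 add the other 2.

    +---------+---------+---------------+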
    | id_user | id_user | total_matches |
    +---------+---------+---------------+
    | 8       | 4       |             7 |
    +---------+---------+---------------+
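
The cache-table idea could look roughly like the sketch below. It rests on several assumptions: SQLite via Python's `sqlite3` stands in for MySQL, the names `user_similarity`, `other_user`, `refresh_similarity_cache` and `most_similar` are hypothetical, and the sample rows are made up. It also swaps the row count for `COUNT(DISTINCT id_artist)` so duplicate likes are not counted twice, which differs from the query above.

```python
import sqlite3

def refresh_similarity_cache(conn):
    """Rebuild the cache table; this is what a periodic job (e.g. cron) would call."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS user_similarity")
        # Same self-join as above, without LIMIT 1, counting distinct artists.
        conn.execute(
            """
            CREATE TABLE user_similarity AS
            SELECT first_user.id_user  AS id_user,
                   second_user.id_user AS other_user,
                   COUNT(DISTINCT first_user.id_artist) AS total_matches
            FROM likes AS first_user
            JOIN likes AS second_user
              ON second_user.id_artist = first_user.id_artist
             AND second_user.id_user != first_user.id_user
            GROUP BY first_user.id_user, second_user.id_user
            """
        )
        # Index chosen for "best matches for one user" lookups.
        conn.execute(
            "CREATE INDEX idx_similarity_user "
            "ON user_similarity (id_user, total_matches)"
        )

def most_similar(conn, user, limit=1):
    """Read the top matches for one user from the cache table."""
    return conn.execute(
        "SELECT other_user, total_matches FROM user_similarity "
        "WHERE id_user = ? ORDER BY total_matches DESC LIMIT ?",
        (user, limit),
    ).fetchall()

# Hypothetical sample data, including one duplicate like for user 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (id_user TEXT, id_artist TEXT)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [("1", "a"), ("1", "a"), ("1", "b"), ("1", "c"),
     ("2", "a"), ("2", "b"), ("3", "c")],
)

refresh_similarity_cache(conn)
print(most_similar(conn, "1"))  # [('2', 2)]
print(most_similar(conn, "3"))  # [('1', 1)]
```

Reads only touch the small precomputed table, so the expensive self-join runs once per refresh rather than once per page view. The trade-off is that results can be stale until the next refresh.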
    