问题
I've had a bit of a search, but didn't find anything quite like what I'm trying to achieve.
Basically, I'm trying to find a similarity between two users' voting habits.
I have a table storing each individual vote made, which stores:
voteID
itemID (the item the vote is attached to)
userID (the user who voted)
direction (whether the user voted the post up, or down)
I'm aiming to calculate the similarity between, say, users A and B, by finding out two things:
- The number of votes they have in common. That is, the number of times they've both voted on the same post (the direction does not matter at this point).
- The number of times they've voted in the same direction, on common votes.
(Then simply to calculate #2 as a percentage of #1, to achieve a crude similarity rating).
My question is, how do I find the intersection between the two users' sets of votes? (i.e. how do I calculate point #1 adequately, without looping over every vote in a highly inefficient way.) If they were in different tables, an INNER JOIN would suffice, I'd imagine... but that obviously won't work on the same table (or will it?).
Any ideas would be greatly appreciated.
回答1:
Something like this:
SELECT COUNT(*)
FROM votes v1
INNER JOIN votes v2 ON (v1.item_id = v2.item_id)
WHERE v1.userID = 'userA'
AND v2.userUD = 'userB'
回答2:
Here's an example that should get you closer:
SELECT COUNT(*)
FROM (
SELECT u1.userID
FROM vote u1, vote u2
WHERE u1.itemID = u2.itemID
AND u1.userID = user1
AND u2.userID = user2)
回答3:
Assuming userID 1 being compared to userID 2
For finding how many votes they have in common:
SELECT COUNT(*)
FROM Votes AS v1
INNER JOIN Votes AS v2 ON (v2.userID = 2
AND v2.itemID = v1.itemID)
WHERE v1.userID = 1;
For finding when they also voted the same:
SELECT COUNT(*)
FROM Votes AS v1
INNER JOIN Votes AS v2 ON (v2.userID = 2
AND v2.itemID = v1.itemID
AND v2.direction = v1.direction)
WHERE v1.userID = 1;
回答4:
A self join is in order. Here it is with all you asked:
SELECT v1.userID user1, v2.userID user2,
count(*) n_votes_in_common,
sum(case when v1.direction = v2.direction then 1 else 0 end) n_votes_same_direction,
(n_votes_same_direction * 100.0 / n_votes_in_common) crude_similarity_percent
FROM votes v1
INNER JOIN votes v2
ON v1.item_id = v2.item_id
回答5:
In case you want to do this for a single user (rather than knowing both users at the start) to find to whom they are the closest match:
SELECT
v2.userID,
COUNT(*) AS matching_items,
SUM(CASE WHEN v2.direction = v1.direction THEN 1 ELSE 0 END) AS matching_votes
FROM
Votes v1
INNER JOIN Votes v2 ON
v2.userID <> v1.userID AND
v2.itemID = v1.itemID
WHERE
v1.userID = @userID
GROUP BY
v2.userID
You can then limit that however you see fit (return the top 10, top 20, all, etc.)
I haven't tested this yet, so let me know if it doesn't act as expected.
回答6:
You most certainly can join a table to itself. In fact, that's what you're going to have to do. You must use aliasing when joining a table to itself. If your table doesn't have a PK or FK, you'll have to use Union instead. Union will remove duplicates and Union All will not.
来源:https://stackoverflow.com/questions/513098/get-common-rows-within-the-same-table