Popularity Algorithm

一世执手 提交于 2019-11-28 18:55:41

Typically Digg and Reddit-like sites go by the date of the submission and not the times of the votes. This way all it takes is a simple SQL query to find the top submissions for X time period. Here's a pseudo-query to find the 10 most popular links from the past 24 hours using this method:

select * from submissions
 where (current_time - post_time) < 86400
 order by score desc limit 10

Basically, this query says to find all the submissions where the number of seconds between now and the time it was posted is less than 86400, which is 24 hours in UNIX time.

If you really want to measure popularity within X time interval, you'll need to store the post and time for every vote in another table:

create table votes (
 post foreign key references submissions(id),
 time datetime,
 vote integer); -- +1 for upvote, -1 for downvote

Then you can generate a list of the most popular posts between X and Y times like so:

select sum(vote), post from votes
 where X < time and time < Y
 group by post
 order by sum(vote) desc limit 10;

From here you're just a hop, skip, and inner join away from getting the post data tied to the returned ids.

Do you have a decent DB setup? Can we please hear about your CREATE TABLE details and indices? Assuming a sane setup, the DB should be able to pull the counts you require fast enough to suit your needs! For example (net of indices and keys, that somewhat depend on what DB engine you're using), given two tables:

CREATE TABLE submissions (subid INT, when DATETIME, etc etc)
CREATE TABLE likes (subid INT, when DATETIME, etc etc)

you can get the top 33 all-time popular submissions as

SELECT *, COUNT(likes.subid) AS score
FROM submissions
JOIN likes USING(subid)
GROUP BY submissions.subid
ORDER BY COUNT(likes.subid) DESC
LIMIT 33

and those voted for within a certain time range as

SELECT *, COUNT(likes.subid) AS score
FROM submissions
JOIN likes USING(subid)
WHERE likes.when BETWEEN initial_time AND final_time
GROUP BY submissions.subid
ORDER BY COUNT(likes.subid) DESC
LIMIT 33

If you were storing "votes" (positive or negative) in likes, instead of just counting each entry there as +1, you could simply use SUM(likes.vote) instead of the COUNTs.

For stable list like alltime, lastweek, because they are not supposed to change really fast so that I think you should save the list in your cache with expiration time is around 1 days or longer.

If you concern about correct count in real time, you can check at every page view by comparing the page with lowest page in the cache.

All you need to do is to care for synchronizing between the cache and actual database.

thethanghn

Queries where the order is some function of the current time can become real performance problems. Things get much simpler if you can bucket by calendar time and update scores for each bucket as people vote.

To complete nobody_'s answer I would suggest you read up on the documentation (if you are using MySQL of course).

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!