How are Reddit and Hacker News ranking algorithms used?

后悔当初 2021-01-30 02:42

I've been looking at ranking algorithms recently, specifically those used by Reddit and Hacker News. The algorithms themselves are simple enough, but I don't quite understand

2 Answers
  • 2021-01-30 03:12

    Reddit uses Pyrex: its sort algorithm is implemented as a Python C extension to improve performance.

    So you can do the same in SQL whenever the record is updated, e.g. when it is upvoted or downvoted.

    Here is the logic, which you will need to translate into your SQL engine's syntax:

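    from datetime import datetime
    from math import log10

    # Constant offset (2005-12-08 07:46:43 UTC); it does not change the ordering.
    REDDIT_EPOCH = 1134028003

    def hot(ups, downs, date):
        """Hot score of a post; `date` is a naive datetime in UTC."""
        score = ups - downs
        # Vote term: each factor of 10 in net votes adds 1.
        order = log10(max(abs(score), 1))
        sign = 1 if score > 0 else -1 if score < 0 else 0
        # Seconds since the Unix epoch, minus the offset above.
        td = date - datetime(1970, 1, 1)
        seconds = td.days * 86400 + td.seconds + td.microseconds / 1e6 - REDDIT_EPOCH
        # The sign belongs on the vote term; putting it on the time term
        # (order + sign * seconds / 45000) pins every zero-score post at 0.
        # 45000 s (12.5 h) of age is worth one factor of 10 in votes.
        return round(sign * order + seconds / 45000, 7)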
    

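    So you need to store ups, downs, date, and the result of the hot function in the post table, and then sort on the hot column. Because the score depends only on the post's own votes and date, not on the current time, it only needs to be recomputed when a vote is cast.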

    You can see the Reddit source code here: http://code.reddit.com/

  • 2021-01-30 03:21

    I implemented an SQL version of Reddit's ranking algorithm for a video aggregator like so:

    SELECT id, title
    FROM videos
    ORDER BY 
        LOG10(ABS(cached_votes_total) + 1) * SIGN(cached_votes_total)   
        + (UNIX_TIMESTAMP(created_at) / 300000) DESC
    LIMIT 50
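
    Written out, with v = cached_votes_total and t = UNIX_TIMESTAMP(created_at) in seconds, the ORDER BY expression is

    $$\mathrm{rank} = \operatorname{sgn}(v)\,\log_{10}(|v| + 1) + \frac{t}{300000}$$

    For v > 0, multiplying |v| + 1 by 10 adds exactly 1 to the rank, and so does being posted 300000 s later:

    $$\log_{10}\bigl(10(|v| + 1)\bigr) = \log_{10}(|v| + 1) + 1, \qquad \frac{t + 300000}{300000} = \frac{t}{300000} + 1$$

    So a factor of 10 in |v| + 1 is worth about 83.3 hours of age here, against 12.5 hours (45000 s) in Reddit's formula, which lets votes hold a video up for longer relative to recency. Like max(|v|, 1) in Reddit's version, the + 1 keeps the logarithm defined at v = 0, where the vote term is 0.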
    

    cached_votes_total is updated by a trigger whenever a new vote is cast. The query runs fast enough on our current site, but I plan to add a ranking value column and update it in the same trigger as cached_votes_total, so that the ORDER BY can use an index on that column instead of evaluating the expression for every row. After that optimization, it should be fast enough for a site of almost any size.
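
    A minimal sketch of that planned change, using Python's sqlite3 module instead of MySQL so it runs without a server. The votes table, the ranking column, the trigger names, and the rank_value helper are hypothetical; rank_value is registered from Python because not every SQLite build includes LOG10. One trigger updates cached_votes_total and ranking together, and the index on ranking serves the ORDER BY.

```python
import math
import sqlite3


def rank_value(votes: int, created_at: int) -> float:
    """Same expression as the ORDER BY in the query above."""
    # LOG10(ABS(v) + 1) * SIGN(v) + created_at / 300000
    sign = (votes > 0) - (votes < 0)
    return math.log10(abs(votes) + 1) * sign + created_at / 300000


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Make the Python function callable from SQL, including from triggers.
    conn.create_function("rank_value", 2, rank_value)
    conn.executescript("""
        CREATE TABLE videos (
            id INTEGER PRIMARY KEY,
            title TEXT,
            created_at INTEGER,  -- Unix timestamp, for simplicity
            cached_votes_total INTEGER NOT NULL DEFAULT 0,
            ranking REAL
        );
        CREATE INDEX videos_ranking ON videos (ranking);
        CREATE TABLE votes (video_id INTEGER, value INTEGER);  -- +1 or -1

        -- Give each new video its initial ranking value.
        CREATE TRIGGER videos_init AFTER INSERT ON videos BEGIN
            UPDATE videos
            SET ranking = rank_value(NEW.cached_votes_total, NEW.created_at)
            WHERE id = NEW.id;
        END;

        -- One trigger keeps both cached columns in sync on every vote.
        -- SET expressions see the pre-update row, hence "+ NEW.value" twice.
        CREATE TRIGGER votes_cast AFTER INSERT ON votes BEGIN
            UPDATE videos
            SET cached_votes_total = cached_votes_total + NEW.value,
                ranking = rank_value(cached_votes_total + NEW.value, created_at)
            WHERE id = NEW.video_id;
        END;
    """)
    return conn


def top_videos(conn: sqlite3.Connection, limit: int = 50) -> list:
    return conn.execute(
        "SELECT id, title FROM videos ORDER BY ranking DESC LIMIT ?", (limit,)
    ).fetchall()
```

    With the ranking stored and indexed, the page query reduces to SELECT id, title FROM videos ORDER BY ranking DESC LIMIT 50.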

    edit: More information at Reddit Hotness Algorithm in SQL
