Algorithm To Select Most Popular Places from Database

瘦欲@ 提交于 2020-01-01 12:06:39

问题


We have a website that contains a database of places. For each place our users are able to take one of the follow actions which we record:

  1. VIEW - View it's profile
  2. RATING - Rate it on a scale of 1-5 stars
  3. REVIEW - Review it
  4. COMPLETED - Mark that they've been there
  5. WISH LIST - Mark that they want to go there
  6. FAVORITE - Mark that it's one of their favorites

In our database table of places each place contains a count of the number of times each action above was taken as well as the average rating given by users.

  • views
  • ratings
  • avg_rating
  • completed
  • wishlist
  • favorite

What we want to be able to do is generate lists of the top places using the above information. Ideally, we would want to be able to generate this list using a relatively simple SQL query without needing to do any legwork to calculate additional fields or stack rank places against one another. That being said, since we only have about 50,000 places we could run a nightly cron job to calculate some fields such as rankings on different categories if it would make a meaningful difference in the overall results of our top places.

I'd appreciate if you could make some suggestions on how we should think about bubbling the best places to the top, which criteria we should weight more heavily, and given that information - suggest what the MySQL query would need to look like in order to select the top 10 places.

One thing to note is that at this time we are less concerned with the recency of a place being popular - meaning that looking at the aggregate information is fine and that more recent data doesn't need to be weighted more heavily.

Thanks in advance for your help & advice!


回答1:


The exact formula for the favorite would need to come from you, we will call it f(x).
For the actual implementation I would add a popularity_score field that I would calculate, as you said from a nightly cron job using f(x) for each row.

Then it is simply a case of doing a "select place name from table order by popularity_score desc".

Ok - Let's give it a stab popularity_score = (FAVORITE * 3 + COMPLETED * 2 + WISHLIST) * RATING * VIEW / AVG_VIEWS_OF_ALL_PROFILES




回答2:


Use the lower bound of the Wilson score confidence interval for a Bernoulli parameter!




回答3:


I don't have an opinion on how to weigh things.

That said, why not just add a popularity column to the location table? All of a sudden, your SQL query is incredibly simple.

The tricky part, of course, is figuring out how and when to update that value. But since you're saving all of the activity data, you can always regenerate the popularity values from the log entries.

That way, you get nice fast queries for "most popular" locations, and if you want to change the way popularity is computed, you can do so at will.

If you're clever, you might be able to devise a simple enough formula so that popularity can be tracked in real time. For instance, if we only cared about average ratings, you can modify the average rating with just three variables: the current average rating, the number of times the object has been rated, and the new rating value.

Of course, things get more complex when you start mixing in how many times the object has been viewed, reviewed, favorited, etc ... but you might find that you can devise a method that's computationally cheap enough that you can update the overall popularity value on just about every action.



来源:https://stackoverflow.com/questions/2706964/algorithm-to-select-most-popular-places-from-database

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!