问题
Have you ever noticed how facebook says “3 friends and 33 others liked this”? I was wondering what the best approach to do this is. I don’t think going through the friends list, and the list of users who “liked this” and comparing them is efficient at all! Do they keep a track of this in the database? That will make the database size very huge. What do you guys think?
Thanks!
回答1:
I would guess they outer join their friends
table with their likes
table to count both regular likes and friend likes at the same time.
With the proper indexes, it wouldn't be a slow query at all. Huge databases aren't necessarily slow, so there's really no reason to not store all of this information in a database. The trick is to make sure the indexes and partitions (if any) are set up well.
回答2:
Facebook uses Cassandra, a NoSQL database for at least some things. Here's a more detailed discussion of what some of the bigger social media sites do to solve these problems:
http://www.25hoursaday.com/weblog/2009/09/10/BuildingScalableDatabasesDenormalizationTheNoSQLMovementAndDigg.aspx
Lots of interesting reading in there if you follow the links from it to the Digg blog post, etc.
回答3:
Yes they definitely keep it in their database as they definitely have more than 1 server that needs to access the data.
As for scalability, I'm sure they use a lot of caching.
Here is an example:
If you have 1 million rows to go through, an index can perform O(logn) = 20 operations (in the worst case) only to find what you need.
For 2 million, you only need 21 operations (in the worst case) to find what you need.
Every time you double the amount of users to go through you simply need only 1 more operation (in the worst case) with a O(logn) index.
They also have a distributed architecture or a clustered database.
回答4:
Facebook must be using a trigger(which automatically gets executed as soon as an event occurs).
For example, suppose a trigger is created to store the count and names of people who liked the status, then it will get executed every time when someone likes your status and that too implicitly (automatically).
This makes the operation way too easy and Facebook doesn't have to manually update the database or store a huge database for this. Also,this approach is a faster one.
回答5:
In designing social networking software (mothsorchid.com) I found the only way to address this is to pre-cache streams of notifications. One doesn't query the database at the time of page load to count how many friends and others 'liked this', when someone 'likes' something that is recorded on the object, and when retrieving the object one can compare with the current user's friend list. If someone updates their profile/makes a comment/etc it sends notification objects to friends which are pre-cached in their feeds. Cuts down tremendously on database work at expense of disk space, but disk space is cheap.
As to how Facebook does this, they use Cassandra DBMS, which is probably a little different to what you have in mind.
回答6:
Keep in mind that Facebook strongly utilizes memcached, so they're retaining a lot of data in memory and only refreshing it when absolutely necessary. See this blog post for some scalability discussion around this:
http://www.facebook.com/note.php?note_id=39391378919
回答7:
Each entry that somebody can like probably contains a list of everybody who does like it (all of this is of course in a database). When you view that entry, they match it against your friends list to see which of them is your friend. Voila.
回答8:
A lot of this are explained by the Director of Engineering of Facebook in this QCon presentation :
http://www.infoq.com/presentations/Facebook-Software-Stack
A great presentation to watch.....
来源:https://stackoverflow.com/questions/1422618/how-does-facebook-do-it