问题
I have a social network set up and via an api I want to search the entries. The database of the social network is mysql. I want the search to return results in the following format: Results that match the query AND are friends of the user performing the search should be prioritized over results that simply match the query.
So can this be done in one query or will I have to do two separate queries and merge the results and remove duplicates?
I could possibly build up a data structure using Lucene and search that index efficiently, but am wondering if the penalty of updating a document everytime a new relationship is created is going to be too much?
Thanks
回答1:
The reference to Lucene complicates the equation a little bit. Let's solve it (or at least get a baseline) without it first.
Assuming the following datamodel (or something approaching.
tblUsers UserId PK UserName Age ... tblBuddies UserId FK to tblUsers.UserId FriendId tblUsers.Userid = Id of one of the friends BuddyRating float 0.0 to 1.0 (or whatever normalized scale) indicating the level of friendship/similarity/whatever tblItems ItemId PK ItemName Description Price ... tblUsersToItems UserId FK to tblUsers.UserId ItemId FK to ItemRating float 0.0 to 1.0 (or whatever normalized scale) indicating the "value" assigned to item by user.
A naive query (but a good basis for an optimized one) could be:
SELECT [TOP 25] I.ItemId, ItemName, Description, SUM(ItemRating * BuddyRating) FROM tblItems I LEFT JOIN tblUserToItems UI ON I.ItemId = UI.ItemId LEFT JOIN tblBuddies B ON UI.UserId = B.FriendId WHERE B.UserId = 'IdOfCurrentUser' AND SomeSearchCriteria -- Say ItemName = 'MP3 Player' GROUP BY I.ItemId, ItemName, Description ORDER BY SUM(ItemRating * BuddyRating) DESC
The idea is that a given item is given more weight if it is recommended/used by a friend. The extra weigh is the more important if the friend is a a close friend [BuddyRating] and/or if the friend recommend this item more strongly [ItemRating]
Optimizing such a query depends on the overal number of item, the average/max numbers of buddies a given user has, the average/max number of items a user may have in his/her list.
Is this type of ideas/info you are seeking or am I missing the question?
回答2:
One way is to store all your social network graph separately from Lucene. Run your keyword query on Lucene, and also lookup all the friends in your network graph. For all the friends that are returned, boost all of those friends' search results by some factor and resort. This re-sort would be done outside of Lucene. I've done things like this before and it performs pretty well.
You can also create a custom HitCollector that does the boosting as the hits are being collected in Lucene. You'd have to construct a list of internal Lucene ID's that belong to the friends of the current user.
Your social network graph can be stored in Mysql, in memory as a sparse adjacency matrix, or you can take a look at Neo4j.
来源:https://stackoverflow.com/questions/1779954/whats-the-best-way-to-search-a-social-network-by-prioritizing-a-users-relations