What's the best way to search a social network by prioritizing a user's relationships first?

不问归期 submitted on 2019-12-04 18:02:46

The reference to Lucene complicates the equation a little bit. Let's solve it (or at least get a baseline) without it first.

Assuming the following data model (or something close to it):

tblUsers
  UserId  PK
  UserName
  Age
  ...

tblBuddies
  UserId     FK to tblUsers.UserId
  FriendId   FK to tblUsers.UserId (Id of one of the friends)
  BuddyRating     float 0.0 to 1.0 (or whatever normalized scale) indicating 
                  the level of friendship/similarity/whatever

tblItems
  ItemId  PK
  ItemName
  Description
  Price
  ...

tblUsersToItems
   UserId   FK to tblUsers.UserId
   ItemId   FK to 
   ItemRating   float 0.0 to 1.0 (or whatever normalized scale) indicating 
                the "value" assigned to item by user.

A naive query (but a good basis for an optimized one) could be:

SELECT TOP 25 I.ItemId, I.ItemName, I.Description,   -- TOP is SQL Server syntax; use LIMIT elsewhere
       SUM(UI.ItemRating * B.BuddyRating) AS Score
FROM tblItems I
-- Inner joins: the WHERE filter on B.UserId discards unmatched rows anyway
INNER JOIN tblUsersToItems UI ON I.ItemId = UI.ItemId
INNER JOIN tblBuddies B ON UI.UserId = B.FriendId
WHERE B.UserId = 'IdOfCurrentUser'
  AND SomeSearchCriteria -- Say ItemName = 'MP3 Player'
GROUP BY I.ItemId, I.ItemName, I.Description
ORDER BY Score DESC

The idea is that an item is given more weight if it is recommended/used by a friend. The extra weight is larger if the friend is a close friend [BuddyRating] and/or if the friend recommends the item more strongly [ItemRating].
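To make the weighting concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. SQLite stands in for whatever database you actually use, the sample rows are made up for illustration, and LIMIT replaces TOP because SQLite does not support TOP. The helper name ranked_items is hypothetical.

```python
import sqlite3

# In-memory database with a trimmed-down version of the schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblUsers (UserId INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE tblBuddies (UserId INTEGER, FriendId INTEGER, BuddyRating REAL);
CREATE TABLE tblItems (ItemId INTEGER PRIMARY KEY, ItemName TEXT, Description TEXT);
CREATE TABLE tblUsersToItems (UserId INTEGER, ItemId INTEGER, ItemRating REAL);

-- Illustrative data only: user 1 has a close friend (2) and an acquaintance (3).
INSERT INTO tblUsers VALUES (1, 'me'), (2, 'close friend'), (3, 'acquaintance');
INSERT INTO tblBuddies VALUES (1, 2, 0.9), (1, 3, 0.2);
INSERT INTO tblItems VALUES (10, 'MP3 Player', 'Model A'), (11, 'MP3 Player', 'Model B');
INSERT INTO tblUsersToItems VALUES (2, 10, 0.5), (3, 10, 0.5), (3, 11, 1.0);
""")


def ranked_items(conn, current_user_id, item_name, limit=25):
    """Return matching items ordered by SUM(ItemRating * BuddyRating)."""
    sql = """
        SELECT I.ItemId, I.ItemName, I.Description,
               SUM(UI.ItemRating * B.BuddyRating) AS Score
        FROM tblItems I
        JOIN tblUsersToItems UI ON I.ItemId = UI.ItemId
        JOIN tblBuddies B ON UI.UserId = B.FriendId
        WHERE B.UserId = ? AND I.ItemName = ?
        GROUP BY I.ItemId, I.ItemName, I.Description
        ORDER BY Score DESC
        LIMIT ?
    """
    return conn.execute(sql, (current_user_id, item_name, limit)).fetchall()


rows = ranked_items(conn, 1, "MP3 Player")
for item_id, name, description, score in rows:
    print(item_id, description, round(score, 2))
```

With this data, Model A comes out ahead: the acquaintance rates Model B at the maximum, but the close friend's moderate rating of Model A carries more weight. Note that items no friend has rated do not appear at all, which may or may not be what you want.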

Optimizing such a query depends on the overall number of items, the average/max number of buddies a given user has, and the average/max number of items a user may have in his/her list.

Is this the type of idea/info you are seeking, or am I missing the question?

One way is to store your social network graph separately from Lucene. Run your keyword query on Lucene, and also look up all the friends in your network graph. For all the friends that are returned, boost all of those friends' search results by some factor and re-sort. This re-sort would be done outside of Lucene. I've done things like this before and it performs pretty well.
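A rough sketch of that re-sort step, in Python. It assumes the search engine has already returned hits carrying a document id, the id of the user who authored the content, and a relevance score. The Hit class, the rerank function, and the flat multiplicative boost are all illustrative assumptions, not anything Lucene provides.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str      # identifier of the matched document
    author_id: int   # user who owns/authored the document
    score: float     # relevance score from the keyword search


def rerank(hits, friend_ids, boost=2.0):
    """Re-sort hits so that documents from friends are boosted by a constant factor."""
    def boosted(hit):
        return hit.score * boost if hit.author_id in friend_ids else hit.score

    return sorted(hits, key=boosted, reverse=True)


# Hypothetical keyword-search results; user 2 is a friend of the searcher.
hits = [Hit("a", 7, 1.0), Hit("b", 2, 0.6), Hit("c", 9, 0.8)]
reranked = rerank(hits, friend_ids={2})
print([h.doc_id for h in reranked])
```

Keeping friend_ids as a set makes each membership check cheap, so the cost is dominated by the sort over however many hits you pulled back from the search.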

You can also create a custom HitCollector that does the boosting as the hits are being collected in Lucene. You'd have to construct a list of internal Lucene IDs that belong to the friends of the current user.
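The collector idea can be illustrated independently of Lucene. The sketch below is plain Python, not the Lucene API: BoostingCollector, its collect method, and the constant boost are hypothetical names chosen for illustration. It shows the general shape, where each hit is boosted as it arrives if its internal document id is in the friends' set, and only the top k are kept.

```python
import heapq


class BoostingCollector:
    """Keeps the top_k hits, boosting documents that belong to friends."""

    def __init__(self, friend_doc_ids, boost=2.0, top_k=10):
        self.friend_doc_ids = friend_doc_ids  # set of internal doc ids owned by friends
        self.boost = boost
        self.top_k = top_k
        self._heap = []  # min-heap of (score, doc_id); smallest kept score is at index 0

    def collect(self, doc_id, score):
        """Called once per matching document, in whatever order the search yields them."""
        if doc_id in self.friend_doc_ids:
            score *= self.boost
        if len(self._heap) < self.top_k:
            heapq.heappush(self._heap, (score, doc_id))
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, (score, doc_id))

    def top_docs(self):
        """Return (doc_id, score) pairs, best first."""
        return [(doc_id, score) for score, doc_id in sorted(self._heap, reverse=True)]


collector = BoostingCollector(friend_doc_ids={3}, boost=2.0, top_k=2)
for doc_id, score in [(1, 0.9), (2, 0.5), (3, 0.6), (4, 0.7)]:
    collector.collect(doc_id, score)
print(collector.top_docs())
```

Compared with re-sorting afterwards, this avoids materializing the full result list, at the cost of having to map friends to internal document ids before the search runs.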

Your social network graph can be stored in MySQL, in memory as a sparse adjacency matrix, or you can take a look at Neo4j.
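For the in-memory option, a sparse adjacency structure can be as simple as a dictionary of dictionaries that only stores edges that exist. This is a minimal sketch; the FriendGraph class and its method names are assumptions made for illustration, and the optional rating plays the role of a friendship strength.

```python
from collections import defaultdict


class FriendGraph:
    """Sparse adjacency storage: user id -> {friend id: rating}."""

    def __init__(self):
        self._adj = defaultdict(dict)

    def add_friend(self, user_id, friend_id, rating=1.0):
        """Record a directed edge; call twice for a mutual friendship."""
        self._adj[user_id][friend_id] = rating

    def friends(self, user_id):
        """Return a copy of the user's friends and ratings (empty if none)."""
        return dict(self._adj.get(user_id, {}))


graph = FriendGraph()
graph.add_friend(1, 2, rating=0.9)
graph.add_friend(1, 3, rating=0.2)
print(graph.friends(1))
```

Memory use grows with the number of friendships rather than with the square of the number of users, which is the point of keeping the matrix sparse.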
