I am trying to code an algorithm that acts a bit like Amazon's "People who bought this also bought".
The difference between the two is that mine is just counting how often products are viewed together in the same session, rather than purchased together.
I was able to achieve your desired result using a simple DB structure, and a pretty simple query:
TABLE `exa`
| sesh_id | prod_id |
---------------------
|    1    |    1    |
|    1    |    2    |
|    1    |    3    |
|    1    |    4    |
|    2    |    2    |
|    2    |    3    |
|    2    |    4    |
|    3    |    3    |
|    3    |    4    |
|    4    |    1    |
|    4    |    2    |
|    4    |    5    |
SELECT c.prod_id, COUNT(*)
FROM `exa` a
JOIN `exa` b ON a.prod_id=b.prod_id
JOIN `exa` c ON b.sesh_id=c.sesh_id
WHERE a.`prod_id`=3 AND c.prod_id!=3
GROUP BY c.prod_id
ORDER BY 2 DESC;
| prod_id | COUNT |
-------------------
|    4    |   9   |
|    2    |   6   |
|    1    |   3   |
The idea is that every time a session views a product, it gets inserted into the table (in this case `exa`).
Then, on any particular product view, you can check and see what other products people who viewed this product also viewed, weighted by frequency. So in this particular example, EVERYONE that viewed product #3 viewed product #4, so it comes up first in the sort. Product #5 was only viewed by session #4, and session #4 didn't view product #3, so product #5 doesn't come up in the results.
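A minimal sketch of how that could be wired up, assuming SQLite and the made-up helper names record_view / also_viewed (the recommendation query is the same one shown above):

import sqlite3

conn = sqlite3.connect("views.db")
conn.execute("CREATE TABLE IF NOT EXISTS exa (sesh_id INTEGER, prod_id INTEGER)")

def record_view(sesh_id, prod_id):
    # Called on every product view: one row per (session, product).
    conn.execute("INSERT INTO exa (sesh_id, prod_id) VALUES (?, ?)", (sesh_id, prod_id))
    conn.commit()

def also_viewed(prod_id, limit=10):
    # Other products seen in sessions that also saw prod_id,
    # ordered by how often they co-occur (the query from above).
    rows = conn.execute("""
        SELECT c.prod_id, COUNT(*) AS freq
        FROM exa a
        JOIN exa b ON a.prod_id = b.prod_id
        JOIN exa c ON b.sesh_id = c.sesh_id
        WHERE a.prod_id = ? AND c.prod_id != ?
        GROUP BY c.prod_id
        ORDER BY freq DESC
        LIMIT ?
    """, (prod_id, prod_id, limit))
    return rows.fetchall()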
You are not going to need any (any!) +1s.
What you do need is a history. In the case of "customers who bought X also bought Y" it's a history of purchases. In the case of "customers who saw X were also interested in Y" it's a history of who saw what.
As soon as you have a history, you're ready to make sense of your data. Here, you'll need to frame it as a nearest-neighbour problem: what you need are the nearest neighbours of X. The coordinates are users, and the values are 0 or 1 depending on whether a user saw/bought an item.
The easiest way to calculate the distance is just the sum of squared deltas; it can be recalculated once an hour, for example (once you have plenty of views, the distances will stop changing very often), and after that you'll always have a table of distances handy.
Examples of this approach can be found (in Python) in Programming Collective Intelligence, published by O'Reilly.
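A minimal sketch of that distance calculation in pure Python, assuming the history is simply a dict mapping each item to the set of users who saw it (all names and data here are made up for illustration):

def squared_distance(item_x, item_y, history, users):
    # Each user is a coordinate; the value is 1 if the user saw the item, else 0.
    # The distance is the sum of squared deltas over all users.
    return sum((int(u in history[item_x]) - int(u in history[item_y])) ** 2 for u in users)

def nearest_neighbours(item_x, history, k=5):
    users = set().union(*history.values())
    others = [i for i in history if i != item_x]
    # Smallest distance = most similar viewing pattern.
    return sorted(others, key=lambda y: squared_distance(item_x, y, history, users))[:k]

history = {
    "X": {"alice", "bob", "carol"},
    "Y": {"alice", "bob"},
    "Z": {"dave"},
}
print(nearest_neighbours("X", history))  # ['Y', 'Z'] -- "Y" is closer to "X" than "Z"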
I would make one improvement to your idea. When you're figuring out which cliques go together and deciding which ones make the strongest relationships, you should add a weight to each connection. The easiest way to calculate the weight would be to count how many people who looked at product X also looked at product Y. The more views, the stronger the relationship.
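For example, the weight of a single X-Y connection could be computed as the number of sessions that viewed both products (a sketch with made-up data; sessions_by_product is an assumed lookup):

# Maps each product to the set of sessions that viewed it (hypothetical data).
sessions_by_product = {
    "X": {1, 2, 3},
    "Y": {2, 3, 4},
}

# Weight of the X-Y connection: how many sessions viewed both products.
weight = len(sessions_by_product["X"] & sessions_by_product["Y"])
print(weight)  # 2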
The Netflix Prize was won with a solution based on SVD. Implementing a covariance matrix in a database table is a challenge. Implementing SVD in a database might be a research problem. But most people would call it insane.
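For what it's worth, here is a tiny sketch of the SVD idea outside the database, using NumPy on a toy view matrix; it only illustrates item similarity in a latent space and is in no way the Netflix solution:

import numpy as np

# Rows = users, columns = products; 1 means the user viewed the product.
views = np.array([
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
], dtype=float)

# Truncated SVD: keep the top-k latent factors per product.
U, s, Vt = np.linalg.svd(views, full_matrices=False)
k = 2
item_factors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional vector per product

# Cosine similarity between product #3 (column index 2) and every product.
norms = np.linalg.norm(item_factors, axis=1)
sims = item_factors @ item_factors[2] / (norms * norms[2] + 1e-9)
print(np.argsort(-sims))  # column indices ranked by similarity to product #3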
OK, I think I figured it out. Part of the work is done in code.
What I did was group by sessionID and productID.
Then, in my code, I iterate through each sessionID and build a dictionary of pairs.
For instance, say I have product IDs 10, 20, and 30, which basically form a clique.
I insert them into the dictionary as follows:
1. 10-20, weight 1
2. 20-10, weight 1
3. 10-30, weight 1
4. 30-10, weight 1
5. 20-30, weight 1.
6. 30-20, weight 1.
If I encounter one of these pairs again, I add +1 to the appropriate pair(s).
At the end, I have all the pairs with their weights.
All I have to do now is take a given productID, scan the table, and find the clique it belongs to.
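Here is a minimal sketch of that pair counting in Python, assuming the grouped (sessionID, productID) rows are already loaded into a list (all names and data are illustrative):

from collections import defaultdict
from itertools import combinations

# (sessionID, productID) rows, e.g. as returned by the GROUP BY query.
rows = [(1, 10), (1, 20), (1, 30), (2, 10), (2, 20)]

# Group product IDs per session.
products_by_session = defaultdict(set)
for sesh_id, prod_id in rows:
    products_by_session[sesh_id].add(prod_id)

# Build the weighted pair dictionary: both directions, +1 per co-occurrence.
weights = defaultdict(int)
for prods in products_by_session.values():
    for a, b in combinations(sorted(prods), 2):
        weights[(a, b)] += 1
        weights[(b, a)] += 1

def related_products(prod_id):
    # Scan the pair dictionary for the given product, strongest links first.
    pairs = [(other, w) for (p, other), w in weights.items() if p == prod_id]
    return sorted(pairs, key=lambda t: t[1], reverse=True)

print(related_products(10))  # [(20, 2), (30, 1)]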
If you have suggestions for improvements, please let me know!
Thanks, all!