I have a cities table which looks like this.
| id | Name |
| 1 | Paris |
| 2 | London |
| 3 | New York |
I have a tags table which looks li
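This is late, but I don't think any of the answers is fully correct. I took the best part of each one and put them together into my own answer.
Stating (q.sets + q.parisset) AS union and (q.sets - q.parisset) AS intersect is very wrong. Assuming q.sets and q.parisset are the tag counts of the two cities, adding them counts every shared tag twice, so the sum is not the size of the union, and subtracting them does not give the size of the intersection.
Take a cities table like this.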
| id | Name |
| 1 | Paris |
| 2 | Florence |
| 3 | New York |
| 4 | São Paulo |
| 5 | London |
And a cities_tags table like this.
| city_id | tag_id |
| 1 | 1 |
| 1 | 3 |
| 2 | 1 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 1 |
| 5 | 2 |
| 5 | 3 |
With this sample data, Florence fully matches Paris, New York matches one tag, São Paulo matches no tags, and London matches both tags but has one extra. I think the Jaccard index of each city against Paris is:
Florence: 1.000 (2/2)
London: 0.666 (2/3)
New York: 0.333 (1/3)
São Paulo: 0.000 (0/3)
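As a quick cross-check of the figures above, here is a minimal Python sketch that recomputes them with plain sets. The `city_tags` dictionary and the `jaccard` helper are names made up for this illustration; the tag sets are copied from the sample cities_tags rows.

```python
# Tag sets per city, copied from the sample cities_tags table.
city_tags = {
    "Paris": {1, 3},
    "Florence": {1, 3},
    "New York": {1, 2},
    "São Paulo": {2},
    "London": {1, 2, 3},
}


def jaccard(a, b):
    """Return |a ∩ b| / |a ∪ b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


reference = city_tags["Paris"]

# Score every other city against Paris.
scores = {
    city: jaccard(reference, tags)
    for city, tags in city_tags.items()
    if city != "Paris"
}

# Print the cities from most to least similar.
for city, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{city}: {score:.3f}")
```

Because the output is rounded to three decimals, London prints as 0.667 rather than the truncated 0.666 listed above.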
My query is like this:
select jaccard.city,
       jaccard.intersect_count,
       jaccard.union_count,
       jaccard.intersect_count / jaccard.union_count as jaccard_index
from
  (select
       c2.name as city,
       -- tags of c2 that c1 also has
       count(ct2.tag_id) as intersect_count,
       -- distinct tags that belong to either city
       (select count(distinct ct3.tag_id)
          from cities_tags ct3
         where ct3.city_id in (c1.id, c2.id)) as union_count
     from cities as c1
    inner join cities as c2 on c1.id != c2.id
     left join cities_tags as ct1 on ct1.city_id = c1.id
     left join cities_tags as ct2 on ct2.city_id = c2.id and ct1.tag_id = ct2.tag_id
    where c1.id = 1 -- 1 is Paris, the reference city
    group by c1.id, c2.id) as jaccard
order by jaccard.intersect_count / jaccard.union_count desc
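To try the query against the sample data without setting up a server, here is a rough sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database you actually use, so a few things are assumptions of this sketch: the column types, the non-reserved aliases (intersect_count, union_count, jaccard_index), the short alias j, and the `1.0 *` factor, which is there because SQLite would otherwise do integer division.

```python
import sqlite3

# In-memory database loaded with the sample rows from the tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities_tags (city_id INTEGER, tag_id INTEGER);
    INSERT INTO cities VALUES
        (1, 'Paris'), (2, 'Florence'), (3, 'New York'),
        (4, 'São Paulo'), (5, 'London');
    INSERT INTO cities_tags VALUES
        (1, 1), (1, 3), (2, 1), (2, 3), (3, 1),
        (3, 2), (4, 2), (5, 1), (5, 2), (5, 3);
""")

# Same join logic as the query above; the reference city id is a parameter.
QUERY = """
SELECT j.city, j.intersect_count, j.union_count,
       1.0 * j.intersect_count / j.union_count AS jaccard_index
FROM (
    SELECT c2.name AS city,
           COUNT(ct2.tag_id) AS intersect_count,
           (SELECT COUNT(DISTINCT ct3.tag_id)
              FROM cities_tags ct3
             WHERE ct3.city_id IN (c1.id, c2.id)) AS union_count
    FROM cities AS c1
    INNER JOIN cities AS c2 ON c1.id != c2.id
    LEFT JOIN cities_tags AS ct1 ON ct1.city_id = c1.id
    LEFT JOIN cities_tags AS ct2
           ON ct2.city_id = c2.id AND ct1.tag_id = ct2.tag_id
    WHERE c1.id = ?
    GROUP BY c1.id, c2.id
) AS j
ORDER BY jaccard_index DESC
"""

rows = conn.execute(QUERY, (1,)).fetchall()  # 1 is Paris
for city, intersect_count, union_count, index in rows:
    print(f"{city}: {index:.3f} ({intersect_count}/{union_count})")
```

Passing a different id as the parameter ranks the other cities against that city instead of Paris.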