similarity

Compare similarities between two result sets

爱⌒轻易说出口, posted on 2019-12-31 01:51:31
Question: I am creating a music website where I would like users to be able to find other users who like approximately the same artists as they do. I have a 'like' table with two columns, 'id_user' and 'id_artist'. Here is an example of how I would like it to work:
User 1 likes (id_user, id_artist): (1, 12), (1, 13), (1, 14), (1, 26), (1, 42), (1, 44)
User 2 likes (id_user, id_artist): (2, 13), (2, 14), (2, 15), (2, 26), (2, 42), (2, 56)
Those two users have 4 artists in common. Is there a way to compare those two result sets in order to find the most similar people in the database? My first idea
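The comparison described in the question can be sketched in Python as a set intersection per pair of users. This is only an illustration under assumptions: the rows are the example pairs from the question, the function names are hypothetical, and the Jaccard score (shared artists divided by all artists either user likes) is one possible ranking measure, not something the question prescribes.

```python
from collections import defaultdict
from itertools import combinations

# Rows of the 'like' table as (id_user, id_artist), copied from the question.
likes = [
    (1, 12), (1, 13), (1, 14), (1, 26), (1, 42), (1, 44),
    (2, 13), (2, 14), (2, 15), (2, 26), (2, 42), (2, 56),
]


def artists_by_user(rows):
    """Collect the set of liked artist ids for every user id."""
    liked = defaultdict(set)
    for id_user, id_artist in rows:
        liked[id_user].add(id_artist)
    return liked


def rank_similar_pairs(rows):
    """Return (user_a, user_b, shared_count, jaccard), best match first."""
    liked = artists_by_user(rows)
    scored = []
    for a, b in combinations(sorted(liked), 2):
        shared = liked[a] & liked[b]
        either = liked[a] | liked[b]
        scored.append((a, b, len(shared), len(shared) / len(either)))
    return sorted(scored, key=lambda row: row[3], reverse=True)


pairs = rank_similar_pairs(likes)
print(pairs)  # [(1, 2, 4, 0.5)]
```

With many users, comparing every pair grows quadratically, so in practice the same counting would probably be pushed into the database with a self-join on 'id_artist'.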

Ways to calculate similarity

不羁岁月, posted on 2019-12-29 13:19:06
Question: I am building a community website that requires me to calculate the similarity between any two users. Each user is described by the following attributes: age, skin type (oily, dry), hair type (long, short, medium), lifestyle (active outdoor lover, TV junkie) and others. Can anyone tell me how to approach this problem or point me to some resources?
Answer 1: Another way is to compute (in R) all the pairwise dissimilarities (distances) between observations in the data set. The original variables may be
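For attributes that mix numbers and categories, one common option is a Gower-style similarity: numeric fields contribute one minus their range-scaled difference, and categorical fields contribute 1 on a match and 0 otherwise. The sketch below is in Python rather than R and rests on assumptions: the two user records, the function name and the 40-year age range are all made up for illustration.

```python
def gower_similarity(u, v, numeric_ranges):
    """Average per-attribute similarity for mixed numeric/categorical records.

    numeric_ranges maps each numeric attribute to its (max - min) span in the
    data set; every other attribute is treated as categorical.
    """
    scores = []
    for key in u:
        if key in numeric_ranges:
            span = numeric_ranges[key]
            if span:
                scores.append(1.0 - abs(u[key] - v[key]) / span)
            else:
                scores.append(1.0)  # all values identical in the data set
        else:
            scores.append(1.0 if u[key] == v[key] else 0.0)
    return sum(scores) / len(scores)


# Hypothetical users built from the attributes listed in the question.
alice = {"age": 25, "skin": "oily", "hair": "long", "lifestyle": "active outdoor lover"}
bob = {"age": 35, "skin": "oily", "hair": "short", "lifestyle": "active outdoor lover"}

# Assumption: ages in the data set span 40 years.
score = gower_similarity(alice, bob, {"age": 40})
print(score)  # 0.6875
```

Every attribute carries equal weight here; a real site would likely weight attributes by how much they matter for the match.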

Ways to calculate similarity

和自甴很熟, posted on 2019-12-29 13:18:52
Question: I am building a community website that requires me to calculate the similarity between any two users. Each user is described by the following attributes: age, skin type (oily, dry), hair type (long, short, medium), lifestyle (active outdoor lover, TV junkie) and others. Can anyone tell me how to approach this problem or point me to some resources?
Answer 1: Another way is to compute (in R) all the pairwise dissimilarities (distances) between observations in the data set. The original variables may be

String similarity score/hash

纵饮孤独, posted on 2019-12-28 07:39:10
Question: Is there a method to calculate something like a general "similarity score" for a string? I am not comparing two strings directly; instead, I want a number (hash) for each string that can later tell me whether two strings are similar. Two similar strings should have similar (close) hashes. Consider these strings and scores as an example:
Hello world   1000
Hello world!  1010
Hello earth   1125
Foo bar       3250
FooBarbar     3750
Foo Bar!      3300
Foo world!    2350
You can see that
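One technique in this spirit is SimHash, a locality-sensitive fingerprint. It does not produce numerically close scores like the example above; instead, similar strings tend to get fingerprints that differ in few bits, so closeness is measured with the Hamming distance. The following is a minimal sketch, assuming character trigrams as features and MD5 as the per-feature hash; both choices and the function names are illustrative.

```python
import hashlib


def simhash(text, ngram=3, bits=64):
    """Compute a SimHash fingerprint over the character n-grams of text."""
    grams = [text[i:i + ngram] for i in range(max(1, len(text) - ngram + 1))]
    weights = [0] * bits
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for bit in range(bits):
            weights[bit] += 1 if (value >> bit) & 1 else -1
    # A bit is set in the fingerprint when the votes for it are positive.
    return sum(1 << bit for bit in range(bits) if weights[bit] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


fp_a = simhash("Hello world")
fp_b = simhash("Hello world!")
fp_c = simhash("Foo bar")
print(hamming(fp_a, fp_b), hamming(fp_a, fp_c))
```

On strings this short the fingerprints are noisy, so any distance threshold would need tuning on real data.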

Solr Custom Similarity

房东的猫, posted on 2019-12-28 03:06:09
Question: I want to set my own custom similarity in my Solr schema.xml, but I have a few problems understanding this feature. I want to completely deactivate Solr scoring (tf, idf, coord and fieldNorm), and I don't know where to start.
Things I know:
- I have to write my own DefaultSimilarity implementation.
- I have to override the tf, idf, coord and fieldNorm methods.
- I have to load the class in schema.xml.
Where do I store the class? Are there any working examples on the web? I can't find one! Thanks.
Answer 1: I figured it out on my

Calculating similarity measure between millions of documents

≡放荡痞女, posted on 2019-12-25 18:24:09
Question: I have millions of documents (close to 100 million). Each document has fields such as skills, hobbies, certification and education. I want to find the similarity between each pair of documents, along with a score. Below is an example of the data.
skills   hobbies        certification  education
Java     fishing        PMP            MS
Python   reading novel  SCM            BS
C#       video game     PMP            B.Tech.
C++      fishing        PMP            MS
What I want is the similarity between the first row and all other rows, the similarity between the second row and all other rows, and so on. So,
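As a starting point, the row-against-all-rows comparison can be sketched with a simple field-overlap score: the fraction of fields on which two documents agree exactly. This Python sketch uses the four example rows from the question; the scoring rule and function names are assumptions, not the asker's requirement.

```python
FIELDS = ("skills", "hobbies", "certification", "education")

# The four example rows from the question, in FIELDS order.
docs = [
    ("Java", "fishing", "PMP", "MS"),
    ("Python", "reading novel", "SCM", "BS"),
    ("C#", "video game", "PMP", "B.Tech."),
    ("C++", "fishing", "PMP", "MS"),
]


def field_overlap(a, b):
    """Fraction of fields whose values match exactly."""
    return sum(x == y for x, y in zip(a, b)) / len(FIELDS)


def scores_against_all(index, rows):
    """Similarity of rows[index] to every other row, keyed by row index."""
    return {
        j: field_overlap(rows[index], rows[j])
        for j in range(len(rows))
        if j != index
    }


first = scores_against_all(0, docs)
print(first)  # {1: 0.0, 2: 0.25, 3: 0.75}
```

At close to 100 million documents, scoring every pair would take on the order of 10^15 comparisons, so a real pipeline would need to narrow the candidates first, for example by only comparing documents that share at least one field value.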

Calculating similarity measure between millions of documents

左心房为你撑大大i, posted on 2019-12-25 18:24:05
Question: I have millions of documents (close to 100 million). Each document has fields such as skills, hobbies, certification and education. I want to find the similarity between each pair of documents, along with a score. Below is an example of the data.
skills   hobbies        certification  education
Java     fishing        PMP            MS
Python   reading novel  SCM            BS
C#       video game     PMP            B.Tech.
C++      fishing        PMP            MS
What I want is the similarity between the first row and all other rows, the similarity between the second row and all other rows, and so on. So,

PHP - Group Strings By Similarity / Substring [closed]

北战南征, posted on 2019-12-25 10:51:51
Question: (PHP) Hi, I have been struggling with this problem for a while and cannot find a solution, so I was wondering if anyone could help. I need to group similar strings, for example:
Input
Slim Aluminium HDMI Lead, 1m Blue
Slim Aluminium HDMI Lead, 2m Blue
Slim Aluminium HDMI Lead, 3m
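One way to approach such grouping is a greedy pass that puts each string into the first group whose representative is similar enough. The sketch below is in Python, not PHP, and uses `difflib.SequenceMatcher` from the standard library. The 0.8 threshold, the function name and the third title are assumptions added for illustration; only the first two titles come from the question.

```python
from difflib import SequenceMatcher


def group_similar(strings, threshold=0.8):
    """Greedily group strings whose similarity ratio to a group's first
    member is at least threshold."""
    groups = []
    for s in strings:
        for group in groups:
            if SequenceMatcher(None, group[0], s).ratio() >= threshold:
                group.append(s)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([s])
    return groups


titles = [
    "Slim Aluminium HDMI Lead, 1m Blue",
    "Slim Aluminium HDMI Lead, 2m Blue",
    "USB Cable, 1m Black",  # hypothetical item, not from the question
]

groups = group_similar(titles)
print(groups)
```

The result depends on input order and on the threshold, since each string is only compared with the first member of every group.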

ElasticSearch Analyzer on text field

孤街浪徒, posted on 2019-12-24 08:12:43
Question: Here is my field in Elasticsearch:
    "keywordName": { "type": "text", "analyzer": "custom_stop" }
Here is my analyzer:
    "custom_stop": { "type": "custom", "tokenizer": "standard", "filter": [ "my_stop", "my_snow", "asciifolding" ] }
And here are my filters:
    "my_stop": { "type": "stop", "stopwords": "_french_" },
    "my_snow": { "type": "snowball", "language": "French" }
Here are the documents in my index (in my only field, keywordName):
    "canne a peche", "canne", "canne a peche telescopique",

spaCy similarity method does not work correctly

落爺英雄遲暮, posted on 2019-12-24 04:52:11
Question: I always get a lot of help from Stack Overflow. Thank you, as always. I am doing simple natural language processing using spaCy, and I am working on filtering out words by measuring the similarity between them. I wrote and ran the following simple code, which is shown in the spaCy documentation, but the result does not match the documentation.
    import spacy
    nlp = spacy.load('en_core_web_lg')
    tokens = nlp('dog cat banana')
    for token1 in tokens:
        for token2 in tokens:
            sim = token1.similarity(token2)
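For background, spaCy's default similarity is documented as a cosine similarity over word vectors, so the scores depend entirely on the vectors shipped with the loaded model. The computation itself can be sketched in plain Python; the toy vectors here are invented and far shorter than real word vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # choice made for this sketch: no vector, no similarity
    return dot / (norm_u * norm_v)


# Toy 2-dimensional "word vectors", purely illustrative.
same_direction = cosine_similarity([3.0, 4.0], [3.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)  # 1.0 0.0
```

If the scores differ from the documentation, a different model or model version is a plausible cause, because the vectors change between them.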