similarity

Cannot use the Knowledge academic API

微笑、不失礼 posted on 2020-01-14 18:58:53
Question: I have a problem when I try to use the similarity function offered by the Academic Knowledge API. I tested the following command to compute the similarity between two strings: curl -v -X GET "https://api.labs.cognitive.microsoft.com/academic/v1.0/similarity?s1={string}&s2={string}" -H "Ocp-Apim-Subscription-Key: {subscription key}" The error that I get is: {"error":{"code":"Unspecified","message":"Access denied due to invalid subscription key. Make sure you are subscribed to an API you are
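For reference, the same request can be assembled in Python. This is a minimal sketch: the endpoint and the `Ocp-Apim-Subscription-Key` header are copied from the curl command in the question, while `build_similarity_request` and the placeholder key are hypothetical names introduced here. The request is only built, not sent, because the service may not be reachable and a valid key is required.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as quoted in the question; it may no longer be available.
SIMILARITY_ENDPOINT = (
    "https://api.labs.cognitive.microsoft.com/academic/v1.0/similarity"
)


def build_similarity_request(s1, s2, subscription_key):
    """Build (but do not send) a GET request for the similarity endpoint."""
    # urlencode escapes spaces and special characters in both strings.
    query = urlencode({"s1": s1, "s2": s2})
    return Request(
        f"{SIMILARITY_ENDPOINT}?{query}",
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
        method="GET",
    )


# "YOUR-KEY-HERE" is a placeholder, not a real subscription key.
request = build_similarity_request(
    "deep learning", "neural networks", "YOUR-KEY-HERE"
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. An "invalid subscription key" response like the one in the question would then point at the key or the subscription rather than at the way the request is built.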

Converting a Double or Float to a percentage in Java

隐身守侯 posted on 2020-01-13 13:50:14
Converting a Double to a percentage (requires import java.text.NumberFormat); the code is as follows:

    public static String getPercentValue(double similarity) {
        NumberFormat fmt = NumberFormat.getPercentInstance();
        fmt.setMaximumFractionDigits(2); // at most two fraction digits in the percentage, e.g. 25.23%
        return fmt.format(similarity);
    }

Converting a Float to a percentage; the code is as follows:

    public static String getPercentValue(float similarity) {
        NumberFormat fmt = NumberFormat.getPercentInstance();
        fmt.setMaximumFractionDigits(2); // at most two fraction digits in the percentage, e.g. 25.23%
        return fmt.format(similarity);
    }

Source: CSDN. Author: 秋9. Link: https://blog.csdn.net/jlq_diligence/article/details/103955965

PHP nearest string comparison [duplicate]

可紊 posted on 2020-01-11 04:12:04
Question: Possible Duplicate: String similarity in PHP: levenshtein like function for long strings. I have my subject string $subj = "Director, My Company"; and a list of multiple strings to be compared: $str1 = "Foo bar"; $str2 = "Lorem Ipsum"; $str3 = "Director"; What I want to achieve here is to find the string nearest to $subj. Is it possible to do it? Answer 1: The levenshtein() function will do what you expect. The Levenshtein
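To make the idea concrete, here is a minimal Python sketch of the same approach: compute the edit distance from the subject to every candidate and keep the candidate with the smallest one. The functions `levenshtein` and `nearest` are written here for illustration (they are not PHP's built-in), and unit costs for insertion, deletion and substitution are an assumption.

```python
def levenshtein(a, b):
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def nearest(subject, candidates):
    """Return the candidate with the smallest edit distance to subject."""
    return min(candidates, key=lambda c: levenshtein(subject, c))


subj = "Director, My Company"
print(nearest(subj, ["Foo bar", "Lorem Ipsum", "Director"]))  # → Director
```

"Director" wins because it is a prefix of the subject, so the distance is exactly the 12 characters that have to be removed. Edit distance is sensitive to length differences, so for long strings of very different lengths a normalised score may be a better fit.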

Python digest/hash for string similarity

a 夏天 posted on 2020-01-09 23:02:02
Question: I'm looking for an algorithm which can generate a short hashcode/digest (e.g. 16 characters; the exact length is not important) from a longer string. The main requirement is that strings which are almost identical should result in the same digest. For example, two almost identical mails: Hi Martin. Here are some ... spam for you. Regards XYZ. => AAAA AAAA AAAA AAAA Hi Bo. Here are some ... spam for you. Regards EFG. => AAAA AAAA AAAA AAAA return the same digest (or almost the same), whereas a different mail: Hello Finn. This is a
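One family of techniques sometimes used for this kind of requirement is locality-sensitive hashing. The sketch below is a simplified SimHash, chosen here as an illustration and not taken from the original thread. The word tokenisation, the use of MD5 as the per-token hash and the 64-bit width are all assumptions. Near-identical texts tend to produce fingerprints that differ in only a few bits, but they are not guaranteed to be equal.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Simplified SimHash: a bit-wise majority vote over per-token hashes."""
    tokens = re.findall(r"\w+", text.lower())
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for bit in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(x, y):
    """Number of bit positions in which two fingerprints differ."""
    return bin(x ^ y).count("1")


mail_a = "Hi Martin. Here are some ... spam for you. Regards XYZ."
mail_b = "Hi Bo. Here are some ... spam for you. Regards EFG."
print(hamming_distance(simhash(mail_a), simhash(mail_b)))
```

Two mails would then be treated as "the same" when the Hamming distance falls below a threshold chosen for the data at hand.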

How to group sets by similarity in contained elements

怎甘沉沦 posted on 2020-01-06 19:25:59
Question: I am using Python 2.7. I have routes which are composed of arrays of nodes that connect to each other. The nodes are identified by a string key, but for ease I will use numbers: sample_route = [1,2,3,4,7] #obviously over-simplified; real things would be about 20-40 elements long I will create a set made up of tuple pairs of point connections using zip, which will end up like: set([(1,2),(2,3),(3,4),(4,7)]) I will need a way to filter out some routes that are very similar (like one or two
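A minimal sketch of one way to do this, assuming Jaccard similarity over the edge sets is an acceptable measure (the question does not name one). It is written for Python 3 rather than 2.7, and `route_edges`, `jaccard`, `filter_similar` and the threshold value are hypothetical choices made for illustration.

```python
def route_edges(route):
    """Set of (from, to) connections along a route, built with zip."""
    return set(zip(route, route[1:]))


def jaccard(a, b):
    """Shared edges divided by all distinct edges; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_similar(routes, threshold=0.5):
    """Keep a route only if it is not too similar to any route already kept."""
    kept = []
    for route in routes:
        edges = route_edges(route)
        if all(jaccard(edges, route_edges(k)) < threshold for k in kept):
            kept.append(route)
    return kept


routes = [[1, 2, 3, 4, 7], [1, 2, 3, 4, 8], [9, 10, 11]]
print(filter_similar(routes))  # → [[1, 2, 3, 4, 7], [9, 10, 11]]
```

The first two routes share three of five distinct edges (similarity 0.6), so the second is dropped at a threshold of 0.5. The result depends on the order of the input, since the first route of each similar group is the one that is kept.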

How to group sets by similarity in contained elements

一曲冷凌霜 posted on 2020-01-06 19:25:32
Question: I am using Python 2.7. I have routes which are composed of arrays of nodes that connect to each other. The nodes are identified by a string key, but for ease I will use numbers: sample_route = [1,2,3,4,7] #obviously over-simplified; real things would be about 20-40 elements long I will create a set made up of tuple pairs of point connections using zip, which will end up like: set([(1,2),(2,3),(3,4),(4,7)]) I will need a way to filter out some routes that are very similar (like one or two

Is there any solution to get score of similarity between lists of words?

纵饮孤独 posted on 2020-01-06 08:05:02
Question: I want to calculate the similarity between lists of words, for example: import math,re from collections import Counter test = ['address','ip'] list_a = ['identifiant', 'ip', 'address', 'fixe', 'horadatee', 'cookie', 'mac', 'machine', 'network', 'cable'] list_b = ['address','city'] def counter_cosine_similarity(c1, c2): terms = set(c1).union(c2) print(c2.get('ip',0)**2) dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms) magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms)) magB = math
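The snippet above is cut off, so here is a self-contained sketch of cosine similarity over word counts that follows the same outline. It is not necessarily the asker's final code. The guard for empty inputs is an addition, and the lists are the ones from the question.

```python
import math
from collections import Counter


def counter_cosine_similarity(c1, c2):
    """Cosine of the angle between two word-count vectors."""
    terms = set(c1) | set(c2)
    dot = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    mag1 = math.sqrt(sum(v * v for v in c1.values()))
    mag2 = math.sqrt(sum(v * v for v in c2.values()))
    if mag1 == 0 or mag2 == 0:
        return 0.0  # an empty list shares nothing with anything
    return dot / (mag1 * mag2)


test = ['address', 'ip']
list_a = ['identifiant', 'ip', 'address', 'fixe', 'horadatee',
          'cookie', 'mac', 'machine', 'network', 'cable']
list_b = ['address', 'city']

score_a = counter_cosine_similarity(Counter(test), Counter(list_a))
score_b = counter_cosine_similarity(Counter(test), Counter(list_b))
print(round(score_a, 3), round(score_b, 3))
```

Here `list_a` contains both words of `test` but scores lower (about 0.447) than `list_b` (about 0.5), which contains only one of them, because cosine similarity penalises the eight extra words in `list_a`. If containment matters more than overall overlap, a different measure may suit the task better.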

Is there any solution to get score of similarity between lists of words?

与世无争的帅哥 posted on 2020-01-06 08:03:01
Question: I want to calculate the similarity between lists of words, for example: import math,re from collections import Counter test = ['address','ip'] list_a = ['identifiant', 'ip', 'address', 'fixe', 'horadatee', 'cookie', 'mac', 'machine', 'network', 'cable'] list_b = ['address','city'] def counter_cosine_similarity(c1, c2): terms = set(c1).union(c2) print(c2.get('ip',0)**2) dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms) magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms)) magB = math

Efficiently calculate large similarity matrix

别等时光非礼了梦想. posted on 2020-01-05 07:12:17
Question: The project I'm currently working on has about 200,000 users. For each of these users we defined a similarity measure with regard to another user. This yields a similarity matrix of 200000x200000. A tad large. A naive approach (in Ruby) of calculating each entry would take days. What strategies can I employ to make computing the matrix fields feasible? In what data store should I put this beast? Answer 1: Here are some bits and pieces of an answer, there are still too many gaps in what you've
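To put numbers on "a tad large", the sketch below estimates the size of the dense matrix and shows one possible way to avoid storing it: keep only the k most similar users per user. This is an illustration and not the strategy from the truncated answer. It assumes a symmetric similarity function and 4 bytes per entry, and all function names are hypothetical.

```python
import heapq


def dense_matrix_bytes(n_users, bytes_per_entry=4):
    """Storage needed for a full n x n matrix."""
    return n_users * n_users * bytes_per_entry


def unique_pairs(n_users):
    """Entries to compute if similarity(a, b) == similarity(b, a)."""
    return n_users * (n_users - 1) // 2


def top_k_neighbours(user, users, similarity, k=10):
    """Keep only the k highest-scoring (score, other_user) pairs for a user."""
    scored = (
        (similarity(user, other), other) for other in users if other != user
    )
    return heapq.nlargest(k, scored)


print(dense_matrix_bytes(200_000))  # 160000000000 bytes, i.e. 160 GB
print(unique_pairs(200_000))        # 19999900000 distinct pairs

# Toy similarity for demonstration: users with closer ids are more similar.
print(top_k_neighbours(2, range(5), lambda a, b: -abs(a - b), k=2))
```

Storing the top k per user needs n * k entries in place of n * n, but every pair still has to be scored once. Reducing the number of comparisons themselves would require something extra, such as only comparing users that share a feature.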

R Pairwise comparison of matrix columns ignoring empty values

强颜欢笑 posted on 2020-01-05 03:09:11
Question: I have an array for which I would like to obtain a measure of the similarity between values in each column. By which I mean I wish to compare the rows between pairwise columns of the array and increment a measure when their values match. The resulting measure would then be at a maximum for two columns exactly the same. Essentially my problem is the same as discussed here: R: Compare all the columns pairwise in matrix except that I do not wish empty cells to be counted. With the example data
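The counting rule described in the question can be sketched as follows. The question is about R, but the example is in Python to match the other examples on this page. The sample columns are made up because the original example data is cut off, and treating both `None` and the empty string as "empty" is an assumption.

```python
from itertools import combinations


def is_empty(value):
    """Assumption: None and the empty string both count as empty cells."""
    return value is None or value == ""


def column_matches(col_a, col_b):
    """Count rows where both cells are non-empty and hold the same value."""
    return sum(
        1
        for a, b in zip(col_a, col_b)
        if not is_empty(a) and not is_empty(b) and a == b
    )


def pairwise_matches(columns):
    """Match count for every unordered pair of column indices."""
    return {
        (i, j): column_matches(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


# Hypothetical data: three columns of four rows each.
columns = [
    ["a", "b", None, "d"],
    ["a", None, None, "d"],
    ["x", "b", "", "d"],
]
print(pairwise_matches(columns))  # → {(0, 1): 2, (0, 2): 2, (1, 2): 1}
```

Rows where either cell is empty contribute nothing, so two columns that are both empty in the same row are not counted as matching there.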