Can someone give an example of cosine similarity, in a very simple, graphical way?

别跟我提以往 2020-11-28 17:04

Cosine Similarity article on Wikipedia

Can you show the vectors here (in a list or something) and then do the math, and let us see how it works?

I'm a beginner.

10 Answers
  • 2020-11-28 17:50

    Here is some simple Python code that implements cosine similarity.

    import numpy as np

    # Word-count vectors for the two texts, one per row.
    vectors = np.array([[2, 1, 0, 2, 0, 1, 1, 1],
                        [2, 1, 1, 1, 1, 0, 1, 1]])
    a, b = vectors

    # Cosine similarity: dot product divided by the product of the norms.
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print(similarity)  # approximately 0.8216
    
  • 2020-11-28 17:52

    Using @Bill Bell's example, here are two ways to do this in R:

    a = c(2,1,0,2,0,1,1,1)
    
    b = c(2,1,1,1,1,0,1,1)
    
    d = (a %*% b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
    

    Or, taking advantage of the performance of the crossprod() function:

    e = crossprod(a, b) / (sqrt(crossprod(a, a)) * sqrt(crossprod(b, b)))
    
  • 2020-11-28 17:58

    Here are two very short texts to compare:

    1. Julie loves me more than Linda loves me

    2. Jane likes me more than Julie loves me

    We want to know how similar these texts are, purely in terms of word counts (and ignoring word order). We begin by making a list of the words from both texts:

    me Julie loves Linda than more likes Jane
    

    Now we count the number of times each of these words appears in each text:

       me   2   2
     Jane   0   1
    Julie   1   1
    Linda   1   0
    likes   0   1
    loves   2   1
     more   1   1
     than   1   1
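
    The counting step can be sketched in Python. This is a minimal illustration, not part of the original answer: the names `texts`, `vocabulary`, and `count_vector` are made up here, and splitting on whitespace is an assumption that only works because these two texts contain no punctuation.

```python
from collections import Counter

texts = [
    "Julie loves me more than Linda loves me",
    "Jane likes me more than Julie loves me",
]

# Vocabulary listed in the same order as the rows of the table above.
vocabulary = ["me", "Jane", "Julie", "Linda", "likes", "loves", "more", "than"]


def count_vector(text, vocabulary):
    """Return how many times each vocabulary word occurs in text."""
    counts = Counter(text.split())  # whitespace split; a missing word counts as 0
    return [counts[word] for word in vocabulary]


a = count_vector(texts[0], vocabulary)
b = count_vector(texts[1], vocabulary)
print(a)
print(b)
```

    The order of the vocabulary does not affect the final cosine, as long as both texts are counted against the same order.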
    

    We are not interested in the words themselves though. We are interested only in those two vertical vectors of counts. For instance, there are two instances of 'me' in each text. We are going to decide how close these two texts are to each other by calculating one function of those two vectors, namely the cosine of the angle between them.

    The two vectors are, again:

    a: [2, 0, 1, 1, 0, 2, 1, 1]
    
    b: [2, 1, 1, 0, 1, 1, 1, 1]
    

    The cosine of the angle between them is about 0.822.
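
    For anyone who wants to see the arithmetic, that value can be worked out by hand from the vectors a and b. The cosine is the dot product of the two vectors divided by the product of their lengths:

```latex
\begin{aligned}
a \cdot b &= 2\cdot2 + 0\cdot1 + 1\cdot1 + 1\cdot0 + 0\cdot1 + 2\cdot1 + 1\cdot1 + 1\cdot1 = 9 \\
\lVert a \rVert &= \sqrt{2^2 + 0^2 + 1^2 + 1^2 + 0^2 + 2^2 + 1^2 + 1^2} = \sqrt{12} \\
\lVert b \rVert &= \sqrt{2^2 + 1^2 + 1^2 + 0^2 + 1^2 + 1^2 + 1^2 + 1^2} = \sqrt{10} \\
\cos\theta &= \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{9}{\sqrt{120}} \approx 0.822
\end{aligned}
```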

    These vectors are 8-dimensional. A virtue of cosine similarity is that it converts a question that is beyond human ability to visualise into one that can be visualised. In this case, you can think of the result as an angle of about 35 degrees, which is some 'distance' from zero, that is, from perfect agreement.
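
    The angle can be checked with a few lines of standard-library Python. This is only a sketch; the helper name `cosine_similarity` is made up for illustration.

```python
import math

a = [2, 0, 1, 1, 0, 2, 1, 1]
b = [2, 1, 1, 0, 1, 1, 1, 1]


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    length_u = math.sqrt(sum(x * x for x in u))
    length_v = math.sqrt(sum(y * y for y in v))
    return dot / (length_u * length_v)


similarity = cosine_similarity(a, b)
# acos turns the cosine back into an angle (in radians); convert to degrees.
angle_degrees = math.degrees(math.acos(similarity))
print(round(similarity, 3))  # 0.822
print(round(angle_degrees))  # 35
```

    Identical count vectors would give a cosine of 1 and an angle of 0 degrees; texts with no words in common would give a cosine of 0 and an angle of 90 degrees.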

  • 2020-11-28 18:00

    Here's my implementation in C#.

    using System;
    
    namespace CosineSimilarity
    {
        class Program
        {
            static void Main()
            {
                int[] vecA = {1, 2, 3, 4, 5};
                int[] vecB = {6, 7, 7, 9, 10};
    
                var cosSimilarity = CalculateCosineSimilarity(vecA, vecB);
    
                Console.WriteLine(cosSimilarity);
                Console.Read();
            }
    
            private static double CalculateCosineSimilarity(int[] vecA, int[] vecB)
            {
                var dotProduct = DotProduct(vecA, vecB);
                var magnitudeOfA = Magnitude(vecA);
                var magnitudeOfB = Magnitude(vecB);
    
                return dotProduct/(magnitudeOfA*magnitudeOfB);
            }
    
            private static double DotProduct(int[] vecA, int[] vecB)
            {
                // I'm not validating inputs here for simplicity.            
                double dotProduct = 0;
                for (var i = 0; i < vecA.Length; i++)
                {
                    dotProduct += (vecA[i] * vecB[i]);
                }
    
                return dotProduct;
            }
    
            // Magnitude of the vector is the square root of the dot product of the vector with itself.
            private static double Magnitude(int[] vector)
            {
                return Math.Sqrt(DotProduct(vector, vector));
            }
        }
    }
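
    The comment in `DotProduct` notes that the inputs are not validated. As a rough sketch of what such checks could look like, here is a Python version (Python only to match the first answer; the function name and the choice to raise `ValueError` are assumptions, not part of the C# implementation above). It rejects vectors of different lengths, and zero vectors, for which the cosine is undefined.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity with basic input checks (illustrative sketch)."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    magnitude_a = math.sqrt(sum(x * x for x in vec_a))
    magnitude_b = math.sqrt(sum(y * y for y in vec_b))
    if magnitude_a == 0 or magnitude_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (magnitude_a * magnitude_b)


# Same vectors as in Main() above.
print(cosine_similarity([1, 2, 3, 4, 5], [6, 7, 7, 9, 10]))
```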
    