How to create a frequency matrix?

Submitted by ∥☆過路亽.° on 2019-12-12 04:22:46

Question


I just started using Python and came across the following problem:

Imagine I have the following list of lists:

list = [["Word1","Word2","Word2","Word4566"],["Word2", "Word3", "Word4"], ...]

The result (a matrix) I want to get should look like this:

The displayed columns and rows are all the words that appear (no matter in which list).

What I want is a program that counts the appearances of words in each list (list by list).

The picture shows the result after the first list.

Is there an easy way to achieve something like this or something similar?


EDIT: Basically I want a List/Matrix that tells me how many times words 2-4566 appeared when word 1 was also in the list, and so on.

So I would get a list for each word that displays the absolute frequency of all other 4565 words in relation to this word.


So I would need an algorithm that iterates through all these lists of words and builds the result lists.


Answer 1:


As far as I understand, you want a matrix that shows, for each pair of words, the number of lists in which both words appear.

First, we collect the set of unique words:

lst = [["Word1","Word2","Word2","Word4566"],["Word2", "Word3", "Word4"], ...] # list is a built-in name in Python, don't shadow it with a variable

words = set()
for sublst in lst:
    words |= set(sublst)
words = list(words)

Second, we define a matrix of zeros:

result = [[0] * len(words) for _ in range(len(words))] # zeros matrix N x N, each row is a separate list
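
A caveat worth checking when a zero matrix is built from nested lists: multiplying the outer list repeats references to a single row, so an update to one row shows up in all of them. The sketch below (with an assumed size n = 3, chosen only for illustration) contrasts that with a list comprehension, which creates independent rows.

```python
n = 3  # assumed size, for illustration only

# Outer multiplication copies the reference to one inner list n times.
shared = [[0] * n] * n
shared[0][0] += 1

# A list comprehension evaluates [0] * n once per row, giving n separate lists.
independent = [[0] * n for _ in range(n)]
independent[0][0] += 1

print(shared)       # every row shows the update
print(independent)  # only the first row shows the update
```

With the shared version, every increment in the filling loop would land in all rows at once, so the counts would be wrong.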

And finally we fill the matrix going through the given list:

for sublst in lst:
    sublst = list(set(sublst)) # selecting unique words only
    for i in range(len(sublst)):
        for j in range(i + 1, len(sublst)):
            # look up the row/column position of each word
            index1 = words.index(sublst[i])
            index2 = words.index(sublst[j])
            # the matrix is symmetric, so update both cells
            result[index1][index2] += 1
            result[index2][index1] += 1

print(result)
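
For a larger vocabulary, the same pair counting can be sketched with `collections.Counter` and `itertools.combinations`, using a word-to-index dictionary instead of repeated `words.index` calls. This is only a sketch of the idea above, not a tuned implementation; the function names `pair_counts` and `to_matrix` are made up for this example, and the sample data is the short list used elsewhere in this thread.

```python
from collections import Counter
from itertools import combinations

def pair_counts(lists):
    """Count, for each unordered word pair, the number of lists containing both."""
    counts = Counter()
    for words in lists:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    return counts

def to_matrix(counts, vocabulary):
    """Expand the pair counts into a symmetric N x N list of lists."""
    index = {word: i for i, word in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in vocabulary]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return matrix

sample = [["Word1", "Word2", "Word2"], ["Word2", "Word3", "Word4"], ["Word2", "Word3"]]
vocabulary = sorted({w for words in sample for w in words})
matrix = to_matrix(pair_counts(sample), vocabulary)
for word, row in zip(vocabulary, matrix):
    print(word, row)
```

Sorting the vocabulary keeps the row and column order stable between runs, which a plain `set` does not guarantee.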



Answer 2:


I find it really hard to understand what you're really asking for, but I'll try by making some assumptions:

  • (1) You have a list (A), containing other lists (b) of multiple words (w).
  • (2) For each b-list in A-list
    • (3) For each w in b:
      • (3.1) count the total number of appearances of w in all of the b-lists
      • (3.2) count the number of b-lists in which w appears exactly once

If these assumptions are correct, then the table doesn't correspond correctly to the list you've provided. If my assumptions are wrong, then I still believe my solution may give you inspiration or some ideas on how to solve it correctly. Finally, I do not claim my solution to be optimal with respect to speed or similar.

Note: I use Python's built-in dictionaries. They are hash tables with constant-time average lookups and inserts, so they should handle thousands of words without trouble. Have a look at: https://docs.python.org/2/tutorial/datastructures.html#dictionaries

    # list_A is the outer list (A) from assumption (1)
    frq_dict = {} # appearances, summed over the lists where the word repeats
    uqe_dict = {} # number of lists in which the word appears exactly once

    for list_b in list_A:
        # count the appearances of each word within this list
        temp_dict = {}
        for word in list_b:
            temp_dict[word] = temp_dict.get(word, 0) + 1

        # frq is the number of appearances in this list
        for word, frq in temp_dict.items():
            if frq > 1:
                frq_dict[word] = frq_dict.get(word, 0) + frq
            else:
                uqe_dict[word] = uqe_dict.get(word, 0) + 1
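
The assumptions (3.1) and (3.2) listed in this answer can also be sketched with `collections.Counter`. Note that this sketch follows the assumptions as written, so it counts every appearance toward the total, whereas the loop above only adds to `frq_dict` when a word repeats within a list. The function name `word_statistics` and the sample data are assumptions made for this example.

```python
from collections import Counter

def word_statistics(list_a):
    """Return (total appearances per word, number of lists holding the word exactly once)."""
    total = Counter()  # (3.1) appearances of each word across all b-lists
    once = Counter()   # (3.2) number of b-lists containing the word exactly once
    for list_b in list_a:
        per_list = Counter(list_b)  # appearances within this b-list
        total.update(per_list)
        once.update(w for w, n in per_list.items() if n == 1)
    return total, once

sample = [["Word1", "Word2", "Word2"], ["Word2", "Word3", "Word4"], ["Word2", "Word3"]]
total, once = word_statistics(sample)
print(total)
print(once)
```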



Answer 3:


I managed to come up with the right answer to my own question:

lst = [["Word1","Word2","Word2"],["Word2", "Word3", "Word4"],["Word2","Word3"]]

# Names of all dicts
all_words = sorted(set(w for sublist in lst for w in sublist))

# Creating the dicts: one [word, {other word: count}] entry per word
dicts = []
for i in all_words:
    dicts.append([i, dict.fromkeys([w for w in all_words if w != i], 0)])

# Updating the dicts
for l in lst:
    for word in sorted(set(l)):
        ind = [w[0] for w in dicts].index(word)

        # add the appearances of every other word in this list
        for w in dicts[ind][1]:
            dicts[ind][1][w] += l.count(w)

print(dicts)

Gets the result:

[['Word1', {'Word2': 2, 'Word3': 0, 'Word4': 0}], ['Word2', {'Word1': 1, 'Word3': 2, 'Word4': 1}], ['Word3', {'Word1': 0, 'Word2': 2, 'Word4': 1}], ['Word4', {'Word1': 0, 'Word2': 1, 'Word3': 1}]]
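
The same counts can be sketched with a dictionary of counters, which avoids searching the list of names on every update. This is an alternative sketch under the same counting rule (for every word present in a list, add the appearances of each other word in that list); the function name `cooccurrence_counts` is made up for this example.

```python
from collections import Counter, defaultdict

def cooccurrence_counts(lists):
    """Map each word to a Counter of how often other words appear alongside it."""
    table = defaultdict(Counter)
    for words in lists:
        counts = Counter(words)  # appearances within this list
        for word in counts:
            for other, n in counts.items():
                if other != word:
                    table[word][other] += n
    return table

sample = [["Word1", "Word2", "Word2"], ["Word2", "Word3", "Word4"], ["Word2", "Word3"]]
table = cooccurrence_counts(sample)
for word in sorted(table):
    print(word, dict(table[word]))
```

Pairs that never occur together are simply absent here. Looking one up, for example `table["Word4"]["Word1"]`, returns 0 because a `Counter` reports missing keys as zero.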



Source: https://stackoverflow.com/questions/41300583/how-to-create-a-frequency-matrix
