I just started using Python and came across the following problem:
Imagine I have the following list of lists:
list = [["Word1","Word2","Word2","Word4566"],["Word2", "Word3", "Word4"], ...]
The result (a matrix) I want to get should look like this:
The displayed columns and rows are all the words that appear (in any of the lists).
What I want is a program that counts the occurrences of words in each list (list by list).
The picture shows the result after the first list.
Is there an easy way to achieve something like this or something similar?
Basically I want a list/matrix that tells me how many times words 2-4566 appeared when word 1 was also in the list, and so on.
So I would get a list for each word that shows the absolute frequency of all the other 4555 words in relation to this word.
So I need an algorithm that iterates through all these lists of words and builds the result lists.
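A minimal sketch of that idea, assuming "absolute frequency" means: for every list, each distinct word gets the number of times every other word occurs in that same list added to its row. The function name `cooccurrence_counts` and the use of `collections.Counter` are illustrative choices, not something from the question; the sample data is the one used in the self-answer.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(word_lists):
    """Return {word: Counter({other_word: count, ...})}.

    For every list, each distinct word gets the number of times every
    *other* word occurs in that same list added to its row.
    """
    rows = defaultdict(Counter)
    for words in word_lists:
        counts = Counter(words)            # occurrences of each word in this list
        for word in counts:                # each distinct word once per list
            row = rows[word]               # creates an empty row on first sight
            for other, n in counts.items():
                if other != word:
                    row[other] += n
    return rows


sample = [["Word1", "Word2", "Word2"], ["Word2", "Word3", "Word4"], ["Word2", "Word3"]]
rows = cooccurrence_counts(sample)
print(dict(rows["Word2"]))  # {'Word1': 1, 'Word3': 2, 'Word4': 1}
```

Words that never occur together are simply missing from a row; looking them up in a `Counter` returns 0, so no N x N table of zeros has to be allocated up front.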
As far as I understand, you want to create a matrix that shows, for each pair of words, the number of lists in which both words appear.
First of all, we should collect the set of unique words:
lst = [["Word1","Word2","Word2","Word4566"],["Word2", "Word3", "Word4"], ...]  # don't call this "list": that name is Python's built-in list type, which is needed below
words = set()
for sublst in lst:
    words |= set(sublst)
words = list(words)
Second, we define a matrix of zeros:
result = [[0] * len(words) for _ in range(len(words))]  # zeros matrix N x N, one separate list per row
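Note that a zero matrix must not be built as `[[0] * n] * n`: multiplying the outer list repeats a reference to one and the same row list, so incrementing one cell changes that column in every row. A list comprehension creates a new row on each iteration. This small check (with an arbitrary `n = 3`) shows the difference:

```python
n = 3

shared = [[0] * n] * n                  # n references to ONE row list
shared[0][1] += 1
print(shared)                           # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]

separate = [[0] * n for _ in range(n)]  # n independent row lists
separate[0][1] += 1
print(separate)                         # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
```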
Finally, we fill the matrix by going through the given list:
for sublst in lst:
    sublst = list(set(sublst))  # selecting unique words only
    for i in range(len(sublst)):
        for j in range(i + 1, len(sublst)):
            index1 = words.index(sublst[i])
            index2 = words.index(sublst[j])
            result[index1][index2] += 1
            result[index2][index1] += 1
print(result)
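Because `words.index(...)` scans the whole word list on every call, the loop above does a linear search for each pair, which adds up with thousands of distinct words. A sketch of the same counting (number of lists in which two words appear together) using a precomputed word-to-index dictionary and `itertools.combinations`; the function name `pair_matrix` is hypothetical, and the words are sorted only to make row and column order predictable.

```python
from itertools import combinations


def pair_matrix(lst):
    """Return (words, matrix) where matrix[i][j] is the number of
    sublists that contain both words[i] and words[j] (i != j)."""
    words = sorted({w for sublst in lst for w in sublst})
    index = {w: i for i, w in enumerate(words)}   # word -> row/column number
    n = len(words)
    matrix = [[0] * n for _ in range(n)]          # independent zero rows
    for sublst in lst:
        # set(...) so each pair of distinct words counts at most once per sublist
        for a, b in combinations(sorted(set(sublst)), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return words, matrix


words, matrix = pair_matrix([["Word1", "Word2", "Word2"],
                             ["Word2", "Word3", "Word4"],
                             ["Word2", "Word3"]])
print(words)   # ['Word1', 'Word2', 'Word3', 'Word4']
print(matrix)  # [[0, 1, 0, 0], [1, 0, 2, 1], [0, 2, 0, 1], [0, 1, 1, 0]]
```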
I find it really hard to understand what you're really asking for, but I'll try by making some assumptions:
- (1) You have a list (A), containing other lists (b) of multiple words (w).
- (2) For each b-list in A-list
- (3) For each w in b:
  - (3.1) count the total number of appearances of w in all of the b-lists
  - (3.2) count the number of b-lists in which w appears only once
If these assumptions are correct, then the table doesn't correspond correctly to the list you've provided. If my assumptions are wrong, then I still believe my solution may give you inspiration or some ideas on how to solve it correctly. Finally, I do not claim my solution to be optimal with respect to speed or similar.
Note: I use Python's built-in dictionaries. They are hash tables with average constant-time lookups, so a few thousand words should not slow them down noticeably. Have a look at: https://docs.python.org/2/tutorial/datastructures.html#dictionaries
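# list_A is your list of word lists (called "list" in the question)
frq_dict = {}  # num of appearances / frequency
uqe_dict = {}  # unique
for list_b in list_A:
    # count how often each word occurs in this b-list
    temp_dict = {}
    for word in list_b:
        if word in temp_dict:
            temp_dict[word] += 1
        else:
            temp_dict[word] = 1
    # frq is the number of appearances
    for word, frq in temp_dict.items():
        if frq > 1:
            # word repeated within this b-list: add its appearances
            if word in frq_dict:
                frq_dict[word] += frq
            else:
                frq_dict[word] = frq
        else:
            # word appears exactly once in this b-list: count this b-list
            if word in uqe_dict:
                uqe_dict[word] += 1
            else:
                uqe_dict[word] = 1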
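If assumptions (3.1) and (3.2) are what you need, `collections.Counter` can do the per-list counting. This is a sketch of those two assumptions as stated, not a line-by-line translation of the code above (which only adds to `frq_dict` for words that are repeated within a b-list); the function name `count_words` is hypothetical, and the sample data is borrowed from the self-answer.

```python
from collections import Counter


def count_words(list_A):
    """(3.1) total appearances of each word across all b-lists, and
    (3.2) number of b-lists in which each word appears exactly once."""
    total = Counter()
    once = Counter()
    for list_b in list_A:
        per_list = Counter(list_b)      # occurrences inside this b-list
        total.update(per_list)          # (3.1) add them to the running totals
        once.update(w for w, k in per_list.items() if k == 1)  # (3.2)
    return total, once


total, once = count_words([["Word1", "Word2", "Word2"],
                           ["Word2", "Word3", "Word4"],
                           ["Word2", "Word3"]])
print(total["Word2"], once["Word2"])  # 4 2
```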
I managed to come up with the right answer to my own question:
word_lists = [["Word1","Word2","Word2"],["Word2", "Word3", "Word4"],["Word2","Word3"]]  # avoid the name "list": it shadows the built-in
# Names of all dicts
all_words = sorted(set(w for sublist in word_lists for w in sublist))
# Creating the dicts: one [word, {other_word: 0, ...}] pair per word
dicts = []
for i in all_words:
    dicts.append([i, dict.fromkeys([w for w in all_words if w != i], 0)])
# Updating the dicts
for l in word_lists:
    for word in sorted(set(l)):
        ind = [w[0] for w in dicts].index(word)
        # add how often every other word occurs in this list
        for w in dicts[ind][1]:
            dicts[ind][1][w] += l.count(w)
print(dicts)
This gives the following result (the key order inside each dict may differ between Python versions):
[['Word1', {'Word4': 0, 'Word3': 0, 'Word2': 2}], ['Word2', {'Word4': 1, 'Word1': 1, 'Word3': 2}], ['Word3', {'Word4': 1, 'Word1': 0, 'Word2': 2}], ['Word4', {'Word1': 0, 'Word3': 1, 'Word2': 1}]]
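The question asked for a matrix with the words as row and column labels. As a sketch, assuming the `dicts` structure printed above (a list of `[word, {other_word: count}]` pairs), the rows can be laid out as a text table; the helper name `format_table` is hypothetical, and the diagonal shows 0 because a word is never counted against itself.

```python
def format_table(dicts):
    """Lay out [word, {other: count}] pairs as a labelled square table."""
    words = [word for word, _ in dicts]
    width = max(len(w) for w in words) + 2
    lines = ["".ljust(width) + "".join(w.rjust(width) for w in words)]
    for word, counts in dicts:
        # counts has no entry for the word itself, so .get() yields 0 on the diagonal
        cells = [str(counts.get(other, 0)).rjust(width) for other in words]
        lines.append(word.ljust(width) + "".join(cells))
    return "\n".join(lines)


dicts = [['Word1', {'Word4': 0, 'Word3': 0, 'Word2': 2}],
         ['Word2', {'Word4': 1, 'Word1': 1, 'Word3': 2}],
         ['Word3', {'Word4': 1, 'Word1': 0, 'Word2': 2}],
         ['Word4', {'Word1': 0, 'Word3': 1, 'Word2': 1}]]
table = format_table(dicts)
print(table)
```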