I would like to create a matrix from a three column file. I am sure it's something extremely easy, but I just do not understand how it needs to be done. Please be gentle, I
You can use this library: http://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html
You just need to adapt it to your input.
Hope it helps.
Although there's already an accepted answer, it uses pandas. A relatively generic way of getting the same effect without an additional library is this (numpy is used because the OP specified numpy, but you can achieve the same thing with lists):
import string
import numpy as np
# Letters A-Z; a letter's position in this list is its matrix index.
uppercase = list(string.ascii_uppercase)
matrix = np.zeros((3, 3))
with open("a.txt") as file:
    for line in file:
        tmp = line.strip()
        tmp = tmp.split(" ")
        idx = uppercase.index(tmp[0])
        idy = uppercase.index(tmp[1])
        # NumPy converts the numeric string to float on assignment.
        matrix[idx, idy] = tmp[2]
The idea is to gather all the letters of the alphabet; hopefully the OP will limit themselves to the English alphabet without special characters (šđćžčę°e etc.). We create a list from the alphabet so that we can use the index method to retrieve the index value, i.e. uppercase.index("A") is 0. We can use these indices to fill in our array.
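As a small illustration of the lookup described above, here is a minimal sketch. The dictionary `letter_to_index` is a hypothetical alternative that is not part of the original answer; it gives the same indices without scanning the list on every lookup.

```python
import string

# List-based lookup, as in the answer: a letter's position is its index.
uppercase = list(string.ascii_uppercase)
print(uppercase.index("A"))  # 0
print(uppercase.index("C"))  # 2

# Hypothetical alternative: precompute a dict so each lookup is a single step.
letter_to_index = {letter: i for i, letter in enumerate(string.ascii_uppercase)}
print(letter_to_index["B"])  # 1
```

For a 26-letter alphabet the difference is negligible; the dict only starts to matter with many distinct labels.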
Read the file line by line, strip the surrounding whitespace, and split on spaces to get:
['A', 'A', '5']
['A', 'B', '4']
This is now the actual working part:
idx = uppercase.index(tmp[0])
idy = uppercase.index(tmp[1])
matrix[idx, idy] = tmp[2]
I.e. for the line "A A 5", idx evaluates to 0 and so does idy. Then matrix[0,0] becomes the value tmp[2], which is 5. Following the same logic for the line "A B 4" we get matrix[0,1]=4. And so on.
A more general version would declare matrix = np.zeros((26, 26)) instead of matrix = np.zeros((3, 3)), because there are 26 letters in the English alphabet and the OP doesn't have to use just "ABC", but could potentially use the entire range A-Z.
Example output of the program at the top of this answer:
>>> matrix
array([[ 5.,  4.,  3.],
       [ 0.,  2.,  1.],
       [ 0.,  0.,  0.]])
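Instead of fixing the size at 3 or 26, the matrix could also be sized from the labels that actually occur in the data. The following is only a sketch under assumptions: `build_matrix` is a hypothetical helper, every non-blank line is assumed to have exactly three whitespace-separated fields, and the sample rows are chosen to be consistent with the output shown above rather than taken from the OP's file.

```python
import numpy as np

def build_matrix(lines):
    """Build a square matrix from 'row column value' lines (hypothetical helper)."""
    rows = [line.split() for line in lines if line.strip()]
    # Collect every label that appears in either of the first two columns.
    labels = sorted({r[0] for r in rows} | {r[1] for r in rows})
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)))
    for row_label, col_label, value in rows:
        matrix[index[row_label], index[col_label]] = float(value)
    return labels, matrix

# Assumed sample input, consistent with the example output above.
sample = ["A A 5", "A B 4", "A C 3", "B B 2", "B C 1", "C C 0"]
labels, matrix = build_matrix(sample)
print(labels)
print(matrix)
```

Because the labels are sorted and mapped through a dict, this also works for labels that are not single uppercase letters.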
Your matrix seems to resemble an adjacency matrix of a graph.
I find the answer with pandas much more concise and elegant. Here's my attempt without adding pandas as an additional dependency.
<!-- language: python -->
from collections import namedtuple
import numpy as np
EdgeKey = namedtuple("EdgeKey", ["src", "dst"])
g = dict()
with open('.txt', 'r') as f:
    for line in f:
        elems = line.split()  # split on whitespace, dropping the trailing newline
        key = EdgeKey(src=elems[0], dst=elems[1])
        g[key] = elems[2]
        key_rev = EdgeKey(src=elems[1], dst=elems[0])  # g[A, B] == g[B, A]
        g[key_rev] = elems[2]
# collect the sorted list of vertex labels
vertices = set()
for src, dst in g.keys():
    vertices.add(src)
    vertices.add(dst)
vertices = list(vertices)
vertices.sort()
# create adjacency matrix
mat = np.zeros((len(vertices), len(vertices)))
for s, src in enumerate(vertices):
    for d, dst in enumerate(vertices):
        e = EdgeKey(src=src, dst=dst)
        if e in g:
            mat[s, d] = int(g[e])
# print adjacency matrix (Python 3 print function)
print(' ', ' '.join(vertices))  # print header
for i, row in enumerate(mat):
    print(vertices[i], ' '.join([str(int(c)) for c in row.tolist()]))
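The answer above gets a symmetric matrix by storing every edge in both directions. If the values are already in an array with only the upper triangle (including the diagonal) filled, the same mirroring can be sketched directly in numpy. This is an assumption-laden illustration, not part of the original answer, and the array `upper` simply reuses the sample values seen elsewhere in this thread.

```python
import numpy as np

# Assumed input: only the upper triangle and the diagonal are filled.
upper = np.array([[5., 4., 3.],
                  [0., 2., 1.],
                  [0., 0., 0.]])

# Add the transpose, then subtract the diagonal once so it is not counted twice.
symmetric = upper + upper.T - np.diag(np.diag(upper))
print(symmetric)
```

This only gives the intended result when the lower triangle starts out as zeros; if both (A, B) and (B, A) appear in the input, the two values would be summed.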
Try:
import pandas as pd
import numpy as np
raw = []
with open('test.txt','r') as f:
    for line in f:
        raw.append(line.split())
data = pd.DataFrame(raw,columns = ['row','column','value'])
data_ind = data.set_index(['row','column']).unstack('column')
np.array(data_ind.values,dtype=float)
Output:
array([[  5.,   4.,   3.],
       [ nan,   2.,   1.],
       [ nan,  nan,   0.]])
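The nan entries mark row/column pairs that never appear in the file. If zeros are preferred there, one possible follow-up step is `np.nan_to_num`, sketched below on a hand-typed copy of the output above (the variable names are illustrative only).

```python
import numpy as np

# Hand-typed copy of the output above; nan marks missing row/column pairs.
result = np.array([[5., 4., 3.],
                   [np.nan, 2., 1.],
                   [np.nan, np.nan, 0.]])

# nan_to_num replaces nan with 0.0 by default.
filled = np.nan_to_num(result)
print(filled)
```

Whether missing pairs should read as 0 or stay as nan depends on what the values mean, so this is a choice rather than a required step.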