This is an easy question, but say I have an MxN matrix. All I want to do is extract specific columns and store them in another NumPy array, but I get invalid syntax errors. He
Assuming you want to get columns 1 and 9 with that code snippet, it should be:
extractedData = data[:,[1,9]]
I assume you wanted columns 1 and 9?
To select multiple columns at once, use
X = data[:, [1, 9]]
To select one at a time, use
x, y = data[:, 1], data[:, 9]
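To make the difference between the two forms concrete, here is a minimal sketch. The 4x10 array `data` is made up purely for illustration; only the shapes matter.

```python
import numpy as np

# Hypothetical 4x10 example array holding the values 0..39.
data = np.arange(40).reshape(4, 10)

# A list of column indices keeps the result two-dimensional.
X = data[:, [1, 9]]
print(X.shape)  # (4, 2)

# A single integer index drops the column axis, so each result is 1-D.
x, y = data[:, 1], data[:, 9]
print(x.shape, y.shape)  # (4,) (4,)
```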
With names (for a structured array, index with the list of field names directly, without the row slice):
data[['Column Name1', 'Column Name2']]
You can get the names from data.dtype.names
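A small sketch of selecting fields by name, assuming `data` is a NumPy structured array. The field names `a`, `b`, `c` and the values are invented for the example.

```python
import numpy as np

# Hypothetical structured array with three named fields.
data = np.array(
    [(1, 2.0, 10), (3, 4.0, 20)],
    dtype=[("a", "i4"), ("b", "f8"), ("c", "i4")],
)

print(data.dtype.names)  # ('a', 'b', 'c')

# A structured array is indexed by a list of field names directly.
subset = data[["a", "c"]]
print(subset.dtype.names)  # ('a', 'c')
print(subset["c"])  # [10 20]
```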
…
If you want to extract only some columns:
idx_IN_columns = [1, 9]
extractedData = data[:,idx_IN_columns]
If you want to exclude specific columns:
idx_OUT_columns = [1, 9]
idx_IN_columns = [i for i in range(data.shape[1]) if i not in idx_OUT_columns]
extractedData = data[:,idx_IN_columns]
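As an alternative to the list comprehension, excluding columns can also be sketched with a boolean mask or with `np.delete`. The 4x10 array is hypothetical and only there to make the snippet runnable.

```python
import numpy as np

# Hypothetical 4x10 example array.
data = np.arange(40).reshape(4, 10)
idx_OUT_columns = [1, 9]

# Boolean mask that is False for the columns to drop.
mask = np.ones(data.shape[1], dtype=bool)
mask[idx_OUT_columns] = False
extractedData = data[:, mask]
print(extractedData.shape)  # (4, 8)

# np.delete returns a copy with the given columns removed.
same = np.delete(data, idx_OUT_columns, axis=1)
print(np.array_equal(extractedData, same))  # True
```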
One more thing you should pay attention to when selecting columns from an N-D array using a list like this:
data[:,:,[1,9]]
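If you are also removing a dimension (by selecting only one row, for example), the axes of the result are reordered. The integer and the list both count as advanced indices, and when advanced indices are separated by a slice, NumPy puts the dimension they produce first. So:
print(data.shape) # gives (10, 20, 30)
selection = data[1,:,[1,9]]
print(selection.shape) # gives (2, 20) instead of (20, 2)!!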
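Here is a runnable sketch of that behaviour, together with one possible workaround: indexing in two steps. The array contents are arbitrary and the name `two_step` is made up for the example.

```python
import numpy as np

# Hypothetical 10x20x30 array with distinct values.
data = np.arange(10 * 20 * 30).reshape(10, 20, 30)

# Integer and list separated by a slice: the list's axis ends up first.
selection = data[1, :, [1, 9]]
print(selection.shape)  # (2, 20)

# Picking the row first and the columns second keeps the expected order.
two_step = data[1][:, [1, 9]]
print(two_step.shape)  # (20, 2)

# Both hold the same values, just transposed.
print(np.array_equal(selection.T, two_step))  # True
```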
One thing I would like to point out: if you extract only one column with an integer index, the result is not an Mx1 matrix as you might expect, but a one-dimensional array of shape (M,) containing the elements of that column.
To turn it into an Mx1 array, call reshape(M, 1) on the result.
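A short sketch of the single-column case, using a made-up 4x10 array. It also shows that a one-element list as the column index keeps the result two-dimensional, so no reshape is needed.

```python
import numpy as np

# Hypothetical 4x10 example array (M = 4).
data = np.arange(40).reshape(4, 10)

# An integer column index gives a 1-D array.
col = data[:, 1]
print(col.shape)  # (4,)

# reshape(-1, 1) lets NumPy infer M from the data.
col_2d = col.reshape(-1, 1)
print(col_2d.shape)  # (4, 1)

# A one-element list keeps the column axis.
col_kept = data[:, [1]]
print(col_kept.shape)  # (4, 1)
```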
Just:
>>> m = np.matrix(np.random.random((5, 5)))
>>> m
matrix([[0.91074101, 0.65999332, 0.69774588, 0.007355  , 0.33025395],
        [0.11078742, 0.67463754, 0.43158254, 0.95367876, 0.85926405],
        [0.98665185, 0.86431513, 0.12153138, 0.73006437, 0.13404811],
        [0.24602225, 0.66139215, 0.08400288, 0.56769924, 0.47974697],
        [0.25345299, 0.76385882, 0.11002419, 0.2509888 , 0.06312359]])
>>> m[:,[1, 2]]
matrix([[0.65999332, 0.69774588],
        [0.67463754, 0.43158254],
        [0.86431513, 0.12153138],
        [0.66139215, 0.08400288],
        [0.76385882, 0.11002419]])
The columns need not be in order:
>>> m[:,[2, 1, 3]]
matrix([[0.69774588, 0.65999332, 0.007355  ],
        [0.43158254, 0.67463754, 0.95367876],
        [0.12153138, 0.86431513, 0.73006437],
        [0.08400288, 0.66139215, 0.56769924],
        [0.11002419, 0.76385882, 0.2509888 ]])