A coworker left some data files I want to analyze with NumPy.
Each file is a MATLAB file, say data.m, and has the following formatting (but with a lot more
Here are a couple of options, although neither is built in.
This solution probably falls into your "quick and dirty" category, but it helps lead in to the next solution.
Remove the values = [ line, the last line (];), and globally replace every ; with nothing to get:
-24.92 -23.66 -22.55
-24.77 -23.56 -22.45
-24.54 -23.64 -22.56
Then you can use numpy's loadtxt as follows.
>>> import numpy as np
>>> A = np.loadtxt('data.m')
>>> A
array([[-24.92, -23.66, -22.55],
       [-24.77, -23.56, -22.45],
       [-24.54, -23.64, -22.56]])
In this solution, we create a function that coerces the input data into a form that numpy's loadtxt likes (the same form as above, actually).
from io import StringIO  # Python 3 location; the Python 2 StringIO module is gone

import numpy as np

def convert_m(fname):
    """Return a file-like object holding only the matrix rows of fname."""
    with open(fname, 'r') as fin:
        arrstr = fin.read()
    arrstr = arrstr.split('[', 1)[-1]   # remove the content up to the first '['
    arrstr = arrstr.rsplit(']', 1)[0]   # remove the content after the last ']'
    arrstr = arrstr.replace(';', '\n')  # replace ';' with newline
    return StringIO(arrstr)
Now that we have that, do the following.
>>> np.loadtxt(convert_m('data.m'))
array([[-24.92, -23.66, -22.55],
       [-24.77, -23.56, -22.45],
       [-24.54, -23.64, -22.56]])
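To try the conversion without touching the original files, here is a self-contained sketch. It assumes the file layout implied by the steps above (a first line values = [, rows ending in ;, and a closing ];). The sample contents and the temporary file are made up for illustration, and convert_m is repeated with the Python 3 io.StringIO so the block runs on its own.

```python
import os
import tempfile
from io import StringIO

import numpy as np

# Hypothetical sample mimicking the assumed layout of data.m.
SAMPLE = """values = [
-24.92 -23.66 -22.55;
-24.77 -23.56 -22.45;
-24.54 -23.64 -22.56;
];
"""

def convert_m(fname):
    with open(fname, 'r') as fin:
        arrstr = fin.read()
    arrstr = arrstr.split('[', 1)[-1]   # drop everything up to the first '['
    arrstr = arrstr.rsplit(']', 1)[0]   # drop everything after the last ']'
    arrstr = arrstr.replace(';', '\n')  # row terminators become newlines
    return StringIO(arrstr)

def load_sample():
    # Write the sample to a throwaway directory, then parse it.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'data.m')
        with open(path, 'w') as fout:
            fout.write(SAMPLE)
        return np.loadtxt(convert_m(path))

A = load_sample()
print(A.shape)
```

Replacing ; with a newline leaves empty lines between the rows, which loadtxt skips, so no further cleanup is needed for this layout.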
You could feed an iterator to np.genfromtxt:
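import numpy as np
import re

filename = 'data.m'  # path to the MATLAB file

with open(filename, 'r') as f:
    # keep only signs, digits, decimal points and spaces on each line
    lines = (re.sub(r'[^-+.0-9 ]+', '', line) for line in f)
    arr = np.genfromtxt(lines)  # must run while the file is still open
print(arr)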
yields
[[-24.92 -23.66 -22.55]
 [-24.77 -23.56 -22.45]
 [-24.54 -23.64 -22.56]]
Thanks to Bitwise for clueing me in to this answer.