My current code is shown below - I\'m importing a MAT file and trying to create a DataFrame from variables within it:
mat = loadmat(file_path) # load mat-fi
Here is how to create a DataFrame where each series is a row.
For a single Series (resulting in a single-row DataFrame):
series = pd.Series([1,2], index=['a','b'])
df = pd.DataFrame([series])
For multiple series with identical indices:
cols = ['a','b']
list_of_series = [pd.Series([1,2],index=cols), pd.Series([3,4],index=cols)]
df = pd.DataFrame(list_of_series, columns=cols)
For multiple series with possibly different indices:
list_of_series = [pd.Series([1,2],index=['a','b']), pd.Series([3,4],index=['a','c'])]
df = pd.concat(list_of_series, axis=1).transpose()
To create a DataFrame where each series is a column, see the answers by others. Alternatively, one can create a DataFrame where each series is a row, as above, and then use df.transpose()
. However, the latter approach is inefficient if the columns have different data types.
I guess anther way, possibly faster, to achieve this is
1) Use dict comprehension to get desired dict (i.e., taking 2nd col of each array)
2) Then use pd.DataFrame
to create an instance directly from the dict without loop over each col and concat.
Assuming your mat
looks like this (you can ignore this since your mat
is loaded from file):
In [135]: mat = {'a': np.random.randint(5, size=(4,2)),
.....: 'b': np.random.randint(5, size=(4,2))}
In [136]: mat
Out[136]:
{'a': array([[2, 0],
[3, 4],
[0, 1],
[4, 2]]), 'b': array([[1, 0],
[1, 1],
[1, 0],
[2, 1]])}
Then you can do:
In [137]: df = pd.DataFrame ({name:mat[name][:,1] for name in mat})
In [138]: df
Out[138]:
a b
0 0 0
1 4 1
2 1 0
3 2 1
[4 rows x 2 columns]
No need to initialize an empty DataFrame (you weren't even doing that, you'd need pd.DataFrame()
with the parens).
Instead, to create a DataFrame where each series is a column,
series
, and df = pd.concat(series, axis=1)
Something like:
series = [pd.Series(mat[name][:, 1]) for name in Variables]
df = pd.concat(series, axis=1)