I have numeric data stored in two DataFrames x and y. The inner product from numpy works but the dot product from pandas does not.
In [63]: x.shape
Out[63]: (106
Not only must the shapes of x and y be correct, but the column names of x must also match the index names of y. Otherwise, this code in pandas/core/frame.py will raise a ValueError:
if isinstance(other, (Series, DataFrame)):
    common = self.columns.union(other.index)
    if (len(common) > len(self.columns) or
            len(common) > len(other.index)):
        raise ValueError('matrices are not aligned')
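To see whether two frames would pass that check before calling dot, the same test can be run by hand. The following is only a sketch: labels_align is a hypothetical helper name, not part of pandas, and it mirrors the snippet quoted above, which may differ between pandas versions.

```python
import pandas as pd

def labels_align(left, right):
    """Hypothetical helper mirroring the label check quoted above."""
    common = left.columns.union(right.index)
    return (len(common) <= len(left.columns)
            and len(common) <= len(right.index))

x = pd.DataFrame([[1, 2]], columns=['a', 'b'])
# Same labels as x.columns, in a different order.
y_same = pd.DataFrame([[1], [2]], index=['b', 'a'])
# One label ('c') that x.columns does not have.
y_diff = pd.DataFrame([[1], [2]], index=['a', 'c'])

print(labels_align(x, y_same))  # True
print(labels_align(x, y_diff))  # False
```

Order does not matter for the check; only the set of labels does.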
If you just want to compute the matrix product without making the column names of x match the index names of y, then use the NumPy dot function:
np.dot(x, y)
The column names of x must match the index names of y because the pandas dot method reindexes x and y, so that if the column order of x and the index order of y do not naturally match, they are made to match before the matrix product is performed:
left = self.reindex(columns=common, copy=False)
right = other.reindex(index=common, copy=False)
The NumPy dot function does no such thing. It will just compute the matrix product based on the values in the underlying arrays.
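A small illustration of the difference, using made-up 2x2 data whose labels match but are in a different order (the frame contents and the 'out' column name are assumptions for the demo only):

```python
import numpy as np
import pandas as pd

# x has columns in the order a, b; y has the same labels in the order b, a.
x = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
y = pd.DataFrame([[10], [20]], index=['b', 'a'], columns=['out'])

# pandas aligns on labels, so a pairs with a and b pairs with b:
# row 0: 1*20 + 2*10, row 1: 3*20 + 4*10
aligned = x.dot(y)

# NumPy works on the raw values and multiplies by position:
# row 0: 1*10 + 2*20, row 1: 3*10 + 4*20
positional = np.dot(x.to_numpy(), y.to_numpy())

print(aligned['out'].tolist())       # [40, 100]
print(positional.ravel().tolist())   # [50, 110]
```

The two results differ, so np.dot is only a safe substitute when the rows of y are already in the same order as the columns of x.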
Here is an example which reproduces the error:
import pandas as pd
import numpy as np
columns = ['col{}'.format(i) for i in range(36)]
x = pd.DataFrame(np.random.random((1062, 36)), columns=columns)
y = pd.DataFrame(np.random.random((36, 36)))
print(np.dot(x, y).shape)
# (1062, 36)
print(x.dot(y).shape)
# ValueError: matrices are not aligned
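If you would rather keep a DataFrame result, one possible fix is to give y the labels that dot expects. This is a sketch that assumes the rows of y are already in the same order as the columns of x; the names y_labeled, result and raw are introduced here for illustration.

```python
import numpy as np
import pandas as pd

columns = ['col{}'.format(i) for i in range(36)]
x = pd.DataFrame(np.random.random((1062, 36)), columns=columns)
y = pd.DataFrame(np.random.random((36, 36)))

# Option 1: relabel y's index to match x's columns (assumes the row order
# of y already corresponds to the column order of x).
y_labeled = y.copy()
y_labeled.index = x.columns
result = x.dot(y_labeled)   # DataFrame with x's index and y's columns

# Option 2: drop the labels and multiply the underlying arrays.
raw = np.dot(x.to_numpy(), y.to_numpy())

print(result.shape)
# (1062, 36)
print(np.allclose(result.to_numpy(), raw))
# True
```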