Matrix multiplication in pandas

滥情空心 2021-02-18 22:57

I have numeric data stored in two DataFrames, x and y. The inner product from NumPy works, but the dot product from pandas does not.

In [63]: x.shape
Out[63]: (106

1 answer
  • 2021-02-18 23:36

    Not only must the shapes of x and y be compatible (x must have as many columns as y has rows), but the column labels of x must also match the index labels of y. Otherwise this code in pandas/core/frame.py raises a ValueError:

    if isinstance(other, (Series, DataFrame)):
        common = self.columns.union(other.index)
        if (len(common) > len(self.columns) or
            len(common) > len(other.index)):
            raise ValueError('matrices are not aligned')
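    The check above can be mirrored outside pandas to see which label sets pass it. This is a minimal sketch with small made-up frames; `labels_aligned` is a hypothetical helper, not a pandas function, and it only reproduces the quoted condition:

```python
import numpy as np
import pandas as pd


def labels_aligned(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper mirroring the quoted check: the union of
    left.columns and right.index must not be larger than either one."""
    common = left.columns.union(right.index)
    return not (len(common) > len(left.columns) or len(common) > len(right.index))


x = pd.DataFrame(np.ones((2, 3)), columns=["a", "b", "c"])
y_default = pd.DataFrame(np.ones((3, 2)))  # index is the default 0, 1, 2
y_labeled = pd.DataFrame(np.ones((3, 2)), index=["c", "a", "b"])  # same labels, other order

print(labels_aligned(x, y_default))  # False: the union has six labels
print(labels_aligned(x, y_labeled))  # True: only the label set matters, not its order
```

    Note that the shapes are identical in both cases; only the labels decide whether x.dot accepts the right-hand frame.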
    

    If you just want to compute the matrix product without making the column labels of x match the index labels of y, use the NumPy dot function:

    np.dot(x, y)
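    np.dot returns a plain ndarray, so the row labels of x are lost. A minimal sketch of an alternative that also skips label alignment, assuming you want to keep x's index: DataFrame.dot accepts an array, and passing y.to_numpy() multiplies by position. The small frames below are made up for illustration:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.arange(6.0).reshape(2, 3),
                 index=["r1", "r2"], columns=["a", "b", "c"])
y = pd.DataFrame(np.arange(6.0).reshape(3, 2))  # default 0..2 index, so x.dot(y) would fail

plain = np.dot(x, y)          # ndarray: row labels are dropped
framed = x.dot(y.to_numpy())  # DataFrame: positional product, x.index is kept

print(type(plain).__name__)   # ndarray
print(framed.index.tolist())  # ['r1', 'r2']
print(np.allclose(plain, framed.to_numpy()))  # True
```

    When the right-hand side is an array, pandas only checks that the inner dimensions agree; the result's columns get a default integer index.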
    

    The column labels of x must match the index labels of y because the pandas dot method reindexes both frames on the shared labels. If the column order of x and the index order of y differ, they are brought into the same order before the matrix product is computed:

    left = self.reindex(columns=common, copy=False)
    right = other.reindex(index=common, copy=False)
    

    The NumPy dot function does no such alignment. It simply multiplies the underlying arrays by position and ignores the labels.
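    A small sketch of that difference, with made-up one-row data: y carries the same labels as x.columns but with its rows in reverse order, so the label-aligned pandas product and the positional NumPy product disagree:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame([[1.0, 2.0]], columns=["a", "b"])
# Same labels as x.columns, but the rows of y are stored in the opposite order.
y = pd.DataFrame([[10.0], [100.0]], index=["b", "a"], columns=["out"])

by_label = x.dot(y)         # pandas reindexes y to ["a", "b"] before multiplying
by_position = np.dot(x, y)  # NumPy pairs column 0 of x with row 0 of y

print(by_label.iloc[0, 0])  # 1*100 + 2*10 = 120.0
print(by_position[0, 0])    # 1*10 + 2*100 = 210.0
```

    Neither result is wrong in itself; the pandas method assumes the labels carry the meaning, while np.dot assumes the positions do.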


    Here is an example that reproduces the error:

    import pandas as pd
    import numpy as np
    
    columns = ['col{}'.format(i) for i in range(36)]
    x = pd.DataFrame(np.random.random((1062, 36)), columns=columns)
    y = pd.DataFrame(np.random.random((36, 36)))
    
    print(np.dot(x, y).shape)
    # (1062, 36)
    
    print(x.dot(y).shape)
    # ValueError: matrices are not aligned
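    One way to make the reproduction work with the pandas method, assuming the rows of y are meant to line up with the columns of x in their current order, is to relabel y's index with x.columns. This is a sketch; set_axis returns a relabeled copy and leaves the values untouched:

```python
import numpy as np
import pandas as pd

columns = ['col{}'.format(i) for i in range(36)]
x = pd.DataFrame(np.random.random((1062, 36)), columns=columns)
y = pd.DataFrame(np.random.random((36, 36)))

# Give the rows of y the same labels as the columns of x (values unchanged).
y_aligned = y.set_axis(x.columns, axis=0)

product = x.dot(y_aligned)
print(product.shape)  # (1062, 36)
print(np.allclose(product.to_numpy(), np.dot(x, y)))  # True
```

    Unlike np.dot, the result is a DataFrame that keeps x.index as its rows and y.columns as its columns.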
    