Python numpy or pandas equivalent of the R function sweep()

夕颜 2021-01-14 01:14

What is numpy or pandas equivalent of the R function sweep()?

To elaborate: in R, let's say we have a coefficient vector (say be

3 Answers
  • 2021-01-14 01:34

    In numpy the concept is called "broadcasting". Example:

    import numpy as np
    x = np.random.random((4, 3))
    x * np.arange(4)[:, np.newaxis]  # sweep along the rows: row i is multiplied by i
    x + np.arange(3)[np.newaxis, :]  # sweep along the columns: column j has j added
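
The broadcasting pattern above can be wrapped into a small sweep-like helper. This is only a sketch: the name `sweep` and its signature are hypothetical (NumPy has no such function), and `margin` here follows NumPy's 0-based axis numbering, so R's `MARGIN = 1` corresponds to `margin=0`.

```python
import numpy as np

def sweep(x, margin, stats, func=np.subtract):
    """Hypothetical analogue of R's sweep(), built on broadcasting.

    margin is the axis that stats is indexed along:
    0 = one statistic per row, 1 = one statistic per column.
    """
    x = np.asarray(x)
    stats = np.asarray(stats)
    # Give stats length-1 axes everywhere except `margin` so it broadcasts.
    shape = [1] * x.ndim
    shape[margin] = x.shape[margin]
    return func(x, stats.reshape(shape))

x = np.arange(12, dtype=float).reshape(4, 3)
centered = sweep(x, 1, x.mean(axis=0))           # subtract each column's mean
scaled = sweep(x, 0, np.arange(4), np.multiply)  # multiply row i by i
```

Subtraction is the default here because R's `sweep()` also defaults to `"-"`; any binary NumPy ufunc can be passed as `func`.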
    
  • 2021-01-14 01:42

    Does this work faster?

    t(t(data) * beta)
    

    Some other great answers, with profiling, are in the question "Multiply rows of matrix by vector?".

    Finally, to answer your query about numpy, use this reference (search for "Matrix Multiplication"): http://mathesaurus.sourceforge.net/r-numpy.html
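
For comparison, here is a rough NumPy counterpart of the R expression `t(t(data) * beta)`. The shapes (20 x 5 and length 5) and the generator seed are assumptions chosen for illustration. In NumPy the double transpose is not needed, because broadcasting matches a 1-D array against the last axis (the columns).

```python
import numpy as np

rng = np.random.default_rng(0)        # seed chosen arbitrarily for the sketch
data = rng.standard_normal((20, 5))   # 20 observations, 5 columns (assumed shape)
beta = rng.standard_normal(5)         # one coefficient per column

# Broadcasting lines beta up with the columns directly.
scaled = data * beta

# The transpose round trip gives the same result, but beta then needs
# an explicit trailing axis to line up with the rows of data.T.
via_transpose = (data.T * beta[:, np.newaxis]).T
```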

  • 2021-01-14 01:49

    Pandas has an apply method too, apply being what R's sweep uses under the hood. (Note that the MARGIN argument is "equivalent" to the axis argument in many pandas functions, except that axis takes the values 0 and 1 rather than MARGIN's 1 and 2.)

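    In [10]: import numpy as np; import pandas as pd

    In [11]: np.random.seed(1)  # seed() must be called, not assigned to; the outputs shown below are from an unseeded run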
    
    In [12]: beta = pd.Series(np.random.randn(5))
    
    In [13]: data = pd.DataFrame(np.random.randn(20, 5))
    

    You can use apply with a function that is called on each row:

    In [14]: data.apply(lambda row: row * beta, axis=1)
    

    Note that axis=0 would apply the function to each column. This is the default because the data is stored column-wise, so column-wise operations are more efficient.
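
To make the two axis values concrete, here is a tiny sketch on a made-up two-column frame (the column names `a` and `b` and the values are purely illustrative):

```python
import pandas as pd

data = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# axis=0 (the default): the function receives each column as a Series.
col_sums = data.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series.
row_sums = data.apply(lambda row: row.sum(), axis=1)
```

`col_sums` is indexed by the column labels and `row_sums` by the row labels.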

    However, in this case it's easy to make this significantly faster (and more readable) by vectorizing, simply by multiplying row-wise:

    In [21]: data.apply(lambda row: row * beta, axis=1).head()
    Out[21]:
              0         1         2         3         4
    0 -0.024827 -1.465294 -0.416155 -0.369182 -0.649587
    1  0.026433  0.355915 -0.672302  0.225446 -0.520374
    2  0.042254 -1.223200 -0.545957  0.103864 -0.372855
    3  0.086367  0.218539 -1.033671  0.218388 -0.598549
    4  0.203071 -3.402876  0.192504 -0.147548 -0.726001
    
    In [22]: data.mul(beta, axis=1).head()  # just show first few rows with head
    Out[22]:
              0         1         2         3         4
    0 -0.024827 -1.465294 -0.416155 -0.369182 -0.649587
    1  0.026433  0.355915 -0.672302  0.225446 -0.520374
    2  0.042254 -1.223200 -0.545957  0.103864 -0.372855
    3  0.086367  0.218539 -1.033671  0.218388 -0.598549
    4  0.203071 -3.402876  0.192504 -0.147548 -0.726001
    

    Note: using mul is slightly more robust and allows more control than using *.
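
One example of that extra control is the `axis` argument of `mul`, which chooses whether a Series is matched against the column labels or the row labels. The frame and factor values below are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], columns=["a", "b"])
col_factors = pd.Series({"a": 10.0, "b": 100.0})  # one factor per column
row_factors = pd.Series([1.0, 2.0, 3.0])          # one factor per row

# axis=1 matches the Series index against the columns (what * does as well).
by_col = data.mul(col_factors, axis=1)

# axis=0 matches the Series index against the row labels instead;
# the * operator has no way to ask for this.
by_row = data.mul(row_factors, axis=0)
```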

    You can do the same in numpy (i.e. on data.values here), either by multiplying directly, which will be faster because it doesn't have to worry about data alignment, or by using vectorize rather than apply.
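
The difference between label alignment and plain positional multiplication can be seen in a small sketch. The values are made up, the Series labels are deliberately reversed, and `to_numpy()` is used here in place of `.values` (both return the underlying array).

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])      # columns labelled 0, 1, 2
beta = pd.Series([10.0, 100.0, 1000.0], index=[2, 1, 0])     # labels in reverse order

# pandas matches by label: column 0 is multiplied by beta[0] == 1000.
aligned = data.mul(beta, axis=1)

# NumPy matches by position: the first column is multiplied by 10.
positional = data.to_numpy() * beta.to_numpy()
```

Skipping alignment is what makes the NumPy version faster, but it also means the caller is responsible for the two operands being in the same order.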
