Function application over numpy's matrix row/column

夕颜 2020-12-02 13:36

I am using NumPy to store data in matrices. Coming from an R background, I am used to an extremely simple way of applying a function over the rows, the columns, or both of a matrix.

4 Answers
  • 2020-12-02 13:38

    I also come from more of an R background, and bumped into the lack of a more versatile apply that could take short customized functions. I've seen forum posts suggesting basic numpy functions, because many of them handle arrays. However, I kept getting confused over the way "native" numpy functions handle arrays (sometimes 0 seems to be row-wise and 1 column-wise, sometimes the opposite).
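The axis confusion mentioned above may be easier to untangle with a tiny matrix. The following is only a sketch (the matrix `M` and the variable names are made up for illustration): the `axis` argument names the dimension that gets collapsed, so `axis=0` yields one result per column and `axis=1` one result per row.

```python
import numpy as np

# Small 2x3 matrix so the results are easy to check by hand.
M = np.array([[1, 2, 3],
              [4, 5, 6]])

# axis=0 collapses the rows, leaving one value per column (R: apply(M, 2, sum)).
col_sums = M.sum(axis=0)

# axis=1 collapses the columns, leaving one value per row (R: apply(M, 1, sum)).
row_sums = M.sum(axis=1)

# apply_along_axis follows the same convention: with axis=0 the function
# receives each column in turn as a 1-D array.
col_ranges = np.apply_along_axis(lambda x: x.max() - x.min(), 0, M)

print(col_sums)    # [5 7 9]
print(row_sums)    # [ 6 15]
print(col_ranges)  # [3 3 3]
```

In other words, R's margin 2 (columns) roughly corresponds to `axis=0` here, and margin 1 (rows) to `axis=1`.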

    My personal solution for more flexible functions with apply_along_axis was to combine it with the anonymous lambda functions available in Python. Lambda functions should be very easy to understand for the R-minded who use a more functional programming style, as with the R functions apply, sapply, lapply, etc.

    So, for example, I wanted to standardise the variables in a matrix. Typically in R there's a function for this (scale), but you can also build it easily with apply:

    (R code)

    apply(Mat,2,function(x) (x-mean(x))/sd(x) ) 
    

    You see how the body of the function inside apply (x-mean(x))/sd(x) is the bit we can't type directly for the python apply_along_axis. With lambda this is easy to implement FOR ONE SET OF VALUES, so:

    (Python)

    import numpy as np
    vec=np.random.randint(1,10,10)  # some random data vector of integers
    
    (lambda x: (x-np.mean(x))/np.std(x)  )(vec)
    

    Then, all we need is to plug this inside the python apply and pass the array of interest through apply_along_axis

    Mat = np.random.randint(1, 10, 3*4).reshape((3, 4))  # some random data matrix
    # np.std defaults to the population SD (ddof=0); use np.std(x, ddof=1) to match R's sd()
    np.apply_along_axis(lambda x: (x - np.mean(x)) / np.std(x), 0, Mat)
    

    Obviously, the lambda function could be implemented as a separate function, but I guess the whole point is to use rather small functions contained within the line where apply originated.
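For this particular case, the same column-wise standardisation can probably also be written without apply_along_axis, by letting NumPy broadcast the per-column statistics. This is a sketch under stated assumptions: the matrix values are made up, and `ddof=1` is chosen here so that the result uses the sample standard deviation, as R's sd() does.

```python
import numpy as np

# Fixed illustrative data (3 rows, 4 columns); no column is constant,
# so no standard deviation is zero.
Mat = np.array([[1.0, 2.0, 3.0, 4.0],
                [2.0, 4.0, 6.0, 9.0],
                [3.0, 9.0, 1.0, 5.0]])

# Per-column mean and SD have shape (4,), which broadcasts against (3, 4).
# ddof=1 gives the sample standard deviation (divisor n-1).
scaled = (Mat - Mat.mean(axis=0)) / Mat.std(axis=0, ddof=1)

# The lambda-based version from above, with the same ddof, for comparison.
scaled_apply = np.apply_along_axis(
    lambda x: (x - x.mean()) / x.std(ddof=1), 0, Mat
)

print(np.allclose(scaled, scaled_apply))  # True
```

The broadcasting form avoids calling a Python function once per column, which tends to matter more as the matrix gets wider.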

    I hope you find it useful!

  • 2020-12-02 13:44

    Selecting elements from a NumPy array based on one or more conditions is straightforward using NumPy's beautifully dense syntax:

    >>> import numpy as NP
    >>> # generate a matrix to demo the code
    >>> A = NP.random.randint(0, 10, 40).reshape(8, 5)
    >>> A
      array([[6, 7, 6, 4, 8],
             [7, 3, 7, 9, 9],
             [4, 2, 5, 9, 8],
             [3, 8, 2, 6, 3],
             [2, 1, 8, 0, 0],
             [8, 3, 9, 4, 8],
             [3, 3, 9, 8, 4],
             [5, 4, 8, 3, 0]])
    


    How many elements in the third column of A (index 2) are greater than 6?

    >>> ndx = A[:, 2] > 6
    >>> ndx
          array([False,  True, False, False,  True,  True,  True,  True], dtype=bool)
    >>> NP.sum(ndx)
          5
    


    How many elements in the last column of A have an absolute value larger than 3?

    >>> A = NP.random.randint(-4, 4, 40).reshape(8, 5)
    >>> A
      array([[-4, -1,  2,  0,  3],
             [-4, -1, -1, -1,  1],
             [-1, -2,  2, -2,  3],
             [ 1, -4, -1,  0,  0],
             [-4,  3, -3,  3, -1],
             [ 3,  0, -4, -1, -3],
             [ 3, -4,  0, -3, -2],
             [ 3, -4, -4, -4,  1]])
    
    >>> ndx = NP.abs(A[:,-1]) > 3
    >>> NP.sum(ndx)
          0
    


    How many elements in the first two rows of A are greater than or equal to 2?

    >>> ndx = A[:2,:] >= 2
    >>> NP.sum(ndx.ravel())    # 'ravel' just flattens ndx, which is originally 2D (2x5)
          2
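The counts above each look at a single slice. As a sketch of how the same idea might extend to every row or column at once (the small matrix below just reuses the first three rows of the demo matrix, and the variable names are illustrative), the boolean mask can be summed along an axis:

```python
import numpy as np

A = np.array([[6, 7, 6, 4, 8],
              [7, 3, 7, 9, 9],
              [4, 2, 5, 9, 8]])

mask = A > 6                    # boolean array with the same shape as A
per_column = mask.sum(axis=0)   # number of True values in each column
per_row = mask.sum(axis=1)      # number of True values in each row
total = np.count_nonzero(mask)  # number of True values in the whole array

print(per_column)  # [1 1 1 2 3]
print(per_row)     # [2 4 2]
print(total)       # 8
```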
    

    NumPy's indexing syntax is pretty close to R's; given your fluency in R, here are the key differences between R and NumPy in this context:

    NumPy indices are zero-based; in R, indexing begins at 1.

    NumPy (like Python) allows you to index from right to left using negative indices--e.g.,

    # to get the last column in A
    A[:, -1]
    
    # to get the penultimate column in A
    A[:, -2]
    
    # this is a big deal, because in R, the equivalent expression is:
    A[, dim(A)[2] - 1]
    

    NumPy uses the colon ":" to denote "unsliced", i.e. take everything along that axis. For example, to get the first three rows of A in R, you would use A[1:3, ]. In NumPy, you would use A[0:3, :]; the "0" is not necessary, and it is in fact preferable to write A[:3, :]. Note that the end of a NumPy slice is exclusive, whereas R's 1:3 includes 3.
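A short sketch of these indexing differences side by side, with the approximate R equivalents in the comments (the matrix is made up for illustration):

```python
import numpy as np

A = np.arange(20).reshape(4, 5)  # 4 rows, 5 columns, values 0..19

first_three_rows = A[:3, :]      # R: A[1:3, ]
last_column = A[:, -1]           # R: A[, ncol(A)]
penultimate_column = A[:, -2]    # R: A[, ncol(A) - 1]

print(first_three_rows.shape)    # (3, 5)
print(last_column)               # [ 4  9 14 19]
print(penultimate_column)        # [ 3  8 13 18]
```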

  • 2020-12-02 14:04

    Almost all numpy functions operate on whole arrays, and/or can be told to operate on a particular axis (row or column).

    As long as you can define your function in terms of numpy functions acting on numpy arrays or array slices, your function will automatically operate on whole arrays, rows or columns.
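As an illustration of that point, here is a minimal sketch of a user-defined function built only from NumPy methods. The helper name `value_range` is hypothetical (not part of NumPy); because it forwards an `axis` argument, the same function can work on the whole array, on columns, or on rows.

```python
import numpy as np

def value_range(a, axis=None):
    """Hypothetical helper: maximum minus minimum along the given axis."""
    a = np.asarray(a)
    return a.max(axis=axis) - a.min(axis=axis)

M = np.array([[1, 8, 3],
              [4, 5, 9]])

whole = value_range(M)             # over the whole array
per_column = value_range(M, axis=0)  # one value per column
per_row = value_range(M, axis=1)     # one value per row

print(whole)       # 8
print(per_column)  # [3 3 6]
print(per_row)     # [7 5]
```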

    It may be more helpful to ask about how to implement a particular function to get more concrete advice.


    Numpy provides np.vectorize and np.frompyfunc to turn Python functions which operate on numbers into functions that operate on numpy arrays.

    For example,

    import numpy as np

    def myfunc(a, b):
        # return the larger of two scalars
        return a if a > b else b

    vecfunc = np.vectorize(myfunc)
    result = vecfunc([[1, 2, 3], [5, 6, 9]], [7, 4, 5])
    print(result)
    # [[7 4 5]
    #  [7 6 9]]
    

    (The elements of the first array get replaced by the corresponding element of the second array when the second is bigger.)

    But don't get too excited; np.vectorize and np.frompyfunc are just syntactic sugar. They don't actually make your code any faster. If your underlying Python function is operating on one value at a time, then np.vectorize will feed it one item at a time, and the whole operation is going to be pretty slow (compared to using a numpy function which calls some underlying C or Fortran implementation).
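For the element-wise maximum specifically, NumPy already ships a native function, np.maximum, so the vectorized Python function is not really needed. The sketch below just checks that the two approaches agree on the example data; no timings are claimed here.

```python
import numpy as np

def myfunc(x, y):
    # scalar version: return the larger of two numbers
    return x if x > y else y

a = np.array([[1, 2, 3], [5, 6, 9]])
b = np.array([7, 4, 5])

# np.vectorize calls myfunc once per element pair, inside a Python-level loop.
slow = np.vectorize(myfunc)(a, b)

# np.maximum performs the same element-wise comparison in compiled code,
# broadcasting b across the rows of a.
fast = np.maximum(a, b)

print(fast)
# [[7 4 5]
#  [7 6 9]]
print(np.array_equal(slow, fast))  # True
```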


    To count how many elements of column x are smaller than a number y, you could use an expression such as:

    (array['x']<y).sum()
    

    For example:

    import numpy as np
    array = np.arange(6, dtype=np.int64).view([('x', np.int64), ('y', np.int64)])
    print(array)
    # [(0, 1) (2, 3) (4, 5)]
    
    print(array['x'])
    # [0 2 4]
    
    print(array['x']<3)
    # [ True  True False]
    
    print((array['x']<3).sum())
    # 2
    
  • 2020-12-02 14:05

    Pandas is very useful for this. For instance, DataFrame.apply() and groupby's apply() should help you.
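A minimal sketch of what that might look like with DataFrame.apply, assuming pandas is installed; the column names and values are made up for illustration. Note that pandas' Series.std uses the sample standard deviation (ddof=1) by default, like R's sd().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0],
                   "b": [2.0, 4.0, 9.0]})

# axis=0: the function receives each column as a Series (R: apply(df, 2, f)).
col_scaled = df.apply(lambda x: (x - x.mean()) / x.std(), axis=0)

# axis=1: the function receives each row as a Series (R: apply(df, 1, f)).
row_ranges = df.apply(lambda r: r.max() - r.min(), axis=1)

print(row_ranges.tolist())  # [1.0, 2.0, 6.0]
```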
