Normalize NumPy array columns in Python

旧时难觅i 2020-12-04 19:01

I have a numpy array in which each cell of a given row holds the value of one feature. I store all of them in a 100*4 matrix.

A     B   C
1000  10  0.5
76         


        
2 answers
  • 2020-12-04 19:45

    You can use sklearn.preprocessing:

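    import numpy as np
    from sklearn.preprocessing import normalize
    
    # Each row is a sample, each column a feature
    data = np.array([
        [1000, 10, 0.5],
        [765, 5, 0.35],
        [800, 7, 0.09],
    ])
    # axis=0 normalizes each column; norm='max' divides by the column maximum
    data = normalize(data, axis=0, norm='max')
    print(data)
    # [[ 1.     1.     1.   ]
    #  [ 0.765  0.5    0.7  ]
    #  [ 0.8    0.7    0.18 ]]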
    
  • 2020-12-04 19:56

    If I understand correctly, what you want to do is divide by the maximum value in each column. You can do this easily using broadcasting.

    Starting with your example array:

    import numpy as np
    
    x = np.array([[1000,  10,   0.5],
                  [ 765,   5,  0.35],
                  [ 800,   7,  0.09]])
    
    x_normed = x / x.max(axis=0)
    
    print(x_normed)
    # [[ 1.     1.     1.   ]
    #  [ 0.765  0.5    0.7  ]
    #  [ 0.8    0.7    0.18 ]]
    

    x.max(0) takes the maximum along axis 0 (i.e. down the rows). This gives you a vector of shape (ncols,) containing the maximum value in each column. Dividing x by this vector broadcasts it across every row, so each column is scaled such that its maximum value becomes 1.
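
    To make the broadcasting step concrete, here is a minimal sketch on a 100*4 matrix like the one mentioned in the question. The random data and the names rng, col_max are assumptions for illustration only, not part of the original post.

```python
import numpy as np

# Hypothetical 100x4 feature matrix standing in for the one in the question.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 1000.0, size=(100, 4))

col_max = x.max(axis=0)      # shape (4,): one maximum per column
x_normed = x / col_max       # (100, 4) / (4,) broadcasts across the rows

print(col_max.shape)         # shape of the per-column maxima
print(x_normed.max(axis=0))  # per-column maxima after scaling
```

    Note that this only works as intended when every column maximum is positive; a column whose maximum is 0 would cause a division by zero.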


    If x contains negative values you would need to subtract the minimum first:

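    x_normed = (x - x.min(0)) / np.ptp(x, axis=0)
    

    Here, np.ptp(x, axis=0) returns the "peak-to-peak" value (i.e. the range, max - min) along axis 0. The method form x.ptp(0) does the same thing but was removed in NumPy 2.0, so the function form is the safer choice. This normalization also guarantees that the minimum value in each column will be 0.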
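
    One case the formula above does not cover is a constant column, whose range is 0. A possible way to handle it is sketched below; the helper name min_max_scale and the choice to map constant columns to 0 are assumptions made for this example, not something from the original answer. It uses np.ptp, the function form of the peak-to-peak computation.

```python
import numpy as np

def min_max_scale(x):
    """Scale each column of x to [0, 1]; constant columns map to 0 (assumed convention)."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = np.ptp(x, axis=0)  # max - min per column
    # Replace a zero range with 1 so constant columns yield 0 instead of NaN.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / safe_range

# Small example with negative values and one constant column.
x = np.array([[-5.0, 2.0, 7.0],
              [ 0.0, 2.0, 9.0],
              [ 5.0, 2.0, 8.0]])
scaled = min_max_scale(x)
print(scaled)
```

    Whether constant columns should become 0, 0.5, or be dropped entirely depends on what you do with the features afterwards.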
