How do I standardize a matrix?

误落风尘 2021-01-03 23:25

Basically, I want to take a matrix and transform it so that its mean is 0 and its variance is 1. I'm using NumPy arrays, so it would be best if NumPy can already do this, but I can implem…

5 answers
  • 2021-01-03 23:38
    from sklearn.preprocessing import StandardScaler
    
    standardized_data = StandardScaler().fit_transform(your_data)
    

    Example:

    >>> import numpy as np
    >>> from sklearn.preprocessing import StandardScaler
    
    >>> data = np.random.randint(25, size=(4, 4))
    >>> data
    array([[17, 12,  4, 17],
           [ 1, 16, 19,  1],
           [ 7,  8, 10,  4],
           [22,  4,  2,  8]])
    
    >>> standardized_data = StandardScaler().fit_transform(data)
    >>> standardized_data
    array([[ 0.63812398,  0.4472136 , -0.718646  ,  1.57786412],
           [-1.30663482,  1.34164079,  1.55076242, -1.07959124],
           [-0.57735027, -0.4472136 ,  0.18911737, -0.58131836],
           [ 1.24586111, -1.34164079, -1.02123379,  0.08304548]])
    

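    Works well on large datasets. Note that StandardScaler standardizes each column (feature) separately rather than the matrix as a whole, as the example output shows.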

  • 2021-01-03 23:43
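    import scipy.stats as ss
    
    # zscore works along axis=0 (each column) by default and returns an ndarray;
    # pass axis=None to standardize the whole matrix at once.
    A = ss.zscore(A)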
    
  • 2021-01-03 23:47

    Use sklearn.preprocessing.scale.

    http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html

    Here is an example.

    >>> from sklearn import preprocessing
    >>> import numpy as np
    >>> X_train = np.array([[ 1., -1.,  2.],
    ...                     [ 2.,  0.,  0.],
    ...                     [ 0.,  1., -1.]])
    >>> X_scaled = preprocessing.scale(X_train)
    >>> X_scaled
    array([[ 0.  ..., -1.22...,  1.33...],
           [ 1.22...,  0.  ..., -0.26...],
           [-1.22...,  1.22..., -1.06...]])
    

    http://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling

  • 2021-01-03 23:50
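    import numpy as np
    
    A = np.array([[1,2,6], [3000,1000,2000]]).T
    
    A_means = np.mean(A, axis=0)               # per-column means
    A_centr = A - A_means                      # center each column at 0
    A_norms = np.linalg.norm(A_centr, axis=0)  # equals std * sqrt(n_rows)
    
    # Dividing by A_norms alone gives unit-norm columns (variance 1/n_rows);
    # multiplying by sqrt(n_rows) gives unit variance instead.
    A_std = A_centr / A_norms * np.sqrt(A.shape[0])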
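
    The column norm used above and the column standard deviation differ only by a factor of sqrt(n_rows). The following is a minimal sketch that checks this on the same example matrix; the names unit_norm and unit_var are illustrative and not part of the original answer.

```python
import numpy as np

# Same example matrix as in the answer: 3 rows, 2 columns.
A = np.array([[1, 2, 6], [3000, 1000, 2000]]).T
n_rows = A.shape[0]

A_centr = A - A.mean(axis=0)               # center each column
A_norms = np.linalg.norm(A_centr, axis=0)  # Euclidean norm of each centered column
A_stds = A.std(axis=0)                     # population std (ddof=0) of each column

unit_norm = A_centr / A_norms              # columns have norm 1, variance 1/n_rows
unit_var = A_centr / A_stds                # columns have variance 1

# The norm of a centered column is its std times sqrt(n_rows).
norm_matches_std = np.allclose(A_norms, A_stds * np.sqrt(n_rows))
print(norm_matches_std)
print(unit_norm.var(axis=0))
print(unit_var.var(axis=0))
```

    Which scaling is appropriate depends on the goal: dividing by the norm yields unit-length columns, while the question asks for unit variance.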
    
  • 2021-01-03 23:51

    The following subtracts the mean of A from each element (the new mean is 0), then normalizes the result by the standard deviation.

    import numpy as np
    A = (A - np.mean(A)) / np.std(A)
    

    The above standardizes the entire matrix as a whole. If A has more than one dimension and you want to standardize each column individually, specify the axis:

    import numpy as np
    A = (A - np.mean(A, axis=0)) / np.std(A, axis=0)
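
    One case the column-wise one-liner does not handle is a constant column: its standard deviation is 0, so the division produces NaN. The following is a rough sketch of one way to guard against that, assuming it is acceptable to leave such columns centered but unscaled; standardize_columns is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def standardize_columns(A):
    """Hypothetical helper: z-score each column of a 2-D array.

    Columns with zero standard deviation are only centered,
    which avoids dividing by zero.
    """
    A = np.asarray(A, dtype=float)
    mean = A.mean(axis=0)
    std = A.std(axis=0)                       # population std (ddof=0), np.std's default
    safe_std = np.where(std > 0, std, 1.0)    # replace zeros so the division is defined
    return (A - mean) / safe_std

A = np.array([[1.0, 10.0, 5.0],
              [2.0, 20.0, 5.0],
              [3.0, 60.0, 5.0]])   # the last column is constant

Z = standardize_columns(A)
print(Z.mean(axis=0))
print(Z.std(axis=0))
```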
    

    Always verify by hand what these one-liners are doing before integrating them into your code. A simple change in the orientation or dimensions of the arrays can silently and drastically change which operations numpy performs on them.
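
    As an illustration of this kind of silent change, here is a small sketch that standardizes each row of a square matrix. The matrix and variable names are made up for the example. Without keepdims=True the row statistics are broadcast along the wrong axis, and because the matrix is square no error is raised.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0],
              [100.0, 200.0, 300.0]])

# Intended: standardize each ROW. keepdims=True keeps the statistics in
# shape (3, 1), so each row is paired with its own mean and std.
row_mean = A.mean(axis=1, keepdims=True)
row_std = A.std(axis=1, keepdims=True)
correct = (A - row_mean) / row_std

# Pitfall: without keepdims the statistics have shape (3,), which broadcasts
# along the last axis, so row i's mean is subtracted from COLUMN i.
wrong = (A - A.mean(axis=1)) / A.std(axis=1)

# Verify the result instead of trusting the one-liner.
print(np.allclose(correct.mean(axis=1), 0.0))  # True
print(np.allclose(wrong.mean(axis=1), 0.0))    # False
```

    Checking the mean and standard deviation along the intended axis, as in the last two lines, is a cheap way to catch this.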
