I have a numpy array where each cell of a specific row represents a value for a feature. I store all of them in an 100*4 matrix.
A B C
1000 10 0.5
76
You can use sklearn.preprocessing:
from sklearn.preprocessing import normalize
data = np.array([
[1000, 10, 0.5],
[765, 5, 0.35],
[800, 7, 0.09], ])
data = normalize(data, axis=0, norm='max')
print(data)
>>[[ 1. 1. 1. ]
[ 0.765 0.5 0.7 ]
[ 0.8 0.7 0.18 ]]
If I understand correctly, what you want to do is divide by the maximum value in each column. You can do this easily using broadcasting.
Starting with your example array:
import numpy as np
x = np.array([[1000, 10, 0.5],
[ 765, 5, 0.35],
[ 800, 7, 0.09]])
x_normed = x / x.max(axis=0)
print(x_normed)
# [[ 1. 1. 1. ]
# [ 0.765 0.5 0.7 ]
# [ 0.8 0.7 0.18 ]]
x.max(0)
takes the maximum over the 0th dimension (i.e. rows). This gives you a vector of size (ncols,)
containing the maximum value in each column. You can then divide x
by this vector in order to normalize your values such that the maximum value in each column will be scaled to 1.
If x
contains negative values you would need to subtract the minimum first:
x_normed = (x - x.min(0)) / x.ptp(0)
Here, x.ptp(0)
returns the "peak-to-peak" (i.e. the range, max - min) along axis 0. This normalization also guarantees that the minimum value in each column will be 0.