I have a dataset of images. I compute the histogram of every image and then want to store (write) the histograms in a file, so that for every new image I use as input, I compare
You could compute the red channel histogram of all the images like this:
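import os
import glob
import numpy as np
from skimage import io

root = r'C:\Users\you\imgs'  # Change this appropriately (raw string, so '\U' is not parsed as an escape)
folders = ['Type_1', 'Type_2', 'Type_3']
extension = '*.bmp'  # Change if necessary

def compute_red_histograms(root, folders, extension):
    X = []  # one 256-bin red histogram per image
    y = []  # numeric class label per image (index of its folder)
    for n, imtype in enumerate(folders):
        filenames = glob.glob(os.path.join(root, imtype, extension))
        for fn in filenames:
            img = io.imread(fn)
            red = img[:, :, 0]  # red channel; assumes RGB(A) images, not grayscale
            # density=True replaces the deprecated (and in recent NumPy removed) normed=True;
            # with unit-width bins the histogram sums to 1
            h, _ = np.histogram(red, bins=np.arange(257), density=True)
            X.append(h)
            y.append(n)
    return np.vstack(X), np.array(y)

X, y = compute_red_histograms(root, folders, extension)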
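To see what a single row of X looks like without reading files from disk, here is a minimal sketch that applies the same np.histogram call to a small synthetic RGB array. The array and the helper name red_histogram are hypothetical stand-ins for an image loaded with io.imread, and the call uses density=True, which is the current NumPy name for the normed=True normalization.

```python
import numpy as np

def red_histogram(img):
    """Hypothetical helper: normalized 256-bin histogram of the red channel."""
    red = img[:, :, 0]
    # bins=np.arange(257) gives 256 unit-width bins covering the values 0..255
    h, _ = np.histogram(red, bins=np.arange(257), density=True)
    return h

# Synthetic 4x4 RGB image (an assumption): half the pixels have red=10, half red=200
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:2, :, 0] = 10
img[2:, :, 0] = 200

h = red_histogram(img)
print(h.shape)        # (256,)
print(h[10], h[200])  # 0.5 0.5
print(h.sum())        # 1.0
```

Because every bin has width 1, density=True simply divides the counts by the number of pixels, so each histogram is a probability vector regardless of image size.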
Each image is represented by a 256-dimensional feature vector (the components of its red channel histogram), hence X is a 2D NumPy array with as many rows as there are images in your dataset and 256 columns. y is a 1D NumPy array of numeric class labels, i.e. 0 for Type_1, 1 for Type_2 and 2 for Type_3.
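Since the question also asks how to store the histograms in a file, one simple option is NumPy's .npz format: np.savez writes several named arrays into one archive and np.load reads them back. The sketch below uses random stand-ins for X and y and a hypothetical file name red_histograms.npz in a temporary directory; in practice you would save the arrays returned by compute_red_histograms once and reload them later instead of recomputing.

```python
import os
import tempfile
import numpy as np

# Stand-ins for the X, y returned by compute_red_histograms (assumed:
# 6 images x 256 bins, labels 0/1/2)
rng = np.random.default_rng(0)
X = rng.random((6, 256))
X /= X.sum(axis=1, keepdims=True)  # make each row sum to 1 like a normalized histogram
y = np.array([0, 0, 1, 1, 2, 2])

# Write both arrays into a single .npz archive (hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), 'red_histograms.npz')
np.savez(path, X=X, y=y)

# Later, e.g. in another session: load them back by the keyword names used above
with np.load(path) as data:
    X_loaded = data['X']
    y_loaded = data['y']

print(X_loaded.shape, y_loaded)  # (6, 256) [0 0 1 1 2 2]
```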
Next you could split your dataset into training and test sets like so:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
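A purely random split can leave one class under-represented in the test half. train_test_split accepts a stratify argument that preserves the class proportions of y in both halves; here is a small sketch on made-up toy arrays (12 samples with imbalanced labels), not on real histograms.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption): 12 "feature vectors" with labels 0/1/2 in a 6:4:2 ratio
X = np.arange(12 * 4, dtype=float).reshape(12, 4)
y = np.array([0] * 6 + [1] * 4 + [2] * 2)

# stratify=y keeps the 6:4:2 class ratio in both the training and the test half
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

print(np.bincount(y_train), np.bincount(y_test))  # [3 2 1] [3 2 1]
```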
And finally, you could train an SVM classifier:
from sklearn.svm import SVC
clf = SVC()
clf.fit(X_train, y_train)
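For the use case in the question, classifying each new input image, note that clf.predict expects a 2D array with one row per sample, so a single histogram has to be reshaped to shape (1, 256) first. A self-contained sketch follows; synthetic_image is a hypothetical stand-in for io.imread that produces an RGB array whose red values cluster around a given center, and red_histogram is a hypothetical helper mirroring the loop body of compute_red_histograms.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def red_histogram(img):
    """Hypothetical helper: the same normalized red histogram as in the answer."""
    h, _ = np.histogram(img[:, :, 0], bins=np.arange(257), density=True)
    return h

def synthetic_image(center):
    """Assumed stand-in for io.imread: 32x32 RGB image with red values near `center`."""
    red = np.clip(rng.normal(center, 10, size=(32, 32)), 0, 255).astype(np.uint8)
    img = np.zeros((32, 32, 3), dtype=np.uint8)
    img[:, :, 0] = red
    return img

# Train on five synthetic images per class (labels 0, 1, 2 as in the answer)
centers = [40, 128, 215]
X = np.vstack([red_histogram(synthetic_image(c)) for c in centers for _ in range(5)])
y = np.repeat([0, 1, 2], 5)
clf = SVC().fit(X, y)

# A single new image: predict() needs a 2D array, so reshape to (1, 256)
new_img = synthetic_image(128)
features = red_histogram(new_img).reshape(1, -1)
print(features.shape, clf.predict(features))
```

Passing the 1D histogram directly makes scikit-learn raise an error asking you to reshape, which is why the reshape(1, -1) step is needed.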
By doing so you can make predictions or assess classification accuracy very easily:
In [197]: y_test
Out[197]: array([0, 2, 0, ..., 0, 0, 1])
In [198]: clf.predict(X_test)
Out[198]: array([2, 2, 2, ..., 2, 2, 2])
In [199]: y_test == clf.predict(X_test)
Out[199]: array([False, True, False, ..., False, False, False], dtype=bool)
In [200]: clf.score(X_test, y_test)
Out[200]: 0.3125
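In the session above the classifier predicts class 2 for nearly every visible test image, and a score of 0.3125 is close to the 1/3 you would expect by chance with three balanced classes. One plausible cause (an assumption, not verified on your images) is the feature scale: normalized histogram values are tiny (1/256 on average), and in older scikit-learn versions the default RBF gamma was 1/n_features, so the kernel saw all images as nearly identical. Standardizing the features before the SVM is a common remedy. The sketch below chains StandardScaler and SVC with make_pipeline on synthetic histograms, since the real ones are not available here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for X, y (an assumption): 90 normalized "histograms",
# each class concentrating its mass in a different range of red values
centers = {0: 40, 1: 128, 2: 215}
X, y = [], []
for label, c in centers.items():
    for _ in range(30):
        red = np.clip(rng.normal(c, 15, size=1000), 0, 255)
        h = np.histogram(red, bins=np.arange(257), density=True)[0]
        X.append(h)
        y.append(label)
X, y = np.vstack(X), np.array(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# StandardScaler is fitted on the training half only and rescales every bin to
# zero mean and unit variance (constant bins are left unscaled); the SVM then
# trains on the scaled features
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

Whether scaling helps on your actual histograms is something to check with clf.score on your own test set.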