I tried this myself but couldn't get all the way, which is why I'm posting here. Please guide me.
- I am working on multi-label image classification, but my scenario is slightly different. I am confused about how to map the labels and their attributes to each image Id so that they can be used for training and testing.
Here is the code I am working with:
```python
import os
import numpy as np
import pandas as pd
from keras.utils import to_categorical
from collections import Counter
from keras.callbacks import Callback
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from matplotlib import pyplot as plt  # imported as plt because the plotting code uses plt
from tensorflow.keras import backend

def create_tag_mapping(mapping_csv):
    labels = set()
    for i in range(len(mapping_csv)):
        tags = mapping_csv['Labels'][i].split(' ')
        labels.update(tags)
    labels = list(labels)
    labels.sort()
    labels_map = {labels[i]: i for i in range(len(labels))}
    inv_labels_map = {i: labels[i] for i in range(len(labels))}
    return labels_map, inv_labels_map

# create a mapping of filename to tags
def create_file_mapping(mapping_csv):
    mapping = dict()
    for i in range(len(mapping_csv)):
        name, tags = mapping_csv['Id'][i], mapping_csv['Labels'][i]
        mapping[name] = tags.split(' ')
    return mapping

# create a one hot encoding for one list of tags
def one_hot_encode(tags, mapping):
    # create empty vector
    encoding = np.zeros(len(mapping), dtype='uint8')
    # mark 1 for each tag in the vector
    for tag in tags:
        encoding[mapping[tag]] = 1
    return encoding

def load_dataset(path, file_mapping, tag_mapping):
    photos, targets = list(), list()
    # enumerate files in the directory
    for filename in os.listdir(path):
        # load image
        photo = load_img(path + filename, target_size=(760, 415))
        # convert to numpy array
        photo = img_to_array(photo, dtype='uint8')
        # get tags
        tags = file_mapping[filename[:-4]]
        # one hot encode tags
        target = one_hot_encode(tags, tag_mapping)
        # store
        photos.append(photo)
        targets.append(target)
    X = np.asarray(photos, dtype='uint8')
    y = np.asarray(targets, dtype='uint8')
    return X, y

trainingLabels = 'labels.csv'
# load the mapping file
mapping_csv = pd.read_csv(trainingLabels)
# create a mapping of tags to integers
tag_mapping, _ = create_tag_mapping(mapping_csv)
# create a mapping of filenames to tag lists
file_mapping = create_file_mapping(mapping_csv)
# load the png images
folder = 'dataset/'
X, y = load_dataset(folder, file_mapping, tag_mapping)
print(X.shape, y.shape)

trainX, testX, trainY, testY = train_test_split(X, y, test_size=0.3, random_state=1)
print(trainX.shape, trainY.shape, testX.shape, testY.shape)

img_x, img_y = 760, 415
trainX = trainX.reshape(trainX.shape[0], img_x, img_y, 3)
testX = testX.reshape(testX.shape[0], img_x, img_y, 3)
trainX = trainX.astype('float32')
testX = testX.astype('float32')
trainX /= 255
testX /= 255
trainY = to_categorical(trainY, 3)
testY = to_categorical(testY, 3)
print(trainX.shape)
print(trainY.shape)

model = Sequential()
model.add(Conv2D(32, (5, 5), strides=(1, 1), activation='relu', input_shape=(img_x, img_y, 3)))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(trainX, trainY, batch_size=2, epochs=5, verbose=1)

# with tf.keras 2.x the history key for metrics=['accuracy'] is 'accuracy' (older Keras used 'acc')
plt.plot(history.history['accuracy'])
plt.plot(history.history['loss'])
plt.title('Accuracy and loss')
plt.xlabel('epoch')
plt.ylabel('accuracy/loss')
plt.legend(['Accuracy', 'loss'], loc='upper left')
plt.show()

score = model.evaluate(testX, testY, verbose=0)
print('test loss', score[0])
print('test accuracy', score[1])
```
I have attached an image that gives a clearer picture of my problem.
I have followed tutorials such as these:
- https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-satellite-photos-of-the-amazon-rainforest/
- https://towardsdatascience.com/journey-to-the-center-of-multi-label-classification-384c40229bff
- https://www.analyticsvidhya.com/blog/2019/04/predicting-movie-genres-nlp-multi-label-classification/

They (and others like them) have multiple labels per image, but in my case each image has multiple labels and each label also has its own attribute.
If your goal is to predict whether each of 'L', 'M' and 'H' applies, you are using the wrong loss function: you should use `binary_crossentropy`. In that case, your targets have shape batch × 3.
`categorical_crossentropy` assumes the output is a categorical distribution: a vector of values that sum to one. In other words, there are multiple possibilities, but only one of them can be correct.
`binary_crossentropy` assumes that every number in the output vector is a (conditionally) independent binary distribution: each number is between 0 and 1, but the numbers do not need to sum to one, because all of them may well be true at the same time.
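The difference between the two assumptions shows up when a softmax and a sigmoid are applied to the same (made-up) logits. A small NumPy sketch:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5])  # raw outputs for L, M, H (made-up values)

# softmax: one categorical distribution, the entries compete and sum to 1
softmax = np.exp(logits - logits.max())
softmax /= softmax.sum()

# sigmoid: three independent probabilities, no constraint on their sum
sigmoid = 1.0 / (1.0 + np.exp(-logits))

print(softmax, softmax.sum())  # sums to 1 (up to rounding)
print(sigmoid, sigmoid.sum())  # sum is about 2.23
```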
If your goal is to predict the value of each of label1, ..., label6, you should model a separate categorical distribution for each label. You have six labels with three possible values each, so you need 18 numbers (logits). In this case, your targets have shape batch × 6 × 3.
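One way to build targets of shape batch × 6 × 3 is to store the value of each label as a class index (0, 1, 2) and expand it with an identity matrix. A NumPy sketch, where `class_ids` is hypothetical example data and the order L, M, H is an assumption:

```python
import numpy as np

# hypothetical class indices for 2 images x 6 labels (0 = L, 1 = M, 2 = H)
class_ids = np.array([[0, 0, 1, 2, 0, 1],
                      [2, 1, 0, 0, 1, 2]])

# np.eye(3)[k] is the one-hot row for class k; fancy indexing applies it to every entry
targets = np.eye(3, dtype='float32')[class_ids]

print(targets.shape)  # (2, 6, 3)
print(targets[0, 3])  # [0. 0. 1.]  -> label4 of the first image is H
```

Keras's `to_categorical(class_ids, 3)` should produce an array of the same shape.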
```python
model.add(Dense(18))  # no activation (activation=None): the layer outputs raw logits
```
Because you want six distributions over three values each rather than a single distribution over 18 values, you need to reshape the logits first:
```python
model.add(Reshape((6, 3)))
```
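The reshaped logits still need a softmax over the last axis so that each label gets its own distribution over three values. A NumPy sketch of the shapes involved, with random numbers standing in for the `Dense(18)` output of a batch of 2 images:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 18))     # stand-in for the Dense(18) output, batch of 2

per_label = logits.reshape(-1, 6, 3)  # same effect as Reshape((6, 3)), batch axis kept

# softmax over the last axis: one distribution over {L, M, H} per label
shifted = per_label - per_label.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

print(probs.shape)          # (2, 6, 3)
print(probs.sum(axis=-1))   # all (close to) 1
```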
Based on the discussion above, here is the solution to my problem. As mentioned, there are 5 labels in total, and each label takes one of three tags (L, M, H). The encoding can be done like this:
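```python
import numpy as np

# create a one-hot encoding (one row per label) for one list of tags; mapping is not used
def custom_encode(tags, mapping):
    # create an empty (number of labels x 3) matrix; columns are L, M, H
    encoding = np.zeros((len(tags), 3), dtype='uint8')
    for i, tag in enumerate(tags):
        if tag == 'L':
            encoding[i, 0] = 1
        elif tag == 'M':
            encoding[i, 1] = 1
        elif tag == 'H':
            encoding[i, 2] = 1
    return encoding
```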
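For testing, the predictions need to be mapped back to tags. A minimal sketch of the inverse operation; `custom_decode` and `TAGS` are hypothetical names, and the column order L, M, H is assumed to match the encoding:

```python
import numpy as np

TAGS = ['L', 'M', 'H']  # assumed column order of the encoding

def custom_decode(encoding):
    # hypothetical inverse of the encoding: pick the column with the highest value per row
    return [TAGS[i] for i in np.argmax(encoding, axis=-1)]

encoded = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(custom_decode(encoded))  # ['L', 'H', 'L', 'M', 'H']

# works the same on predicted probabilities for one image, shape (5, 3)
pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1],
                 [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]])
print(custom_decode(pred))     # ['L', 'H', 'L', 'M', 'H']
```

Using `argmax` rather than a threshold matches the one-of-three assumption: every label gets exactly one tag.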
The encoded targets then look like this:
| Labels | Tags | Encoded Tags |
|---|---|---|
| Label1 | [L, L, L, M, H] | [[1,0,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1]] |
| Label2 | [L, H, L, M, H] | [[1,0,0], [0,0,1], [1,0,0], [0,1,0], [0,0,1]] |
| Label3 | [L, M, L, M, H] | [[1,0,0], [0,1,0], [1,0,0], [0,1,0], [0,0,1]] |
| Label4 | [M, M, L, M, H] | [[0,1,0], [0,1,0], [1,0,0], [0,1,0], [0,0,1]] |
| Label5 | [M, L, L, M, H] | [[0,1,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1]] |

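
Stacking the encoded rows gives the target array for the whole dataset, with shape (number of samples, 5, 3). A NumPy sketch that rebuilds the table above, treating each row as one sample; `TAG_INDEX` is a hypothetical helper:

```python
import numpy as np

TAG_INDEX = {'L': 0, 'M': 1, 'H': 2}  # same column order as the table

rows = [['L', 'L', 'L', 'M', 'H'],
        ['L', 'H', 'L', 'M', 'H'],
        ['L', 'M', 'L', 'M', 'H'],
        ['M', 'M', 'L', 'M', 'H'],
        ['M', 'L', 'L', 'M', 'H']]

# convert tags to class indices, then to one-hot rows with np.eye
ids = np.array([[TAG_INDEX[t] for t in row] for row in rows])
y = np.eye(3, dtype='uint8')[ids]

print(y.shape)        # (5, 5, 3)
print(y[1].tolist())  # [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```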
The final layers then look like this:
```python
from tensorflow.keras.layers import Dense, Reshape

model.add(Dense(15))        # 5 labels x 3 tags each = 15 neurons in the final layer
model.add(Reshape((5, 3)))  # reshape into 5 labels with 3 tags each
```
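With `Dense(15)` and `Reshape((5, 3))` the model still outputs raw logits, so a softmax over the last axis is needed before `categorical_crossentropy` (for example an `Activation('softmax')` layer after the `Reshape`, or `CategoricalCrossentropy(from_logits=True)` as the loss). As a sanity check, this NumPy sketch computes the resulting loss: one cross-entropy per label, averaged over labels and images. The helper names are hypothetical, and the all-zero logits stand in for an untrained model.

```python
import numpy as np

def softmax(x):
    # softmax over the last axis (the 3 tags of each label)
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # -sum(y * log p) over the 3 tags gives one loss per (image, label)
    per_label = -(y_true * np.log(np.clip(y_prob, eps, 1.0))).sum(axis=-1)
    return per_label.mean()  # average over images and labels

# one image, 5 labels; the target is the Label1 row of the table: [L, L, L, M, H]
y_true = np.eye(3)[np.array([[0, 0, 0, 1, 2]])]  # shape (1, 5, 3)
logits = np.zeros((1, 15)).reshape(1, 5, 3)      # stand-in for Dense(15) + Reshape((5, 3))
probs = softmax(logits)

print(probs[0, 0])                              # three equal values of 1/3
print(categorical_crossentropy(y_true, probs))  # ln(3), about 1.0986
```

An untrained model that assigns 1/3 to every tag gets a loss of ln(3) per label, which is a useful baseline when watching the training loss.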