Multi-Label Image Classification

Submitted by 陌路散爱 on 2020-12-15 02:01:12

Question


I have tried this myself but could not get to the final result, which is why I am posting here; please guide me.

  • I am working on multi-label image classification with a slightly different scenario. I am confused about how to map the labels and their attributes to each image Id so that they can be used for training and testing.
  • Here is the code I am working with:

    import os
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    
    from tensorflow.keras.utils import to_categorical
    from tensorflow.keras.preprocessing.image import load_img, img_to_array
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.layers import Conv2D, MaxPooling2D
    
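    # build a mapping of tag string -> integer index, plus the inverse mapping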
    def create_tag_mapping(mapping_csv):
        labels = set()
        for i in range(len(mapping_csv)):
            tags = mapping_csv['Labels'][i].split(' ')
            labels.update(tags)
        labels = list(labels)
        labels.sort()
        labels_map = {labels[i]:i for i in range(len(labels))}
        inv_labels_map = {i:labels[i] for i in range(len(labels))}
        return labels_map, inv_labels_map
    
    # create a mapping of filename to tags
    def create_file_mapping(mapping_csv):
        mapping = dict()
        for i in range(len(mapping_csv)):
            name, tags = mapping_csv['Id'][i], mapping_csv['Labels'][i]
            mapping[name] = tags.split(' ')
        return mapping
    
    # create a one hot encoding for one list of tags
    def one_hot_encode(tags, mapping):
        # create empty vector
        encoding = np.zeros(len(mapping), dtype='uint8')
        # mark 1 for each tag in the vector
        for tag in tags:
            encoding[mapping[tag]] = 1
        return encoding
    
    def load_dataset(path, file_mapping, tag_mapping):
        photos, targets = list(), list()
        # enumerate files in the directory
        for filename in os.listdir(path):
            # load image
            photo = load_img(path + filename, target_size=(760,415))
            # convert to numpy array
            photo = img_to_array(photo, dtype='uint8')
            # get tags
            tags = file_mapping[filename[:-4]]
            # one hot encode tags
            target = one_hot_encode(tags, tag_mapping)
            # store
            photos.append(photo)
            targets.append(target)
        X = np.asarray(photos, dtype='uint8')
        y = np.asarray(targets, dtype='uint8')
        return X, y
    
    trainingLabels = 'labels.csv'
    # load the mapping file
    mapping_csv = pd.read_csv(trainingLabels)
    
    
    # create a mapping of tags to integers
    tag_mapping, _ = create_tag_mapping(mapping_csv)
    
    # create a mapping of filenames to tag lists
    file_mapping = create_file_mapping(mapping_csv)
    
    
    # load the png images
    folder = 'dataset/'
    
    X, y = load_dataset(folder, file_mapping, tag_mapping)
    print(X.shape, y.shape)
    
    trainX, testX, trainY, testY = train_test_split(X, y, test_size=0.3, random_state=1)
    print(trainX.shape, trainY.shape, testX.shape, testY.shape)
    
    img_x,img_y=760,415
    trainX=trainX.reshape(trainX.shape[0], img_x,img_y,3)
    testX=testX.reshape(testX.shape[0], img_x,img_y,3)
    
    trainX=trainX.astype('float32')
    testX=testX.astype('float32')
    
    trainX /= 255
    testX /= 255
    
    # Note: y is already one-hot encoded by one_hot_encode(), so the extra
    # to_categorical() call is unnecessary here (it would add another dimension)
    # trainY = to_categorical(trainY, 3)
    # testY = to_categorical(testY, 3)
    print(trainX.shape)
    print(trainY.shape)
    
    model = Sequential()
    model.add(Conv2D(32, (5, 5), strides=(1,1), activation='relu', input_shape=(img_x, img_y,3)))
    model.add(MaxPooling2D((2, 2), strides=(2,2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(3, activation='sigmoid'))
    
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    history=model.fit(trainX, trainY, batch_size=2, epochs=5, verbose=1)
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['loss'])
    plt.title('Accuracy and loss')
    plt.xlabel('epoch')
    plt.ylabel('accuracy/loss')
    plt.legend(['Accuracy','loss'],loc='upper left')
    plt.show()
    
    score=model.evaluate(testX,testY,verbose=0)
    print('test loss',score[0])
    print('test accuracy',score[1])
    

I have attached an image, fileExplaination, which should give a clearer picture of my problem.

Because if we follow these:

  1. https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-satellite-photos-of-the-amazon-rainforest/
  2. https://towardsdatascience.com/journey-to-the-center-of-multi-label-classification-384c40229bff
  3. https://www.analyticsvidhya.com/blog/2019/04/predicting-movie-genres-nlp-multi-label-classification/

etc., they have multiple labels for each image, but in my case I have multiple labels plus their attributes.


Answer 1:


If your goal is to predict whether each of 'L', 'M' and 'H' applies, you are using an incorrect loss function: you should use binary_crossentropy. The shape of your targets will be batch × 3 in this case (a minimal sketch follows the two bullets below).

  • categorical_crossentropy assumes the output is a categorical distribution: a vector of values that sum up to one. In other words, you have multiple possibilities, but only one of them can be the correct one.

  • binary_crossentropy assumes that every number in the output vector is a (conditionally) independent binary distribution, so each number is between 0 and 1, but they do not necessarily sum up to one, because it can very well happen that all of them are true.
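
A minimal sketch of that multi-label setup, assuming the same convolutional base and input size as in the question and targets of shape (batch, 3):

# multi-label variant: three independent binary outputs
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (5, 5), activation='relu', input_shape=(760, 415, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(3, activation='sigmoid'),   # one independent probability per tag
])
# binary_crossentropy treats each of the 3 outputs as its own Bernoulli variable
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])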

If your goal is to predict the value of each of label1, ..., label6, then you should model a categorical distribution for each label. You have six labels, each with 3 possible values, so you need 18 numbers (logits). The shape of your targets will be batch × 6 × 3 in this case.

model.add(Dense(18))  # no activation here: these are the raw logits

Because you don't want a single distribution over 18 values, but six distributions over 3 values each, you need to reshape the logits first:

model.add(Reshape((6, 3)))
model.add(Softmax())  # softmax over the last axis, i.e. separately for each label
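
A minimal sketch of how these pieces could fit together, assuming the same convolutional base as in the question and targets of shape (batch, 6, 3):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten, Dense,
                                     Reshape, Softmax)

model = Sequential([
    Conv2D(32, (5, 5), activation='relu', input_shape=(760, 415, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(18),            # raw logits: 6 labels x 3 values each
    Reshape((6, 3)),      # one row of logits per label
    Softmax(),            # softmax over the last axis, i.e. per label
])
# categorical_crossentropy is computed over the last axis, so each of the
# 6 label positions gets its own 3-way categorical loss
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])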



Answer 2:


Based on the above discussion, here is a solution to the problem. As mentioned, we have a total of 5 labels, and each label takes one of three tags (L, M, H). We can perform the encoding like this:

# encode one list of tags as a list of per-tag one-hot vectors
# (the 'mapping' argument is unused; it is kept to match the original one_hot_encode signature)
def custom_encode(tags, mapping):
    # one row of length 3 per tag, in the order [L, M, H]
    encoding = []
    for tag in tags:
        if tag == 'L':
            encoding.append([1,0,0])
        elif tag == 'M':
            encoding.append([0,1,0])
        else:
            encoding.append([0,0,1])
    return encoding

So the encoded y vector will look like this:

Labels       Tags              Encoded tags
Label1 ----> [L,L,L,M,H] ---> [ [1,0,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1] ]
Label2 ----> [L,H,L,M,H] ---> [ [1,0,0], [0,0,1], [1,0,0], [0,1,0], [0,0,1] ]
Label3 ----> [L,M,L,M,H] ---> [ [1,0,0], [0,1,0], [1,0,0], [0,1,0], [0,0,1] ]
Label4 ----> [M,M,L,M,H] ---> [ [0,1,0], [0,1,0], [1,0,0], [0,1,0], [0,0,1] ]
Label5 ----> [M,L,L,M,H] ---> [ [0,1,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1] ]
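
For completeness, a small sketch of how these per-label rows could be stacked into a single target array, assuming the file_mapping dictionary from the question (filename -> list of 5 tags; the filename in the comment is hypothetical) and the custom_encode function above:

import numpy as np

y = []
for name, tags in file_mapping.items():   # e.g. {'img_001': ['L', 'L', 'L', 'M', 'H'], ...}
    y.append(custom_encode(tags, None))   # 5 rows of length 3
y = np.asarray(y, dtype='float32')        # shape: (num_samples, 5, 3)
print(y.shape)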


The final layers will look like this:

 model.add(Dense(15))              # 5 labels x 3 tags each = 15 output neurons
 model.add(Reshape((5, 3)))        # one row of 3 values per label
 model.add(Activation('softmax'))  # softmax over the last axis, i.e. per label
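
A hedged sketch of how this model could then be compiled and trained, assuming trainX/testX are prepared as in the question and trainY/testY are the (num_samples, 5, 3) arrays built with custom_encode:

# categorical_crossentropy is computed over the last axis, so each of the
# 5 label positions contributes its own 3-way categorical loss
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(trainX, trainY, batch_size=2, epochs=5, verbose=1)

# per-label predictions: argmax over the last axis gives 0=L, 1=M, 2=H
pred = model.predict(testX)        # shape: (num_test, 5, 3)
pred_tags = pred.argmax(axis=-1)   # shape: (num_test, 5)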


Source: https://stackoverflow.com/questions/58813194/multi-label-image-classification
