Question
I'm trying to create a single multi-class and multi-label net configuration in caffe.
Let's say classification of dogs: Is the dog small or large? (class) What color is it? (class) Does it have a collar? (label)
Is this thing possible using caffe? What is the proper way to do so?
Just trying to understand the practical way. After creating 2 .txt files (one for training and one for validation) containing all the tags of the images, for example:
/train/img/1.png 0 4 18
/train/img/2.png 1 7 17 33
/train/img/3.png 0 4 17
Running the py script:
import h5py, os
import caffe
import numpy as np
SIZE = 227 # fixed size to all images
with open( 'train.txt', 'r' ) as T :
    lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' )
y = np.zeros( (len(lines), 1), dtype='f4' )
for i, l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image( sp[0] )
    img = caffe.io.resize( img, (SIZE, SIZE, 3) ) # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from size-by-size-by-3 and transpose it to 3-by-size-by-size
    # for example
    transposed_img = img.transpose((2,0,1))[::-1,:,:] # RGB->BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5','w') as H:
    H.create_dataset( 'X', data=X ) # note the name X given to the dataset!
    H.create_dataset( 'y', data=y ) # note the name y given to the dataset!
with open('train_h5_list.txt','w') as L:
    L.write( 'train.h5' ) # list all h5 files you are going to use
This creates train.h5 and val.h5 (does the X dataset contain the images and y the labels?).
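A quick way to check what ended up in the file, assuming train.h5 was written by the script above, is just to open it with h5py:

import h5py

# inspect the datasets written by the conversion script above
with h5py.File('train.h5', 'r') as f:
    print(list(f.keys()))   # expected: ['X', 'y']
    print(f['X'].shape)     # expected: (num_images, 3, 227, 227)
    print(f['y'].shape)     # expected: (num_images, 1)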
Then I replace my network's input layers from:
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/train_db"
    backend: LMDB
    batch_size: 64
  }
  transform_param {
    crop_size: 227
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto"
    mirror: true
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/val_db"
    backend: LMDB
    batch_size: 64
  }
  transform_param {
    crop_size: 227
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto"
    mirror: true
  }
  include: { phase: TEST }
}
to
layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase: TRAIN }
}
layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "val_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase: TEST }
}
I guess HDF5 doesn't need a mean.binaryproto?
Next, how should the output layers change in order to output multiple label probabilities? I guess I need a cross-entropy layer instead of softmax? These are the current output layers:
layers {
  bottom: "prob"
  bottom: "label"
  top: "loss"
  name: "loss"
  type: SOFTMAX_LOSS
  loss_weight: 1
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
Answer 1:
Mean subtraction
While the lmdb input data layer is able to handle various input transformations for you, the "HDF5Data" layer does not support this functionality.
Therefore, you must take care of all input transformations (in particular mean subtraction) when you create your hdf5 files.
See where your code says
# you may apply other input transformations here...
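For example, a minimal sketch of per-channel mean subtraction inside the conversion loop; the mean values here are placeholders you would compute from your own training set (or read out of mean.binaryproto), not values taken from your data:

# hypothetical per-channel BGR mean, e.g. computed over the training set
mean_bgr = np.array([104.0, 117.0, 123.0], dtype='f4').reshape(3, 1, 1)

transposed_img = img.transpose((2, 0, 1))[::-1, :, :]  # RGB -> BGR, 3-by-SIZE-by-SIZE
transposed_img = transposed_img * 255.0 - mean_bgr     # caffe.io.load_image gives [0,1]; scale, then subtract mean
X[i] = transposed_img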
Multiple labels
Although your .txt file lists several labels for each image, you only save the first one to the hdf5 file. If you want to use these labels, you have to feed them to the net.
An issue that immediately arises from your example is that you do not have a fixed number of labels for each training image -- why? What does that mean?
Assuming you have three labels for each image (in .txt files):
<filename> <dog size> <dog color> <has collar>
Then you can have y_size, y_color and y_collar (instead of a single y) in your hdf5:
y_size[i] = float(sp[1])
y_color[i] = float(sp[2])
y_collar[i] = float(sp[3])
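Put together, a sketch of the changed part of your conversion script (assuming every line in train.txt carries exactly these three labels) might look like:

y_size   = np.zeros( (len(lines), 1), dtype='f4' )
y_color  = np.zeros( (len(lines), 1), dtype='f4' )
y_collar = np.zeros( (len(lines), 1), dtype='f4' )
for i, l in enumerate(lines):
    sp = l.split(' ')
    # ... load, resize and transform the image into X[i] as before ...
    y_size[i]   = float(sp[1])
    y_color[i]  = float(sp[2])
    y_collar[i] = float(sp[3])
with h5py.File('train.h5', 'w') as H:
    H.create_dataset('X', data=X)
    H.create_dataset('y_size',   data=y_size)    # dataset names must match the "top"s of the HDF5Data layer
    H.create_dataset('y_color',  data=y_color)
    H.create_dataset('y_collar', data=y_collar)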
Your input data layer will have more "top"s accordingly:
layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y_size"
  top: "y_color"
  top: "y_collar"
  hdf5_data_param {
    source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase: TRAIN }
}
Prediction
Currently your net only predicts a single label (the layer with top: "prob"). You need your net to predict all three labels, therefore you need to add layers that compute top: "prob_size", top: "prob_color" and top: "prob_collar" (a different layer for each "prob_*").
Once you have a prediction for each label, you need a loss (again, a separate loss for each label).
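For example, a sketch for the dog-size label only, assuming a shared feature blob named "fc7" and 2 size classes (layer names, blob names and class counts are placeholders; repeat the same pattern for color and collar):

layer {
  name: "fc_size"
  type: "InnerProduct"
  bottom: "fc7"          # shared features from the trunk of the net (assumed name)
  top: "fc_size"
  inner_product_param { num_output: 2 }   # e.g. small / large
}
layer {
  name: "loss_size"
  type: "SoftmaxWithLoss"
  bottom: "fc_size"
  bottom: "y_size"       # label blob from the HDF5Data layer
  top: "loss_size"
  loss_weight: 1
}
layer {
  name: "prob_size"
  type: "Softmax"
  bottom: "fc_size"
  top: "prob_size"
  include { phase: TEST }
}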
Source: https://stackoverflow.com/questions/53047003/multi-class-and-multi-label-image-classification-using-caffe