Question
I am using TensorFlow as the backend to Keras, and I am trying to understand how to bring in my labels for image segmentation training.
I am using the LFW Parts Dataset, which has both the ground truth images and the ground truth masks (1500 training images).
As I understand the process, during training, I load both the
- (X) Image
- (Y) Mask Image
I do this in batches to suit my needs (a sketch of the kind of batch loading I mean follows the list below). Now my question is: is it sufficient to just load them both (Image and Mask Image) as NumPy arrays of shape (N, N, 3), or do I need to process/reshape the Mask Image in some way? Effectively, the mask/labels are represented as [R, G, B] pixels where:
- [255, 0, 0] Hair
- [0, 255, 0] Face
- [0, 0, 255] Background
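For reference, a minimal sketch of the batch loading I have in mind (the path lists, batch size, and normalization are placeholders, not settled choices):

import numpy as np
from PIL import Image

def batch_generator(image_paths, mask_paths, batch_size=8):
    # Loop forever, yielding (images, masks) as stacked NumPy arrays.
    while True:
        for start in range(0, len(image_paths), batch_size):
            imgs = [np.array(Image.open(p), dtype=np.float32) / 255.0
                    for p in image_paths[start:start + batch_size]]
            msks = [np.array(Image.open(p), dtype=np.uint8)
                    for p in mask_paths[start:start + batch_size]]
            yield np.stack(imgs), np.stack(msks)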
I could do something like this to normalize it to 0-1, though I don't know if I should:
from PIL import Image
import numpy as np

im = Image.open(path)
label = np.array(im, dtype=np.uint8)
label = np.multiply(label, 1.0 / 255)  # scale each channel to [0, 1] (result is float)
so I end up with:
- [1, 0, 0] Hair
- [0, 1, 0] Face
- [0, 0, 1] Background
Everything I found online uses existing datasets in TensorFlow or Keras. Nothing is really all that clear on how to pull things off if you have what could be considered a custom dataset.
I found this related to Caffe: https://groups.google.com/forum/#!topic/caffe-users/9qNggEa8EaQ
They advocate converting the mask images to shape (H, W, 1), where my classes would be 0, 1, 2 for Background, Hair, and Face respectively.
It may be that this is a duplicate of these (a combination of similar questions/answers):
How to implement multi-class semantic segmentation?
Tensorflow: How to create a Pascal VOC style image
I found one example that processes Pascal VOC into (N, N, 1), which I adapted:
import numpy as np

LFW_PARTS_PALETTE = {
    (0, 0, 255): 0,  # background (blue)
    (255, 0, 0): 1,  # hair (red)
    (0, 255, 0): 2,  # face (green)
}

def convert_from_color_segmentation(arr_3d):
    # Map each RGB pixel to its class index; unknown colors default to 0.
    arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)
    palette = LFW_PARTS_PALETTE
    for i in range(arr_3d.shape[0]):
        for j in range(arr_3d.shape[1]):
            key = (arr_3d[i, j, 0], arr_3d[i, j, 1], arr_3d[i, j, 2])
            arr_2d[i, j] = palette.get(key, 0)  # default value if key not found is 0
    return arr_2d
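For speed, the per-pixel loop could presumably be vectorized; a sketch of that idea (same palette as above, untested against the actual dataset):

def convert_from_color_segmentation_fast(arr_3d):
    # Vectorized RGB-to-class-index conversion using boolean masks per color.
    arr_2d = np.zeros(arr_3d.shape[:2], dtype=np.uint8)
    for color, class_id in LFW_PARTS_PALETTE.items():
        matches = np.all(arr_3d == np.array(color, dtype=np.uint8), axis=-1)
        arr_2d[matches] = class_id
    return arr_2d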
I think this might be close to what I want, but not spot on. I think I need it to be (N, N, 3), since I have 3 classes? The version above, and another one, originated from these two locations:
https://github.com/martinkersner/train-CRF-RNN/blob/master/utils.py#L50
https://github.com/DrSleep/tensorflow-deeplab-resnet/blob/ce75c97fc1337a676e32214ba74865e55adc362c/deeplab_resnet/utils.py#L41 (this one one-hot encodes the values)
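If the (N, N) class map does need to become one-hot, I gather it would be something like this NumPy sketch (assuming my 3 classes and the conversion function above):

num_classes = 3
label_2d = convert_from_color_segmentation(label)           # (H, W) class indices
label_1hot = np.eye(num_classes, dtype=np.uint8)[label_2d]  # (H, W, 3)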
Answer 1:
Since this is semantic segmentation, you are classifying each pixel in the image, so you would most likely use a cross-entropy loss. Keras, as well as TensorFlow, requires your mask to be one-hot encoded, and the output dimension of your mask should be something like [batch, height, width, num_classes]. Before computing the cross-entropy loss, you will have to reshape both your logits and your mask to the tensor shape [-1, num_classes], where -1 denotes 'as many as required'.
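A rough sketch of that reshaping step (the placeholders stand in for your network output and one-hot mask; this is not tied to any particular model):

import tensorflow as tf

num_classes = 3
# Placeholders standing in for the network output and the one-hot mask.
logits = tf.placeholder(tf.float32, [None, None, None, num_classes])
labels = tf.placeholder(tf.float32, [None, None, None, num_classes])

# Flatten to [-1, num_classes] before computing the cross-entropy loss.
logits_flat = tf.reshape(logits, [-1, num_classes])
labels_flat = tf.reshape(labels, [-1, num_classes])
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels_flat,
                                            logits=logits_flat))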
Have a look here at the end
Since your question is about loading your own images: I just finished building an input pipeline for segmentation myself. It is in TensorFlow, though, so I don't know if it helps you; have a look if you are interested: Tensorflow input pipeline for segmentation
Answer 2:
Keras requires the label to be one-hot encoded, so your input will have to be of dimension (N x N x n_classes).
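A minimal sketch (mask_2d here is a stand-in for your (N, N) array of class indices):

import numpy as np
from keras.utils import to_categorical

# mask_2d is a placeholder: an (N, N) array of class indices 0..2
mask_2d = np.random.randint(0, 3, size=(250, 250))
one_hot = to_categorical(mask_2d.ravel(), num_classes=3)
one_hot = one_hot.reshape(mask_2d.shape + (3,))  # (N, N, 3)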
Source: https://stackoverflow.com/questions/45178513/how-to-load-image-masks-labels-for-image-segmentation-in-keras