Mask R-CNN for object detection and segmentation [Train for a custom dataset]

问题

I'm doing a research on "Mask R-CNN for Object Detection and Segmentation". So I have read the original research paper which presents Mask R-CNN for object detection, and also I found few implementations of Mask R-CNN, here and here (by Facebook AI research team called detectron). But they all have used coco datasets for testing.

But I'm quite a bit of confusing for training above implementations with custom data-set which has a large set of images and for each image there is a subset of masks images for marking the objects in the corresponding image.

So I'm pleasure if anyone can post useful resources or code samples for this task.

Note: My dataset has following structure,

It consists with a large number of images and for each image, there are separate image files highlighting the object as a white patch in a black image.

Here is an example image and it's masks:

Image;

Masks;

回答1:

I have trained https://github.com/matterport/Mask_RCNN 's model for instance segmentation to run on my dataset.

My assumption is that you have all the basic setup done and the model is already running with default dataset(provided in the repo) and now you want it to run for custom dataset.

Following are the steps

You need to have all your annotations.
All of those need to be converted to VGG Polygon schema (yes i mean polygons, even if you need bound boxes). I have added a sample VGG Polygon format at the end of this answer.
You need to divide your custom dataset into train, test and val
The annotation by default are looked with a filename via_region_data.json inside the individual dataset folder. For eg for training images it would look at train\via_region_data.json. You can also change it if you want.
Inside Samples folder you can find folders like Balloon, Nucleus, Shapes etc. Copy one of the folders. Preferably balloon. We will now try to modify this new folder for our custom dataset.
Inside the copied folder, you will have a .py file (for balloon it will be balloon.py), change the following variables
- ROOT_DIR : the absolute path where you have cloned the project
- DEFAULT_LOGS_DIR : This folder will get bigger in size so change this path accordingly (if you are running your code in a low disk storage VM). It will store the .h5 file as well. It will make subfolder inside the log folder with timestamp attached to it.
- .h5 files are roughly 200 - 300 MB per epoch. But guess what this log directory is Tensorboard compatible. You can pass the timestamped subfolder as --logdir argument while running tensorboard.
This .py file also has two classes - one class with suffix as Config and another class with suffix as Dataset.
In Config class override the required stuff like
- NAME : a name for your project.
- NUM_CLASSES : it should be one more than your label class because background is also considered as one label
- DETECTION_MIN_CONFIDENCE : by default 0.9 (decrease it if your training images are not of very high quality or you don't have much training data)
- STEPS_PER_EPOCH etc
In Dataset class override the following methods. All these functions are already well-commented so you can follow the comments to override according to your needs.
- load_(name_of_the_sample_project) for eg load_balloon
- load_mask (see the last of answer for a sample)
- image_reference
train function (outside Dataset class) : if you have to change the number of epochs or learning rate etc

You can now run it directly from terminal

python samples\your_folder_name\your_python_file_name.py train --dataset="location_of_custom_dataset" --weights=coco

For complete information of the command line arguments for the above line you can see it as a comment at the top of this .py file.

These are the things which I could recall, I would like to add more steps as I remember. Maybe you can let me know if you are stuck at any particular step, I will elaborate that particular step.

VGG Polygon Schema

Width and Height are optional

[{
    "filename": "000dfce9-f14c-4a25-89b6-226316f557f3.jpeg",
    "regions": {
        "0": {
            "region_attributes": {
                "object_name": "Cat"
            },
            "shape_attributes": {
                "all_points_x": [75.30864197530865, 80.0925925925926, 80.0925925925926, 75.30864197530865],
                "all_points_y": [11.672189112257607, 11.672189112257607, 17.72093488703078, 17.72093488703078],
                "name": "polygon"
            }
        },
        "1": {
            "region_attributes": {
                "object_name": "Cat"
            },
            "shape_attributes": {
                "all_points_x": [80.40123456790124, 84.64506172839506, 84.64506172839506, 80.40123456790124],
                "all_points_y": [8.114103362391036, 8.114103362391036, 12.205901974737595, 12.205901974737595],
                "name": "polygon"
            }
        }
    },
    "width": 504,
    "height": 495
}]

Sample load_mask function

def load_mask(self, image_id):
    """Generate instance masks for an image.
    Returns:
    masks: A bool array of shape [height, width, instance count] with
        one mask per instance.
    class_ids: a 1D array of class IDs of the instance masks.
    """
    # If not your dataset image, delegate to parent class.
    image_info = self.image_info[image_id]
    if image_info["source"] != "name_of_your_project":   //change your project name
        return super(self.__class__, self).load_mask(image_id)

    # Convert polygons to a bitmap mask of shape
    # [height, width, instance_count]
    info = self.image_info[image_id]
    mask = np.zeros([info["height"], info["width"], len(info["polygons"])], dtype=np.uint8)
    class_id =  np.zeros([mask.shape[-1]], dtype=np.int32)

    for i, p in enumerate(info["polygons"]):
        # Get indexes of pixels inside the polygon and set them to 1
        rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
        # print(rr.shape, cc.shape, i, np.ones([mask.shape[-1]], dtype=np.int32).shape, info['classes'][i])

        class_id[i] = self.class_dict[info['classes'][i]]
        mask[rr, cc, i] = 1


    # Return mask, and array of class IDs of each instance. Since we have
    # one class ID only, we return an array of 1s
    return mask.astype(np.bool), class_id

回答2:

So first of all, you need to extract the bounding boxes of each image. That task has to be manually, or you can use tools like OpenCV

edit for open cv

Again for the segments in white, you have to do the best technique with any tool of your choice, I'd do it with OpenCV. The code may be really specific because it can be approached with different techniques. There's no other way because you've got no annotation but masks.

Now you've got your image and your box in a format (x, y, width, height).

Detectron has a JSON file format such as: https://pastebin.com/ewaaC5Bm

Now, you can create a JSON like that with the images value, because you've got that information.

Since we don't have any segmentation (in your example), let's clarify the parameters that annotations are taking:

category_id: This is the id of the category. You can see in the pastebin that the only category I showed had id = 32. You'd need to add more categories according to your dataset.
bbox: This is the box we talked about above: [x, y, width, height]

Now for iscrowd, area and segmentation we can apparently take two approaches: this or this.

That way segmentation won't be considered (or will be considered but ignored).

Good luck.

回答3:

For the image segmentation task, there are two ways to provide mask images to the training code.

A mask image for the whole image.
A mask image for each object in the image.

In Mask R-CNN, you have to follow 2.

Our Mac OS X app RectLabel can export both of mask images.

An index color image which color table corresponds to the object class id.
Gray image for each object which consists of 0:background and 255:foreground.

We provide python code examples of how to load mask images and set to the TFRecord file for the Mask R-CNN code.

COCO JSON file to TFRecord with mask images

https://github.com/ryouchinsa/Rectlabel-support/blob/master/rectlabel_create_coco_tf_record.py

python object_detection/dataset_tools/rectlabel_create_coco_tf_record.py \
--train_image_dir="${TRAIN_IMAGE_DIR}" \
--val_image_dir="${VAL_IMAGE_DIR}" \
--train_annotations_file="${TRAIN_ANNOTATIONS_FILE}" \
--val_annotations_file="${VAL_ANNOTATIONS_FILE}" \
--output_dir="${OUTPUT_DIR}" \
--include_masks

PASCAL VOC XML files to TFRecord with mask images

https://github.com/ryouchinsa/Rectlabel-support/blob/master/rectlabel_create_pascal_tf_record.py

python object_detection/dataset_tools/rectlabel_create_pascal_tf_record.py \
--images_dir="${IMAGES_DIR}" \
--label_map_path="${LABEL_MAP_PATH}" \
--output_path="${OUTPUT_PATH}" \
--include_masks

We hope this would help.

来源：https://stackoverflow.com/questions/49684468/mask-r-cnn-for-object-detection-and-segmentation-train-for-a-custom-dataset

标签

python

machine-learning

computer-vision

object-detection

semantic-segmentation