I'm trying to train a model to check images, identify specified objects, and tell me their coordinates (I don't even need to see a square drawn around the object).
For this I'm using TensorFlow's object detection API, and most of what I did was following this tutorial:
But some things have changed, probably because of updates, and I had to do some things on my own. I can actually train the model (I guess), but I don't understand the evaluation results. I'm used to seeing the loss and the current step, but this output is unusual to me. Also, I don't think the training is being saved.
Training command line:
model_main.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_coco.config
Config file:
model {
  faster_rcnn {
    num_classes: 9
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_v2'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}
train_config: {
  batch_size: 5
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 50000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "C:/tensorflow1/models/research/object_detection/images/train.record"
  }
  label_map_path: "C:/tensorflow1/models/research/object_detection/training/object-detection.pbtxt"
}
eval_config: {
  num_examples: 67
  max_evals: 10
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "C:/tensorflow1/models/research/object_detection/images/test.record"
  }
  label_map_path: "C:/tensorflow1/models/research/object_detection/training/object-detection.pbtxt"
  shuffle: false
  num_readers: 1
}
Output:
2019-03-16 01:05:23.842424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-16 01:05:23.842528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-16 01:05:23.845561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-16 01:05:23.845777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-16 01:05:23.847854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6390 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.05s).
Accumulating evaluation results...
DONE (t=0.04s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.681
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.670
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.542
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.825
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.682
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.689
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.689
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.556
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.825
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Also, the files inside faster_rcnn_inception_v2_coco_2018_01_28
have not changed since Jan 2018, which probably means that even if it's training, the progress isn't being saved.
My questions are:
- Am I doing something wrong with the config or something else?
- Is the training progress being saved?
- How can I understand this output? (IoU? maxDets? area? negative precision? is it for a single batch or what?)
- Should I wait for this to stop by itself eventually? I can't see which step I'm at, and just the piece of output I used as an example here took almost 15 minutes to appear.
Wow, a lot of questions to answer here.
1. I think your config file is correct. Usually, the fields that need to be carefully configured are:
- num_classes: the number of classes in your dataset.
- fine_tune_checkpoint: the checkpoint to start training from if you adopt transfer learning; this should be provided if from_detection_checkpoint is set to true.
- label_map_path: the path to your label file; the number of classes in it should be equal to num_classes (a minimal sketch follows this list).
- input_path in both train_input_reader and eval_input_reader.
- num_examples in eval_config: this is your validation dataset size, i.e. the number of examples in your validation dataset.
- num_steps: the total number of training steps to reach before the model stops training.
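A label map is just a small pbtxt file with one item per class. A minimal sketch could look like the following (the class names here are hypothetical placeholders; your object-detection.pbtxt should list your own 9 classes with ids 1 through 9 so that it matches num_classes: 9):

item {
  id: 1
  name: 'class_one'
}
item {
  id: 2
  name: 'class_two'
}
# ... and so on, up to id: 9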
2. Yes, your training progress is being saved. It is saved to train_dir (if you are using the older version of the API; it is model_dir if you are using the latest version); the official description is here. You can use tensorboard to visualize your training process.
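For example, assuming the checkpoints end up in the training/ directory from your command line, pointing TensorBoard at it should show the loss curves and the evaluation metrics:

tensorboard --logdir=training/

Also note that model_main.py expects --model_dir rather than the legacy --train_dir flag (worth double-checking against the version you have), so the invocation would look roughly like:

python model_main.py --model_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_coco.config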
3. The output is in COCO evaluation format, as this is the default evaluation metric option. You can try other evaluation metrics by setting metrics_set in eval_config in the config file; other options are available here (a sketch follows the list below). For the coco metrics specifically:
- IoU is Intersection over Union; it defines how much your detection bounding box overlaps with the groundtruth box. This answer provides more details on how precision is calculated at different IoUs.
- maxDets is the threshold on the maximum number of detections per image (see here for a better discussion).
- area: there are three categories of area depending on how many pixels the area takes; small, medium and large are all defined here.
- As for the negative precision for the 'large' category, I think this is the default value used when no detections are categorized as 'large' (but I cannot confirm this; you may refer to the official coco website http://cocodataset.org/#home).
- The evaluation is performed on the whole validation dataset, so all images in your validation set.
- This file provides more details on coco metrics.
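For example, switching to Pascal VOC metrics would look roughly like this; treat the exact metric name string as something to verify against the API's eval config documentation:

eval_config: {
  metrics_set: "pascal_voc_detection_metrics"
  num_examples: 67
}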
4. The training will stop once the total number of training steps reaches the num_steps value set in your config file. In your case, an evaluation session is performed roughly every 15 minutes; how often evaluation runs can also be configured in the config file (see the sketch below).
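If I remember correctly, with the older train.py/eval.py workflow this interval is controlled by eval_interval_secs in eval_config (with model_main.py, evaluation is instead tied to how often checkpoints are written), so something along these lines, to be verified against eval.proto:

eval_config: {
  num_examples: 67
  eval_interval_secs: 600  # run evaluation at most once every 10 minutes
}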
5. Although you followed the tutorial above, I suggest also following the official API documentation: https://github.com/tensorflow/models/tree/master/research/object_detection.
PS: Indeed, I can confirm that the negative precision score is due to the absence of the corresponding category. See the reference in the cocoapi.
Source: https://stackoverflow.com/questions/55193486/tensorflow-object-detection-next-steps