As we know, the input size of a conv net is usually fixed. But when pre-training an object-detection backbone, one often uses a dataset that contains only target object
conv