I want to train a Faster R-CNN on 224x224x3 images that are log-scaled mel spectrograms from the UrbanSound8K dataset. The data is already labeled with rectangles. Here's
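The post does not say which framework the model uses, so this sketch assumes torchvision's Faster R-CNN. It also assumes a hypothetical annotation layout of `(x, y, width, height, class_name)` in pixel coordinates of the 224x224 image. Under those assumptions, it shows how the rectangle labels could be turned into the per-image target that torchvision's detector expects. Each box is `[x1, y1, x2, y2]` with `x1 < x2` and `y1 < y2`, and each label is an integer class index where 0 is reserved for background. The helper name `to_frcnn_target` is illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical annotation format (an assumption, not from the post):
# (x, y, width, height, class_name) in pixel coordinates of the 224x224 image.
Rect = Tuple[float, float, float, float, str]

# The 10 UrbanSound8K classes. Index 0 is left for "background",
# which torchvision's Faster R-CNN reserves.
CLASSES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark",
    "drilling", "engine_idling", "gun_shot", "jackhammer",
    "siren", "street_music",
]
CLASS_TO_IDX = {name: i + 1 for i, name in enumerate(CLASSES)}


def to_frcnn_target(rects: Sequence[Rect], width: int = 224,
                    height: int = 224) -> Dict[str, List]:
    """Convert (x, y, w, h, label) rectangles to [x1, y1, x2, y2] boxes."""
    boxes: List[List[float]] = []
    labels: List[int] = []
    for x, y, w, h, name in rects:
        # Convert corner + size to two corners, clipped to the image bounds.
        x1 = max(0.0, float(x))
        y1 = max(0.0, float(y))
        x2 = min(float(width), float(x + w))
        y2 = min(float(height), float(y + h))
        # Drop degenerate boxes: the detector needs x1 < x2 and y1 < y2.
        if x2 <= x1 or y2 <= y1:
            continue
        boxes.append([x1, y1, x2, y2])
        labels.append(CLASS_TO_IDX[name])
    return {"boxes": boxes, "labels": labels}


if __name__ == "__main__":
    target = to_frcnn_target([(10, 20, 50, 100, "dog_bark"),
                              (200, 0, 40, 30, "siren")])
    print(target)
```

With torchvision, these lists would then be wrapped as `torch.as_tensor(boxes, dtype=torch.float32)` and `torch.as_tensor(labels, dtype=torch.int64)`. The model would be built with `num_classes=11`, because torchvision counts the background class in `num_classes`.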