I have a 4d tensor output for an object detector that outputs per-pixel, per class box predictions, i.e. Shape H x W x C x 6, where the innermost 6 wide dimension is box paramet