I am trying to replicate the SCRC baseline model mentioned in the paper.
The paper says “use the camera pose recorded with each annotation to extract nearest 2D frames for