I'm using both ARKit & Vision, following along with Apple's sample project, "Using Vision in Real Time with ARKit", so I'm not setting up the camera myself; ARKit handles that for me.
Using Vision's VNDetectFaceRectanglesRequest, I'm able to get back a collection of VNFaceObservation objects.
Following various guides online, I'm able to transform the VNFaceObservation's boundingBox to one that I can use on my ViewController's UIView.
The Y-axis is correct when the rectangle is placed on my UIView in ARKit, but the X-axis is completely off.
// face is an instance of VNFaceObservation
let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -view.frame.height)
let translate = CGAffineTransform.identity.scaledBy(x: view.frame.width, y: view.frame.height)
let rect = face.boundingBox.applying(translate).applying(transform)
What is the correct way to display the boundingBox on the screen (in ARKit/UIKit) so that the X & Y axes match up correctly with the detected face rectangle? I can't use self.cameraLayer.layerRectConverted(fromMetadataOutputRect: transformedRect) since I'm not using AVCaptureSession.
Update: Digging into this further, the camera's image is 1920 x 1440. Most of it is not displayed on ARKit's screen space. The iPhone XS screen is 375 x 812 points.
After I get Vision's observation boundingBox, I transform it to fit the current view (375 x 812). This isn't working, since the displayed image actually seems to be about 500 points wide (its left & right sides fall outside the screen). How do I use a CGAffineTransform to map the bounding box CGRect from 375 x 812 to the displayed size (seems like 500 x 812, a total guess)?
The key piece missing here is ARFrame's displayTransform(for:viewportSize:). It is described in Apple's ARFrame documentation.
This function will generate the appropriate transform for a given frame and viewport size (the CGRect of the view you're displaying the image and bounding box in).
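To see why a plain scale to the view's size misplaces the X-axis, here is a rough sketch of the arithmetic using the numbers from the question. It assumes the camera image is rotated to 1440 x 1920 in portrait and is scaled aspect-fill and centered in the view, which is an assumption about how the camera image is rendered, not something the question confirms. The helper name aspectFillSize is hypothetical.

```swift
import Foundation

/// Hypothetical helper: the size an image occupies when scaled
/// aspect-fill (no letterboxing, overflow cropped) into a viewport.
func aspectFillSize(image: CGSize, viewport: CGSize) -> CGSize {
    // Aspect-fill uses the larger of the two axis scale factors.
    let scale = max(viewport.width / image.width,
                    viewport.height / image.height)
    return CGSize(width: image.width * scale,
                  height: image.height * scale)
}

// Numbers from the question; the 1920 x 1440 camera image is assumed
// to be rotated to 1440 x 1920 for a portrait interface.
let viewport = CGSize(width: 375, height: 812)
let filled = aspectFillSize(image: CGSize(width: 1440, height: 1920),
                            viewport: viewport)

// Horizontal overflow hidden on each side of the screen.
let overflowPerSide = (filled.width - viewport.width) / 2
print(filled.width, filled.height, overflowPerSide)
```

Under those assumptions the image fills roughly 609 x 812 points, with about 117 points cropped off each side, so a bounding box scaled to 375 points wide is squeezed toward the center. displayTransform(for:viewportSize:) accounts for this kind of crop so you don't have to compute it by hand.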
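func visionTransform(frame: ARFrame, viewport: CGRect) -> CGAffineTransform {
    // Note: statusBarOrientation is deprecated as of iOS 13.
    let orientation = UIApplication.shared.statusBarOrientation
    // Maps normalized image coordinates to normalized viewport coordinates.
    let transform = frame.displayTransform(for: orientation,
                                           viewportSize: viewport.size)
    // Scales normalized coordinates up to viewport points.
    let scale = CGAffineTransform(scaleX: viewport.width,
                                  y: viewport.height)
    // Start from the identity; CGAffineTransform() is all zeros and would
    // collapse the rect if the orientation is neither portrait nor landscape.
    var t = CGAffineTransform.identity
    if orientation.isPortrait {
        t = CGAffineTransform(scaleX: -1, y: 1)
        t = t.translatedBy(x: -viewport.width, y: 0)
    } else if orientation.isLandscape {
        t = CGAffineTransform(scaleX: 1, y: -1)
        t = t.translatedBy(x: 0, y: -viewport.height)
    }
    return transform.concatenating(scale).concatenating(t)
}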
You can then use this like so:
let transform = visionTransform(frame: yourARFrame, viewport: yourViewport)
let rect = face.boundingBox.applying(transform)
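Once you have the rect in view coordinates, one way to show it is with a CAShapeLayer on top of the AR view. This is only a minimal sketch: showFaceOutline is a hypothetical helper, not part of ARKit or Vision, and it assumes the rect is already expressed in the coordinate space of the view you pass in.

```swift
import UIKit

/// Hypothetical helper: draws (or moves) a red outline for a face rect
/// that is already expressed in the view's coordinate space.
@discardableResult
func showFaceOutline(_ rect: CGRect,
                     in view: UIView,
                     reusing layer: CAShapeLayer? = nil) -> CAShapeLayer {
    // Reuse the previous layer if one is supplied, so each new
    // observation moves the outline instead of stacking new layers.
    let outline = layer ?? CAShapeLayer()
    outline.path = UIBezierPath(rect: rect).cgPath
    outline.strokeColor = UIColor.red.cgColor
    outline.fillColor = UIColor.clear.cgColor
    outline.lineWidth = 2
    if outline.superlayer == nil {
        view.layer.addSublayer(outline)
    }
    return outline
}
```

If your Vision request completes on a background queue, call this from DispatchQueue.main.async, since layer changes should happen on the main thread.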