Question
I am using the Vision framework to detect barcodes. I want to show a rect around the barcode with the highest confidence on live video - that is, I want to track that rect to the barcode seen in the live preview.
So I have this code to detect barcodes within a region of interest (ROI):
lazy var barcodeRequest: VNDetectBarcodesRequest = {
    let barcodeRequest = VNDetectBarcodesRequest { [weak self] request, error in
        guard error == nil else {
            print("ERROR: \(error?.localizedDescription ?? "error")")
            return
        }
        self?.resultClassification(request)
    }
    /// regionOfInterest is in normalized coordinates (0...1, lower-left origin)
    barcodeRequest.regionOfInterest = CGRect(x: 0,
                                             y: 0.3,
                                             width: 1,
                                             height: 0.4)
    return barcodeRequest
}()
This method fires when barcodes are detected:
func resultClassification(_ request: VNRequest) {
    guard let barcodes = request.results,
          let potentialCodes = barcodes as? [VNBarcodeObservation]
    else { return }

    // choose the barcode with the highest confidence
    let highestConfidenceBarcodeDetected = potentialCodes.max(by: { $0.confidence < $1.confidence })

    // do something with highestConfidenceBarcodeDetected
    // 1
}
This is my problem: now that I have the highest-confidence barcode, I want to track it around the screen. So I think I will have to add code at // 1.
But before that I have to define this for the tracker:
var inputObservation: VNDetectedObjectObservation!

lazy var barcodeTrackingRequest: VNTrackObjectRequest = {
    let barcodeTrackingRequest = VNTrackObjectRequest(detectedObjectObservation: inputObservation) { [weak self] request, error in
        guard error == nil else {
            print("Detection error: \(String(describing: error)).")
            return
        }
        self?.resultClassificationTracker(request)
    }
    return barcodeTrackingRequest
}()
func resultClassificationTracker(_ request: VNRequest) {
    // all I want from this is to store the bounding box in a var
}
Now, how do I connect these two pieces of code, so that resultClassificationTracker fires every time I get a bounding box value for the tracker?
Answer 1:
I did something similar a while ago and wrote an article on it. It's for VNRecognizeTextRequest, not VNDetectBarcodesRequest, but it's similar. This is what I did:
- Perform VNImageRequestHandler continuously (once it finishes, start another)
- Store the detection indicator view in a property: var previousTrackingView: UIView?
- Animate the detection indicator to the new rectangle whenever the request handler finishes
- Use Core Motion to detect device movement, and adjust the frame of the detection indicator
Here is the result:
As you can see, the height/y coordinate is not very accurate. My guess is that Vision only needs a horizontal line to scan barcodes - like the laser scanners in grocery stores - so it doesn't return the full height. But that is a different problem.
Perform VNImageRequestHandler continuously (once it finishes, start another)
For this, I'm making a property busyPerformingVisionRequest, and whenever this is false, I call the Vision request. This is inside the didOutput function, which gets called whenever the camera frame changes.
class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {

    var busyPerformingVisionRequest = false

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        if busyPerformingVisionRequest == false {
            lookForBarcodes(in: pixelBuffer) /// start the Vision request as often as possible
        }
    }
}
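The answer doesn't show lookForBarcodes(in:), but it would be the method that flips busyPerformingVisionRequest and performs the request. A minimal sketch, assuming the barcodeRequest from the question and a portrait back camera (the .right orientation is an assumption):

func lookForBarcodes(in pixelBuffer: CVPixelBuffer) {
    busyPerformingVisionRequest = true
    /// run off the main thread so the camera callback isn't blocked
    DispatchQueue.global(qos: .userInitiated).async { [weak self] in
        guard let self = self else { return }
        /// .right maps a portrait device to the sensor's landscape buffer (assumption)
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right)
        do {
            try handler.perform([self.barcodeRequest])
            /// on success, the request's completion handler is expected to reset busyPerformingVisionRequest
        } catch {
            print("Vision request failed: \(error)")
            self.busyPerformingVisionRequest = false
        }
    }
}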
Store the detection indicator view in a property: var previousTrackingView: UIView?
Below is my Vision handler that gets called when the Vision request completes. I first set busyPerformingVisionRequest to false, so another Vision request can be made. Then I convert the bounding box to screen coordinates and call self.drawTrackingView(at: convertedRect).
func resultClassificationTracker(request: VNRequest?, error: Error?) {
    busyPerformingVisionRequest = false

    if let results = request?.results {
        if let observation = results.first as? VNBarcodeObservation {

            /// Vision's boundingBox is normalized (0...1) with a lower-left origin
            var x = observation.boundingBox.origin.x
            var y = 1 - observation.boundingBox.origin.y
            var height = CGFloat(0) /// ignore the bounding height
            var width = observation.boundingBox.width

            /// we're going to do some converting
            /// (aspectRatioWidthOverHeight and deviceSize are properties defined elsewhere)
            let convertedOriginalWidthOfBigImage = aspectRatioWidthOverHeight * deviceSize.height
            let offsetWidth = convertedOriginalWidthOfBigImage - deviceSize.width

            /// The pixel buffer that we got Vision to process is bigger than the device's screen, so we need to adjust it
            let offHalf = offsetWidth / 2

            width *= convertedOriginalWidthOfBigImage
            height = width * (CGFloat(9) / CGFloat(16))
            x *= convertedOriginalWidthOfBigImage
            x -= offHalf
            y *= deviceSize.height
            y -= height

            let convertedRect = CGRect(x: x, y: y, width: width, height: height)
            DispatchQueue.main.async {
                self.drawTrackingView(at: convertedRect)
            }
        }
    }
}
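As an aside, Vision also ships a helper, VNImageRectForNormalizedRect, that maps a normalized bounding box into the pixel buffer's coordinates; you'd still need to map from buffer pixels to view points yourself. A minimal sketch, where bufferWidth and bufferHeight are assumed to be the pixel buffer's dimensions:

/// bufferWidth / bufferHeight: assumed pixel dimensions of the CVPixelBuffer
/// that was handed to VNImageRequestHandler (e.g. 1920 x 1080)
let pixelRect = VNImageRectForNormalizedRect(observation.boundingBox,
                                             bufferWidth,
                                             bufferHeight)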
Animate the detection indicator to the new rectangle whenever the request handler finishes
This is my function drawTrackingView. If there is a tracking rectangle view drawn already, it animates it to the new frame. If not, it just adds it as a subview.
func drawTrackingView(at rect: CGRect) {
    if let previousTrackingView = previousTrackingView { /// already drew one previously, just change the frame now
        UIView.animate(withDuration: 0.8) {
            previousTrackingView.frame = rect
        }
    } else { /// add it as a subview
        let trackingView = UIView(frame: rect)
        drawingView.addSubview(trackingView)
        trackingView.backgroundColor = UIColor.blue.withAlphaComponent(0.2)
        trackingView.layer.borderWidth = 3
        trackingView.layer.borderColor = UIColor.blue.cgColor
        previousTrackingView = trackingView
    }
}
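(drawingView is presumably an overlay view that sits above the camera preview layer, so the tracking rectangle isn't covered by the video feed.)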
Use Core Motion to detect device movement, and adjust the frame of the detection indicator
I first store a couple of motion-related properties. Then, in viewDidLoad, I start the motion updates.
-----ViewController.swift-----

/// motionManager will be what we'll use to get device motion
var motionManager = CMMotionManager()

/// this will be the "device's true orientation in space" (Source: https://nshipster.com/cmdevicemotion/)
var initialAttitude: CMAttitude?

/// we'll later read these values to update the highlight's position
var motionX = Double(0) /// aka Roll
var motionY = Double(0) /// aka Pitch

override func viewDidLayoutSubviews() {
    super.viewDidLayoutSubviews()

    /// viewDidLoad() is often too early to get the first initial attitude, so we use viewDidLayoutSubviews() instead
    if let currentAttitude = motionManager.deviceMotion?.attitude {
        /// we populate initialAttitude with the current attitude
        initialAttitude = currentAttitude
    }
}

override func viewDidLoad() {
    super.viewDidLoad()

    /// This is how often we will get device motion updates
    /// 0.03 is more than often enough and is about the rate that the video frame changes
    motionManager.deviceMotionUpdateInterval = 0.03
    motionManager.startDeviceMotionUpdates(to: .main) { [weak self] (data, error) in
        guard let data = data, error == nil else {
            return
        }
        /// This function will be called every 0.03 seconds
        self?.updateTrackingFrames(attitude: data.attitude)
    }
    ...
}
Every 0.03 seconds, updateTrackingFrames is called, which reads the new physical movement of the device. This is meant to reduce jitter, like when your user's hands are shaking.
func updateTrackingFrames(attitude: CMAttitude) {
    /// initialAttitude is an optional that points to the reference frame that the device started at
    /// we set this when the device lays out its subviews on the first launch
    if let initAttitude = initialAttitude {

        /// We can now translate the current attitude to the reference frame
        attitude.multiply(byInverseOf: initAttitude)

        /// Roll is the movement of the phone left and right; Pitch is forwards and backwards
        let rollValue = attitude.roll.radiansToDegrees
        let pitchValue = attitude.pitch.radiansToDegrees

        /// This is a magic number, but for simplicity, we won't do any advanced trigonometry -- also, 3 works pretty well
        let conversion = Double(3)

        /// Here, we figure out how much the values changed by comparing against the previous values (motionX and motionY)
        let differenceInX = (rollValue - motionX) * conversion
        let differenceInY = (pitchValue - motionY) * conversion

        /// Now we adjust the tracking view's position
        if let previousTrackingView = previousTrackingView {
            previousTrackingView.frame.origin.x += CGFloat(differenceInX)
            previousTrackingView.frame.origin.y += CGFloat(differenceInY)
        }

        /// finally, we put the new attitude values into motionX and motionY so we can compare against them in 0.03 seconds (the next time this function is called)
        motionX = rollValue
        motionY = pitchValue
    }
}
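Note that radiansToDegrees is not a built-in member of Double; the code above assumes a small convenience extension along these lines:

extension Double {
    /// assumed helper: converts radians to degrees
    var radiansToDegrees: Double { self * 180 / .pi }
}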
This Core Motion implementation isn't very accurate - I hardcode the multiplier constant (Double(3)) that adjusts the frame of the tracking indicator. But it's enough to cancel out small jitter.
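If you'd rather wire up the VNTrackObjectRequest from the question instead of re-detecting every frame, the usual pattern is to feed each new camera frame to a VNSequenceRequestHandler and update the tracker's inputObservation from every result. A minimal, untested sketch that wires the question's declarations together (not code from my project):

let sequenceHandler = VNSequenceRequestHandler()

/// At // 1: seed the tracker with the detected barcode's bounding box
func seedTracker(with barcode: VNBarcodeObservation) {
    inputObservation = VNDetectedObjectObservation(boundingBox: barcode.boundingBox)
}

/// Call this for every new camera frame once the tracker is seeded
func trackBarcode(in pixelBuffer: CVPixelBuffer) {
    guard inputObservation != nil else { return }
    do {
        try sequenceHandler.perform([barcodeTrackingRequest], on: pixelBuffer)
    } catch {
        print("Tracking failed: \(error)")
    }
}

func resultClassificationTracker(_ request: VNRequest) {
    guard let newObservation = request.results?.first as? VNDetectedObjectObservation else { return }
    /// feed the result back in so the next frame continues the track
    barcodeTrackingRequest.inputObservation = newObservation
    inputObservation = newObservation
    /// newObservation.boundingBox is the normalized rect you wanted to store
}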
Here is the final repo: https://github.com/aheze/BarcodeScanner
Source: https://stackoverflow.com/questions/66030924/how-to-track-the-barcode-with-highest-confidence