Question
I am using the Vision framework to detect barcodes. I want to show a rect around the barcode with the highest confidence on live video - that is, I want to track that rect to the barcode seen in the live preview.
So I have this code to detect barcodes within a region of interest (ROI):
lazy var barcodeRequest: VNDetectBarcodesRequest = {
    let barcodeRequest = VNDetectBarcodesRequest { [weak self] request, error in
        guard error == nil else {
            print("ERROR: \(error?.localizedDescription ?? "error")")
            return
        }
        self?.resultClassification(request)
    }
    /// regionOfInterest is in normalized coordinates (0...1, lower-left origin)
    barcodeRequest.regionOfInterest = CGRect(x: 0,
                                             y: 0.3,
                                             width: 1,
                                             height: 0.4)
    return barcodeRequest
}()
This method fires when barcodes are detected:
func resultClassification(_ request: VNRequest) {
    guard let barcodes = request.results,
          let potentialCodes = barcodes as? [VNBarcodeObservation]
    else { return }

    // choose the barcode with the highest confidence
    let highestConfidenceBarcodeDetected = potentialCodes.max(by: { $0.confidence < $1.confidence })

    // do something with highestConfidenceBarcodeDetected
    // 1
}
This is my problem: now that I have the highest-confidence barcode, I want to track it around the screen. So I think I will have to add code at // 1.
But before that I have to define this for the tracker:
var inputObservation: VNDetectedObjectObservation!

lazy var barcodeTrackingRequest: VNTrackObjectRequest = {
    let barcodeTrackingRequest = VNTrackObjectRequest(detectedObjectObservation: inputObservation) { [weak self] request, error in
        guard error == nil else {
            print("Detection error: \(String(describing: error)).")
            return
        }
        self?.resultClassificationTracker(request)
    }
    return barcodeTrackingRequest
}()
func resultClassificationTracker(_ request: VNRequest) {
    // all I want from this is to store the bounding box in a var
}
Now, how do I connect these two pieces of code, so that resultClassificationTracker fires every time I get a bounding box value for the tracker?
Answer 1:
I did something similar a while ago and wrote an article on it. It's for VNRecognizeTextRequest, not VNDetectBarcodesRequest, but it's similar. This is what I did:
- Perform VNImageRequestHandler continuously (once it finishes, start another)
- Store the detection indicator view in a property: var previousTrackingView: UIView?
- Animate the detection indicator to the new rectangle whenever the request handler finishes
- Use Core Motion to detect device movement, and adjust the frame of the detection indicator
Here is the result:
As you can see, the height/y coordinate is not very accurate. My guess is that Vision only needs a horizontal line to scan barcodes - like the laser scanners in grocery stores - so it doesn't return the full height. But that is a different problem.
Perform VNImageRequestHandler continuously (once it finishes, start another)
For this, I'm making a property busyPerformingVisionRequest, and whenever this is false, I call the Vision request. This is inside the didOutput function, which gets called whenever the camera frame changes.
class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {

    var busyPerformingVisionRequest = false

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        if busyPerformingVisionRequest == false {
            lookForBarcodes(in: pixelBuffer) /// start the Vision request as often as possible
        }
    }
}
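The answer doesn't show lookForBarcodes(in:), but it would be the method that flips busyPerformingVisionRequest and performs the request. A minimal sketch, assuming the barcodeRequest from the question and a portrait back camera (the .right orientation is an assumption):

func lookForBarcodes(in pixelBuffer: CVPixelBuffer) {
    busyPerformingVisionRequest = true
    /// run off the main thread so the camera callback isn't blocked
    DispatchQueue.global(qos: .userInitiated).async { [weak self] in
        guard let self = self else { return }
        /// .right maps a portrait device to the sensor's landscape buffer (assumption)
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right)
        do {
            try handler.perform([self.barcodeRequest])
            /// on success, the request's completion handler is expected to reset busyPerformingVisionRequest
        } catch {
            print("Vision request failed: \(error)")
            self.busyPerformingVisionRequest = false
        }
    }
}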
Store the detection indicator view in a property: var previousTrackingView: UIView?
Below is my Vision handler that gets called when the Vision request completes. I first set busyPerformingVisionRequest to false, so another Vision request can be made. Then I convert the bounding box to screen coordinates and call self.drawTrackingView(at: convertedRect).
func resultClassificationTracker(request: VNRequest?, error: Error?) {
    busyPerformingVisionRequest = false

    if let results = request?.results {
        if let observation = results.first as? VNBarcodeObservation {

            /// Vision's boundingBox is normalized (0...1) with a lower-left origin
            var x = observation.boundingBox.origin.x
            var y = 1 - observation.boundingBox.origin.y
            var height = CGFloat(0) /// ignore the bounding height
            var width = observation.boundingBox.width

            /// we're going to do some converting
            /// (aspectRatioWidthOverHeight and deviceSize are properties defined elsewhere)
            let convertedOriginalWidthOfBigImage = aspectRatioWidthOverHeight * deviceSize.height
            let offsetWidth = convertedOriginalWidthOfBigImage - deviceSize.width

            /// The pixel buffer that we got Vision to process is bigger than the device's screen, so we need to adjust it
            let offHalf = offsetWidth / 2

            width *= convertedOriginalWidthOfBigImage
            height = width * (CGFloat(9) / CGFloat(16))
            x *= convertedOriginalWidthOfBigImage
            x -= offHalf
            y *= deviceSize.height
            y -= height

            let convertedRect = CGRect(x: x, y: y, width: width, height: height)
            DispatchQueue.main.async {
                self.drawTrackingView(at: convertedRect)
            }
        }
    }
}
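As an aside, Vision also ships a helper, VNImageRectForNormalizedRect, that maps a normalized bounding box into the pixel buffer's coordinates; you'd still need to map from buffer pixels to view points yourself. A minimal sketch, where bufferWidth and bufferHeight are assumed to be the pixel buffer's dimensions:

/// bufferWidth / bufferHeight: assumed pixel dimensions of the CVPixelBuffer
/// that was handed to VNImageRequestHandler (e.g. 1920 x 1080)
let pixelRect = VNImageRectForNormalizedRect(observation.boundingBox,
                                             bufferWidth,
                                             bufferHeight)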
Animate the detection indicator to the new rectangle whenever the request handler finishes
This is my function drawTrackingView. If there is a tracking rectangle view drawn already, it animates it to the new frame. If not, it just adds it as a subview.
func drawTrackingView(at rect: CGRect) {
    if let previousTrackingView = previousTrackingView { /// already drew one previously, just change the frame now
        UIView.animate(withDuration: 0.8) {
            previousTrackingView.frame = rect
        }
    } else { /// add it as a subview
        let trackingView = UIView(frame: rect)
        drawingView.addSubview(trackingView)
        trackingView.backgroundColor = UIColor.blue.withAlphaComponent(0.2)
        trackingView.layer.borderWidth = 3
        trackingView.layer.borderColor = UIColor.blue.cgColor
        previousTrackingView = trackingView
    }
}
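(drawingView is presumably an overlay view that sits above the camera preview layer, so the tracking rectangle isn't covered by the video feed.)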
Use Core Motion to detect device movement, and adjust the frame of the detection indicator
I first store a couple of motion-related properties. Then, in viewDidLoad, I start the motion updates.
-----ViewController.swift-----

/// motionManager will be what we'll use to get device motion
var motionManager = CMMotionManager()

/// this will be the "device's true orientation in space" (Source: https://nshipster.com/cmdevicemotion/)
var initialAttitude: CMAttitude?

/// we'll later read these values to update the highlight's position
var motionX = Double(0) /// aka Roll
var motionY = Double(0) /// aka Pitch

override func viewDidLayoutSubviews() {
    super.viewDidLayoutSubviews()

    /// viewDidLoad() is often too early to get the first initial attitude, so we use viewDidLayoutSubviews() instead
    if let currentAttitude = motionManager.deviceMotion?.attitude {
        /// we populate initialAttitude with the current attitude
        initialAttitude = currentAttitude
    }
}

override func viewDidLoad() {
    super.viewDidLoad()

    /// This is how often we will get device motion updates
    /// 0.03 is more than often enough and is about the rate that the video frame changes
    motionManager.deviceMotionUpdateInterval = 0.03
    motionManager.startDeviceMotionUpdates(to: .main) { [weak self] (data, error) in
        guard let data = data, error == nil else {
            return
        }
        /// This function will be called every 0.03 seconds
        self?.updateTrackingFrames(attitude: data.attitude)
    }
    ...
}
Every 0.03 seconds, updateTrackingFrames is called, which reads the new physical movement of the device. This is meant to reduce jitter, like when your user's hands are shaking.
func updateTrackingFrames(attitude: CMAttitude) {
    /// initialAttitude is an optional that points to the reference frame that the device started at
    /// we set this when the device lays out its subviews on the first launch
    if let initAttitude = initialAttitude {

        /// We can now translate the current attitude to the reference frame
        attitude.multiply(byInverseOf: initAttitude)

        /// Roll is the movement of the phone left and right; Pitch is forwards and backwards
        let rollValue = attitude.roll.radiansToDegrees
        let pitchValue = attitude.pitch.radiansToDegrees

        /// This is a magic number, but for simplicity, we won't do any advanced trigonometry -- also, 3 works pretty well
        let conversion = Double(3)

        /// Here, we figure out how much the values changed by comparing against the previous values (motionX and motionY)
        let differenceInX = (rollValue - motionX) * conversion
        let differenceInY = (pitchValue - motionY) * conversion

        /// Now we adjust the tracking view's position
        if let previousTrackingView = previousTrackingView {
            previousTrackingView.frame.origin.x += CGFloat(differenceInX)
            previousTrackingView.frame.origin.y += CGFloat(differenceInY)
        }

        /// finally, we put the new attitude values into motionX and motionY so we can compare against them in 0.03 seconds (the next time this function is called)
        motionX = rollValue
        motionY = pitchValue
    }
}
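Note that radiansToDegrees is not a built-in member of Double; the code above assumes a small convenience extension along these lines:

extension Double {
    /// assumed helper: converts radians to degrees
    var radiansToDegrees: Double { self * 180 / .pi }
}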
This Core Motion implementation isn't very accurate - I hardcode the multiplier constant (Double(3)) that adjusts the frame of the tracking indicator. But it's enough to cancel out small jitter.
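If you'd rather wire up the VNTrackObjectRequest from the question instead of re-detecting every frame, the usual pattern is to feed each new camera frame to a VNSequenceRequestHandler and update the tracker's inputObservation from every result. A minimal, untested sketch that wires the question's declarations together (not code from my project):

let sequenceHandler = VNSequenceRequestHandler()

/// At // 1: seed the tracker with the detected barcode's bounding box
func seedTracker(with barcode: VNBarcodeObservation) {
    inputObservation = VNDetectedObjectObservation(boundingBox: barcode.boundingBox)
}

/// Call this for every new camera frame once the tracker is seeded
func trackBarcode(in pixelBuffer: CVPixelBuffer) {
    guard inputObservation != nil else { return }
    do {
        try sequenceHandler.perform([barcodeTrackingRequest], on: pixelBuffer)
    } catch {
        print("Tracking failed: \(error)")
    }
}

func resultClassificationTracker(_ request: VNRequest) {
    guard let newObservation = request.results?.first as? VNDetectedObjectObservation else { return }
    /// feed the result back in so the next frame continues the track
    barcodeTrackingRequest.inputObservation = newObservation
    inputObservation = newObservation
    /// newObservation.boundingBox is the normalized rect you wanted to store
}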
Here is the final repo: https://github.com/aheze/BarcodeScanner
Source: https://stackoverflow.com/questions/66030924/how-to-track-the-barcode-with-highest-confidence