Continuous speech recognition with SFSpeechRecognizer (iOS 10 beta)

情歌与酒 · 2020-12-08 01:28

I am trying to perform continuous speech recognition using AVCapture on iOS 10 beta. I have set up captureOutput(...) to continuously get CMSampleBuffers, but I am not sure how to feed them to SFSpeechRecognizer for continuous recognition.

5 Answers
  • 2020-12-08 01:53

    It turns out that Apple's new native speech recognition does not automatically detect end-of-speech silences (a bug?), which in your case is actually useful, because a recognition task stays active for nearly one minute (the maximum period permitted by Apple's service). So basically, if you need continuous ASR, you have to re-launch speech recognition whenever your delegate fires:

    func speechRecognitionTask(task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) // whether successfully == true or not
    
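    For example, the restart can live directly in that delegate callback. This is only a minimal sketch, assuming startNativeRecording() and stopNativeRecording() are this answer's own methods and that startNativeRecording() creates a fresh request/task each time:

    func speechRecognitionTask(task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) {
        // The ~1-minute task has ended (successfully or not): tear everything down and relaunch.
        stopNativeRecording()
        do {
            try startNativeRecording() // start a fresh request/task to keep ASR continuous
        } catch {
            print("Could not restart native recording: \(error)")
        }
    }
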

    Here is the recording/speech-recognition Swift code I use; it works perfectly. Ignore the part where I calculate the mean power of the microphone volume if you don't need it; I use it to animate a waveform. Don't forget to set the SFSpeechRecognitionTaskDelegate and its delegate methods. If you need extra code, let me know.

    func startNativeRecording() throws {
            LEVEL_LOWPASS_TRIG=0.01
            //Setup Audio Session
            node = audioEngine.inputNode!
            let recordingFormat = node!.outputFormatForBus(0)
            node!.installTapOnBus(0, bufferSize: 1024, format: recordingFormat){(buffer, _) in
                self.nativeASRRequest.appendAudioPCMBuffer(buffer)
    
     //Code to animate a waveform with the microphone volume, ignore if you don't need it:
                var inNumberFrames:UInt32 = buffer.frameLength;
                var samples:Float32 = buffer.floatChannelData[0][0]; //https://github.com/apple/swift-evolution/blob/master/proposals/0107-unsaferawpointer.md
                var avgValue:Float32 = 0;
                vDSP_maxmgv(buffer.floatChannelData[0], 1, &avgValue, vDSP_Length(inNumberFrames)); //Accelerate Framework
                //vDSP_maxmgv returns peak values
                //vDSP_meamgv returns mean magnitude of a vector
    
                let avg3:Float32=((avgValue == 0) ? (0-100) : 20.0)
                var averagePower=(self.LEVEL_LOWPASS_TRIG*avg3*log10f(avgValue)) + ((1-self.LEVEL_LOWPASS_TRIG)*self.averagePowerForChannel0) ;
                print("AVG. POWER: "+averagePower.description)
                dispatch_async(dispatch_get_main_queue(), { () -> Void in
                    //print("VU: "+vu.description)
                    var fAvgPwr=CGFloat(averagePower)
                    print("AvgPwr: "+fAvgPwr.description)
    
                    var waveformFriendlyValue=0.5+fAvgPwr //-0.5 is AvgPwrValue when user is silent
                    if(waveformFriendlyValue<0){waveformFriendlyValue=0} //round values <0 to 0
                    self.waveview.hidden=false
                    self.waveview.updateWithLevel(waveformFriendlyValue)
                })
            }
            audioEngine.prepare()
            try audioEngine.start()
            isNativeASRBusy=true
            nativeASRTask = nativeSpeechRecognizer?.recognitionTaskWithRequest(nativeASRRequest, delegate: self)
            nativeSpeechRecognizer?.delegate=self
      //I use this timer to track no-speech timeouts, ignore if not needed:
            self.endOfSpeechTimeoutTimer = NSTimer.scheduledTimerWithTimeInterval(utteranceTimeoutSeconds, target: self, selector:  #selector(ViewController.stopNativeRecording), userInfo: nil, repeats: false)
        }
    
  • 2020-12-08 01:54

    This works perfectly in my app. You can send queries to saifurrahman3126@gmail.com. Apple does not allow speech to be recognized continuously for more than one minute; see https://developer.apple.com/documentation/speech/sfspeechrecognizer.

    "Plan for a one-minute limit on audio duration. Speech recognition places a relatively high burden on battery life and network usage. To minimize this burden, the framework stops speech recognition tasks that last longer than one minute. This limit is similar to the one for keyboard-related dictation." This is what Apple says in its documentation.

    For now, I make requests of 40 seconds and then reconnect. If you speak before the 40 seconds are up and then pause, the recording starts again.

    @objc  func startRecording() {
        
        self.fullsTring = ""
        audioEngine.reset()
        
        if recognitionTask != nil {
            recognitionTask?.cancel()
            recognitionTask = nil
        }
        
        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(.record)
            try audioSession.setMode(.measurement)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
            try audioSession.setPreferredSampleRate(44100.0)
            
        if audioSession.isInputGainSettable {
            do {
                // Boost the input gain; setInputGain(_:) throws if the gain cannot be set.
                try audioSession.setInputGain(1.0)
            } catch {
                print("audio error: \(error)")
                return
            }
        }
        else {
            print("Cannot set input gain")
        }
        } catch {
            print("audioSession properties weren't set because of an error.")
        }
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        
        let inputNode = audioEngine.inputNode
        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }
        
        recognitionRequest.shouldReportPartialResults = true
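    // Hard-restart recognition every 40 seconds (see againStartRec below) to stay within Apple's ~1-minute task limit.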
        self.timer4 = Timer.scheduledTimer(timeInterval: TimeInterval(40), target: self, selector: #selector(againStartRec), userInfo: nil, repeats: false)
        
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { (result, error ) in
            
        var isFinal = false
            
            if result != nil {
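            // Every new (partial) result resets a 2-second silence timer; when it fires, didFinishTalk treats the pause as end of speech.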
                self.timer.invalidate()
                self.timer = Timer.scheduledTimer(timeInterval: TimeInterval(2.0), target: self, selector: #selector(self.didFinishTalk), userInfo: nil, repeats: false)
                
                let bestString = result?.bestTranscription.formattedString
                self.fullsTring = bestString!
                
                self.inputContainerView.inputTextField.text = result?.bestTranscription.formattedString
                
                isFinal = result!.isFinal
                
            }
            if  isFinal {
                
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                
                self.recognitionRequest = nil
                self.recognitionTask = nil
                isFinal = false
                
            }
            if error != nil{
                URLCache.shared.removeAllCachedResponses()
                
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                
                guard let task = self.recognitionTask else {
                    return
                }
                task.cancel()
                task.finish()
            }
        })
        audioEngine.reset()
        inputNode.removeTap(onBus: 0)
        
        let recordingFormat = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 1)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }
        
        audioEngine.prepare()
        
        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }
        
        self.hasrecorded = true
    }
    
    @objc func againStartRec(){
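        // Fired by the 40-second timer4: stop the current engine/task cleanly, then restart recording after 2 seconds via timer2.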
        
        self.inputContainerView.uploadImageView.setBackgroundImage( #imageLiteral(resourceName: "microphone") , for: .normal)
        self.inputContainerView.uploadImageView.alpha = 1.0
        self.timer4.invalidate()
        self.timer.invalidate()
        
        if ((self.audioEngine.isRunning)){
            
            self.audioEngine.stop()
            self.recognitionRequest?.endAudio()
            self.recognitionTask?.finish()
        }
        self.timer2 = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(startRecording), userInfo: nil, repeats: false)
    }
    
    @objc func didFinishTalk(){
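        // Fired about 2 seconds after the last partial result: if anything was recognized, treat the silence as end of utterance and stop the current task.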
        
        if self.fullsTring != ""{
            
            self.timer4.invalidate()
            self.timer.invalidate()
            self.timer2.invalidate()
            
            if ((self.audioEngine.isRunning)){
                self.audioEngine.stop()
                guard let task = self.recognitionTask else {
                    return
                }
                task.cancel()
                task.finish()
            }
        }
    }
    
  • 2020-12-08 01:57

    I have had success using SFSpeechRecognizer continuously. The main point is to use AVCaptureSession to capture audio and hand it off to the speech recognizer. Sorry, I am poor in Swift, so here is just the ObjC version.

    Here is my sample code (some UI code is left out; the important parts are marked):

    @interface ViewController ()<AVCaptureAudioDataOutputSampleBufferDelegate,SFSpeechRecognitionTaskDelegate>
    @property (nonatomic, strong) AVCaptureSession *capture;
    @property (nonatomic, strong) SFSpeechAudioBufferRecognitionRequest *speechRequest;
    @end
    
    @implementation ViewController
    - (void)startRecognizer
    {
        [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
            if (status == SFSpeechRecognizerAuthorizationStatusAuthorized){
                NSLocale *local =[[NSLocale alloc] initWithLocaleIdentifier:@"fr_FR"];
                SFSpeechRecognizer *sf =[[SFSpeechRecognizer alloc] initWithLocale:local];
                self.speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
                [sf recognitionTaskWithRequest:self.speechRequest delegate:self];
                // should call startCapture method in main queue or it may crash
                dispatch_async(dispatch_get_main_queue(), ^{
                    [self startCapture];
                });
            }
        }];
    }
    
    - (void)endRecognizer
    {
        // END capture and END voice Reco
        // or Apple will terminate this task after 30000ms.
        [self endCapture];
        [self.speechRequest endAudio];
    }
    
    - (void)startCapture
    {
        NSError *error;
        self.capture = [[AVCaptureSession alloc] init];
        AVCaptureDevice *audioDev = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
        if (audioDev == nil){
            NSLog(@"Couldn't create audio capture device");
            return ;
        }
    
        // create mic device
        AVCaptureDeviceInput *audioIn = [AVCaptureDeviceInput deviceInputWithDevice:audioDev error:&error];
        if (error != nil){
            NSLog(@"Couldn't create audio input");
            return ;
        }
    
        // add mic device in capture object
        if ([self.capture canAddInput:audioIn] == NO){
            NSLog(@"Couldn't add audio input");
            return ;
        }
        [self.capture addInput:audioIn];
        // export audio data
        AVCaptureAudioDataOutput *audioOutput = [[AVCaptureAudioDataOutput alloc] init];
        [audioOutput setSampleBufferDelegate:self queue:dispatch_get_main_queue()];
        if ([self.capture canAddOutput:audioOutput] == NO){
            NSLog(@"Couldn't add audio output");
            return ;
        }
        [self.capture addOutput:audioOutput];
        [audioOutput connectionWithMediaType:AVMediaTypeAudio];
        [self.capture startRunning];
    }
    
    -(void)endCapture
    {
        if (self.capture != nil && [self.capture isRunning]){
            [self.capture stopRunning];
        }
    }
    
    - (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
    {
        [self.speechRequest appendAudioSampleBuffer:sampleBuffer];
    }
    // some Recognition Delegate
    @end
    
  • 2020-12-08 01:59

    I converted the SpeakToMe sample Swift code from the Speech Recognition WWDC developer talk to Objective-C, and it worked for me. For Swift, see https://developer.apple.com/videos/play/wwdc2016/509/, or for Objective-C see below.

    - (void) viewDidAppear:(BOOL)animated {
    
    _recognizer = [[SFSpeechRecognizer alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"en-US"]];
    [_recognizer setDelegate:self];
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus authStatus) {
        switch (authStatus) {
            case SFSpeechRecognizerAuthorizationStatusAuthorized:
                //User gave access to speech recognition
                NSLog(@"Authorized");
                break;
    
            case SFSpeechRecognizerAuthorizationStatusDenied:
                //User denied access to speech recognition
                NSLog(@"SFSpeechRecognizerAuthorizationStatusDenied");
                break;
    
            case SFSpeechRecognizerAuthorizationStatusRestricted:
                //Speech recognition restricted on this device
                NSLog(@"SFSpeechRecognizerAuthorizationStatusRestricted");
                break;
    
            case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                //Speech recognition not yet authorized
    
                break;
    
            default:
                NSLog(@"Default");
                break;
        }
    }];
    
    audioEngine = [[AVAudioEngine alloc] init];
    _speechSynthesizer  = [[AVSpeechSynthesizer alloc] init];         
    [_speechSynthesizer setDelegate:self];
    }
    
    
    -(void)startRecording
    {
    [self clearLogs:nil];
    
    NSError * outError;
    
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryRecord error:&outError];
    [audioSession setMode:AVAudioSessionModeMeasurement error:&outError];
    [audioSession setActive:true withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation  error:&outError];
    
    request2 = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    
    inputNode = [audioEngine inputNode];
    
    if (request2 == nil) {
        NSLog(@"Unable to created a SFSpeechAudioBufferRecognitionRequest object");
    }
    
    if (inputNode == nil) {
    
        NSLog(@"Unable to created a inputNode object");
    }
    
    request2.shouldReportPartialResults = true;
    
    _currentTask = [_recognizer recognitionTaskWithRequest:request2
                    delegate:self];
    
    [inputNode installTapOnBus:0 bufferSize:4096 format:[inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when){
        NSLog(@"Block tap!");
    
        [request2 appendAudioPCMBuffer:buffer];
    
    }];
    
        [audioEngine prepare];
        [audioEngine startAndReturnError:&outError];
        NSLog(@"Error %@", outError);
    }
    
    - (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)result {
    
    NSLog(@"speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition");
    NSString * translatedString = [[[result bestTranscription] formattedString] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    
    [self log:translatedString];
    
    if ([result isFinal]) {
        [audioEngine stop];
        [inputNode removeTapOnBus:0];
        _currentTask = nil;
        request2 = nil;
    }
    }
    
  • 2020-12-08 01:59

    Here is the Swift (3.0) implementation of @cube's answer:

    import UIKit
    import Speech
    import AVFoundation
    
    
    class ViewController: UIViewController  {
      @IBOutlet weak var console: UITextView!
    
      var capture: AVCaptureSession?
      var speechRequest: SFSpeechAudioBufferRecognitionRequest?
      override func viewDidLoad() {
        super.viewDidLoad()
      }
      override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        startRecognizer()
      }
    
      func startRecognizer() {
        SFSpeechRecognizer.requestAuthorization { (status) in
          switch status {
          case .authorized:
            let locale = NSLocale(localeIdentifier: "fr_FR")
            let sf = SFSpeechRecognizer(locale: locale as Locale)
            self.speechRequest = SFSpeechAudioBufferRecognitionRequest()
            sf?.recognitionTask(with: self.speechRequest!, delegate: self)
            DispatchQueue.main.async {
              // Start capture on the main queue, as in @cube's Objective-C answer.
              self.startCapture()
            }
          case .denied:
            fallthrough
          case .notDetermined:
            fallthrough
          case.restricted:
            print("User Autorization Issue.")
          }
        }
    
      }
    
      func endRecognizer() {
        endCapture()
        speechRequest?.endAudio()
      }
    
      func startCapture() {
    
        capture = AVCaptureSession()
    
        guard let audioDev = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeAudio) else {
          print("Could not get capture device.")
          return
        }
    
        guard let audioIn = try? AVCaptureDeviceInput(device: audioDev) else {
          print("Could not create input device.")
          return
        }
    
        guard true == capture?.canAddInput(audioIn) else {
          print("Couls not add input device")
          return
        }
    
        capture?.addInput(audioIn)
    
        let audioOut = AVCaptureAudioDataOutput()
        audioOut.setSampleBufferDelegate(self, queue: DispatchQueue.main)
    
        guard true == capture?.canAddOutput(audioOut) else {
          print("Could not add audio output")
          return
        }
    
        capture?.addOutput(audioOut)
        audioOut.connection(withMediaType: AVMediaTypeAudio)
        capture?.startRunning()
    
    
      }
    
      func endCapture() {
    
        if true == capture?.isRunning {
          capture?.stopRunning()
        }
      }
    }
    
    extension ViewController: AVCaptureAudioDataOutputSampleBufferDelegate {
      func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
        speechRequest?.appendAudioSampleBuffer(sampleBuffer)
      }
    
    }
    
    extension ViewController: SFSpeechRecognitionTaskDelegate {
    
      func speechRecognitionTask(_ task: SFSpeechRecognitionTask, didFinishRecognition recognitionResult: SFSpeechRecognitionResult) {
        console.text = console.text + "\n" + recognitionResult.bestTranscription.formattedString
      }
    }
    

    Don't forget to add a value for NSSpeechRecognitionUsageDescription to the Info.plist file, otherwise the app will crash.
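
    As an optional extra (not part of the original answer; just a sketch using Foundation's Bundle API), you can log a warning at launch if the key is missing, rather than discovering it through the crash:

    // Hypothetical early check, e.g. in viewDidLoad():
    if Bundle.main.object(forInfoDictionaryKey: "NSSpeechRecognitionUsageDescription") == nil {
        print("Warning: NSSpeechRecognitionUsageDescription is missing from Info.plist")
    }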
