How to convert wav file to spectrogram for tensorflowjs with columnTruncateLength: 232 and numFramesPerSpectrogram: 43?

时光说笑 2021-01-21 08:17

I'm trying to use TensorFlow.js speech recognition in offline mode. Online mode using the microphone works fine, but for offline mode I'm not able to find any reliable library.

1 answer
  •  慢半拍i (original poster)
    2021-01-21 08:39

    The only requirement when working with offline recognition is to have an input tensor of shape [null, 43, 232, 1].
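
To make the shape concrete: the leading null is the batch dimension, so a flat array is usable only if its length is a whole multiple of 43 × 232 × 1 = 9976 values. A minimal sketch of that check follows; countExamples is a hypothetical helper name, not part of TensorFlow.js or the speech-commands package.

```javascript
// Hypothetical helper: how many examples a flat array holds for a given
// model input shape such as [null, 43, 232, 1] (null = batch dimension).
function countExamples(numValues, inputShape) {
  // Multiply every dimension except the leading batch dimension.
  const valuesPerExample = inputShape.slice(1).reduce((a, b) => a * b, 1);
  if (numValues === 0 || numValues % valuesPerExample !== 0) {
    throw new Error(
      `Expected a non-zero multiple of ${valuesPerExample} values, got ${numValues}`);
  }
  return numValues / valuesPerExample;
}

console.log(countExamples(43 * 232, [null, 43, 232, 1])); // 1
```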

    1 - Read the wav file and get the array of data

    var Spectrogram = require('spectrogram');

    var spectro = Spectrogram(document.getElementById('canvas'), {
      audio: {
        enable: false
      }
    });

    var audioContext = new AudioContext();

    // Fetch the audio file and decode it into an AudioBuffer.
    function readWavFile() {
      return new Promise((resolve, reject) => {
        var request = new XMLHttpRequest();
        request.open('GET', 'audio.wav', true);
        request.responseType = 'arraybuffer';

        request.onload = function() {
          audioContext.decodeAudioData(request.response, resolve, reject);
        };
        request.onerror = reject;
        request.send();
      });
    }

    // await must run inside an async function or an ES module
    const buffer = await readWavFile();
    

    The same thing can be done without using a third-party library. Two options are possible.

    • Read the file using <input type="file">. In that case, FileReader.readAsArrayBuffer gives the ArrayBuffer from which the typed array is obtained.

    • Serve and read the wav file using a http request

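    var req = new XMLHttpRequest();
    req.open("GET", "file.wav", true);
    req.responseType = "arraybuffer";

    req.onload = function () {
      var arrayBuffer = req.response;
      if (arrayBuffer) {
        // Decode the WAV container first: wrapping the raw bytes in a
        // Float32Array would misread the header and the integer PCM samples.
        new AudioContext().decodeAudioData(arrayBuffer, function (audioBuffer) {
          var samples = audioBuffer.getChannelData(0); // Float32Array
        });
      }
    };

    req.send(null);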
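
Whichever option supplies the ArrayBuffer, the bytes still contain the WAV header and, usually, 16-bit integer samples. Where no AudioContext is available, the samples can be extracted by hand. The following is only a sketch under stated assumptions: decodePcm16Wav is a hypothetical helper, it handles uncompressed 16-bit PCM only, and it keeps just the first channel.

```javascript
// Hypothetical helper: decode a 16-bit PCM WAV ArrayBuffer into Float32Array
// samples (first channel only). Assumes an uncompressed RIFF/WAVE file.
function decodePcm16Wav(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const tag = (offset) =>
    String.fromCharCode(
      view.getUint8(offset), view.getUint8(offset + 1),
      view.getUint8(offset + 2), view.getUint8(offset + 3));

  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }

  let numChannels = 1;
  let sampleRate = 0;
  let offset = 12; // the first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= view.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true); // little-endian
    const body = offset + 8;
    if (id === 'fmt ') {
      const format = view.getUint16(body, true);
      numChannels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      const bitsPerSample = view.getUint16(body + 14, true);
      if (format !== 1 || bitsPerSample !== 16) {
        throw new Error('Only 16-bit PCM is handled in this sketch');
      }
    } else if (id === 'data') {
      const bytes = Math.min(size, view.byteLength - body);
      const frameCount = Math.floor(bytes / (2 * numChannels));
      const samples = new Float32Array(frameCount);
      for (let i = 0; i < frameCount; i++) {
        // Take channel 0 of each frame and scale int16 to [-1, 1).
        samples[i] = view.getInt16(body + i * 2 * numChannels, true) / 32768;
      }
      return { sampleRate, samples };
    }
    offset = body + size + (size % 2); // chunks are word-aligned
  }
  throw new Error('No data chunk found');
}
```

The returned samples array plays the same role as the Float32Array obtained from getChannelData on a decoded AudioBuffer.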
    

    2 - Convert the buffer to a typed array

    // buffer is an AudioBuffer; getChannelData returns one channel as a Float32Array
    const data = buffer.getChannelData(0);
    

    3 - Convert the array to a tensor using the input shape of the speech recognition model

    const x = tf.tensor(data).reshape(
      [-1, ...recognizer.modelInputShape().slice(1)]);
    

    If the last command fails, the data does not have the number of values the model's input shape requires. Either slice the tensor to the appropriate shape, or make the recording match the FFT and other parameters the model expects.
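
One way to always end up with the right number of values is to compute the spectrogram directly from the samples, producing exactly 43 frames of 232 frequency bins. The sketch below is an illustration only: computeSpectrogram is a hypothetical helper, and the frame size of 1024, the Blackman window and the dB scaling are assumptions loosely modelled on what a Web Audio AnalyserNode reports, not parameters stated in this answer. The values it produces may not match what the model was trained on.

```javascript
// Sketch: turn raw samples into a flat [numFrames x numBins] dB spectrogram.
// frameSize, the window and the dB scaling are assumptions (see lead-in).
function computeSpectrogram(samples, options = {}) {
  const { frameSize = 1024, numFrames = 43, numBins = 232 } = options;
  const out = new Float32Array(numFrames * numBins);

  // Blackman window, applied to every frame before the transform.
  const win = new Float32Array(frameSize);
  for (let n = 0; n < frameSize; n++) {
    win[n] = 0.42
      - 0.5 * Math.cos((2 * Math.PI * n) / frameSize)
      + 0.08 * Math.cos((4 * Math.PI * n) / frameSize);
  }

  for (let f = 0; f < numFrames; f++) {
    const start = f * frameSize; // non-overlapping frames
    // Naive DFT, evaluated only for the first numBins frequency bins.
    for (let k = 0; k < numBins; k++) {
      let re = 0;
      let im = 0;
      for (let n = 0; n < frameSize; n++) {
        const s = (samples[start + n] || 0) * win[n]; // zero-pad past the end
        const angle = (2 * Math.PI * k * n) / frameSize;
        re += s * Math.cos(angle);
        im -= s * Math.sin(angle);
      }
      const magnitude = Math.sqrt(re * re + im * im) / frameSize;
      out[f * numBins + k] = 20 * Math.log10(magnitude + 1e-12); // dB
    }
  }
  return out;
}
```

With the defaults the result holds 43 × 232 values, so it can be passed to tf.tensor(spectrogram, [1, 43, 232, 1]) without a reshape error. The naive DFT keeps the sketch short; an FFT would be the usual choice for longer recordings.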
