Understanding getByteTimeDomainData and getByteFrequencyData in Web Audio

旧时难觅i 2020-12-29 07:29

The documentation for both of these methods is very generic wherever I look. I would like to know exactly what I'm looking at in the returned arrays I'm getting fr…

3 answers
  • 2020-12-29 07:43

    Mozilla's documentation describes the difference between getFloatTimeDomainData and getFloatFrequencyData, which I summarize below (the Byte versions return the same kinds of data, just scaled to unsigned bytes). The Mozilla docs reference a Web Audio experiment, the voice-change-o-matic, which illustrates the conceptual difference for me (it only works in my Firefox; it does not work in my Chrome).

    TimeDomain

    • Time-domain functions cover a span of time: the most recent block of samples.
    • We often visualize time-domain data with an oscilloscope.
    • In other words, we plot time-domain data as a line chart, where the x-axis is time and the y-axis is the signal level.
    • Change the voice-change-o-matic "visualizer setting" to sinewave to see getFloatTimeDomainData(...).
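    To make the oscilloscope picture concrete, here is a minimal TypeScript sketch that pairs each time-domain sample with its timestamp, assuming a block of samples like the one getFloatTimeDomainData fills in. The names toScopePoints and ScopePoint are hypothetical, and the synthetic sine stands in for a live AnalyserNode so the sketch runs outside a browser.

```typescript
// One point on an oscilloscope-style line chart: x is time, y is the signal level.
interface ScopePoint {
  timeSeconds: number;
  value: number;
}

// Turn a block of time-domain samples (for example, the Float32Array that
// AnalyserNode.getFloatTimeDomainData fills in) into (time, value) points.
// Sample i lies i / sampleRate seconds after the start of the block.
function toScopePoints(samples: Float32Array, sampleRate: number): ScopePoint[] {
  const points: ScopePoint[] = [];
  for (let i = 0; i < samples.length; i++) {
    points.push({ timeSeconds: i / sampleRate, value: samples[i] });
  }
  return points;
}

// Usage with synthetic data instead of a live analyser: a 1 kHz sine at 48 kHz.
const demoRate = 48000;
const block = new Float32Array(8);
for (let i = 0; i < block.length; i++) {
  block[i] = Math.sin((2 * Math.PI * 1000 * i) / demoRate);
}
const scopePoints = toScopePoints(block, demoRate);
```

    Plotting timeSeconds against value gives exactly the line chart described above.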

    Frequency

    • Frequency functions (getByteFrequencyData) give a snapshot at a single point in time, computed from the most recent block of samples.
    • We sometimes see these in MP3 players and Winamp-style music players (aka "equalizer" visualizations).
    • In other words, we plot frequency data as a bar graph, where the x-axis shows frequency bands and the y-axis shows the strength of each band.
    • Change the voice-change-o-matic "visualizer setting" to frequency bars to see getFloatFrequencyData(...).
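    Similarly, a minimal sketch of labelling each frequency bar with its frequency, assuming the usual relationship that bin k corresponds to k * sampleRate / fftSize Hz. toBars and Bar are hypothetical names, and the all-zero Uint8Array stands in for what getByteFrequencyData would fill in.

```typescript
// One bar in an equalizer-style chart: x is a frequency band, y is its strength.
interface Bar {
  frequencyHz: number;
  level: number;
}

// Bin k corresponds to k * sampleRate / fftSize Hz, so the bars run from 0 Hz
// (bin 0) up to just below sampleRate / 2, the Nyquist frequency.
function toBars(levels: Uint8Array, sampleRate: number, fftSize: number): Bar[] {
  const binWidthHz = sampleRate / fftSize;
  return Array.from(levels, (level, k) => ({ frequencyHz: k * binWidthHz, level }));
}

// Usage: with fftSize 2048 the byte array has frequencyBinCount = 1024 entries.
// Here it is all zeros; in a page it would be filled by getByteFrequencyData.
const levels = new Uint8Array(1024);
const bars = toBars(levels, 44100, 2048);
console.log(bars[1].frequencyHz.toFixed(2)); // 21.53
```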

  • 2020-12-29 07:46

    cwilso has it backwards.

    The time-domain data array is the longer one (fftSize), and the frequency data array is the shorter one (half that length, frequencyBinCount).

    An fftSize of 2048 at the usual sample rate of 44.1 kHz means each sample lasts 1/44100 s; you have 2048 samples at hand, so they cover 2048/44100 s, which is about 46 milliseconds, not 23 milliseconds. The frequencyBinCount is indeed 1024, but that refers to the frequency domain (as the name suggests), not the time domain, so in this context the computation 1024/44100 is about as meaningful as adding your birth date to the fftSize.
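    The arithmetic above as a small TypeScript sketch (describeWindow is a hypothetical helper, not part of the Web Audio API). It shows that the 2048-sample time-domain buffer spans about 46.4 ms at 44.1 kHz; dividing frequencyBinCount (1024) by the sample rate instead would give the disputed 23.2 ms figure.

```typescript
// Describe the analysis window for a given fftSize and sample rate.
// The time-domain buffer holds fftSize samples; the frequency buffer holds
// frequencyBinCount = fftSize / 2 bins.
function describeWindow(fftSize: number, sampleRate: number) {
  const frequencyBinCount = fftSize / 2;
  // Duration is (number of time-domain samples) / (samples per second).
  const durationMs = (fftSize / sampleRate) * 1000;
  return { timeDomainLength: fftSize, frequencyBinCount, durationMs };
}

const info = describeWindow(2048, 44100);
console.log(info.timeDomainLength, info.frequencyBinCount, info.durationMs.toFixed(1));
// 2048 1024 46.4
```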

    A little math illustrating what's happening: the Fourier transform is a 'vector space isomorphism', that is, a bijective (i.e., reversible) linear mapping between two vector spaces of the same dimension: the 'time domain' and the 'frequency domain'. The dimension we have here (in both cases) is fftSize.
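    In formulas, one common way to write that reversible mapping for N = fftSize samples x[n] is the discrete Fourier transform and its inverse (conventions differ on which side carries the 1/N factor):

```latex
\begin{aligned}
X[k] &= \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i k n / N}, \qquad k = 0, \dots, N-1 \\
x[n] &= \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{2\pi i k n / N}, \qquad n = 0, \dots, N-1
\end{aligned}
```

    Applying the second formula to the output of the first recovers the original samples exactly, which is the "reversible" part of the argument.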

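    So where does the 'half' come from? The frequency-domain coefficients 'count double': either because they 'actually are' complex numbers, or because you have a 'sin' and a 'cos' flavor, or because you have a 'magnitude' and a 'phase', which you'll understand if you know how complex numbers work. (Those are three ways of saying the same thing in different jargon, so to speak.) And because the input samples are real, the spectrum is conjugate-symmetric (bin N-k mirrors bin k), so about fftSize/2 complex bins, each worth two real numbers, already hold all fftSize degrees of freedom.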
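    The 'half' can also be checked numerically. The sketch below uses a naive O(N^2) DFT (a stand-in for a real FFT, chosen for readability) on an arbitrary real test signal, verifies that bin N-k is the complex conjugate of bin k, and writes a bin in magnitude/phase form. dft, isConjugateSymmetric and polar are hypothetical helper names.

```typescript
// Naive O(N^2) discrete Fourier transform of a real signal.
// Returns the real and imaginary parts of all N bins.
function dft(x: number[]): { re: number[]; im: number[] } {
  const n = x.length;
  const re = new Array<number>(n).fill(0);
  const im = new Array<number>(n).fill(0);
  for (let k = 0; k < n; k++) {
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re[k] += x[t] * Math.cos(angle);
      im[k] += x[t] * Math.sin(angle);
    }
  }
  return { re, im };
}

// For real input, bin N-k is the complex conjugate of bin k (same real part,
// negated imaginary part), so the upper half carries no new information.
function isConjugateSymmetric(re: number[], im: number[], tol = 1e-9): boolean {
  const n = re.length;
  for (let k = 1; k < n; k++) {
    if (Math.abs(re[k] - re[n - k]) > tol || Math.abs(im[k] + im[n - k]) > tol) {
      return false;
    }
  }
  return true;
}

// The same complex bin in polar form: magnitude and phase.
function polar(re: number, im: number): { magnitude: number; phase: number } {
  return { magnitude: Math.hypot(re, im), phase: Math.atan2(im, re) };
}

const signal = [0.5, -0.25, 1, 0, -0.75, 0.25, 0.1, -0.3];
const spectrum = dft(signal);
console.log(isConjugateSymmetric(spectrum.re, spectrum.im)); // true
console.log(polar(spectrum.re[1], spectrum.im[1]));
```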

    I don't know why the API gives us only half of the relevant numbers for frequency; I can only guess. My guess is that those are the 'magnitude' numbers and the 'phase' numbers are thrown out, because in applications magnitude is far more important than phase. Still, I'm quite surprised that the API throws out information, and I'd be glad if an expert who actually knows (and isn't guessing) could confirm that it's indeed the magnitude, or, even better (I love to learn), correct me.
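    For what it's worth, my reading of the Web Audio specification's description of the analyser (Blackman window, Fourier transform scaled by 1/N, smoothing over time, conversion to dB) is that only the magnitude of each bin is kept, which matches the guess above. The sketch below follows that pipeline under that assumption; it is not the browser's implementation (which uses an FFT), the helper names are hypothetical, and the 0.8 default mirrors AnalyserNode's default smoothingTimeConstant.

```typescript
// Blackman window with alpha = 0.16, applied sample by sample.
function blackmanWindow(samples: number[]): number[] {
  const n = samples.length;
  const alpha = 0.16;
  const a0 = (1 - alpha) / 2;
  const a1 = 0.5;
  const a2 = alpha / 2;
  return samples.map(
    (x, i) => x * (a0 - a1 * Math.cos((2 * Math.PI * i) / n) + a2 * Math.cos((4 * Math.PI * i) / n)),
  );
}

// Window -> DFT (scaled by 1/N) -> magnitude -> smoothing -> dB, for the first
// N/2 bins. The phase (the angle of each complex bin) is simply discarded.
function frequencyDataDb(
  samples: number[],
  previousMagnitudes: number[] | null,
  smoothingTimeConstant = 0.8,
): { db: number[]; magnitudes: number[] } {
  const n = samples.length;
  const windowed = blackmanWindow(samples);
  const magnitudes: number[] = [];
  for (let k = 0; k < n / 2; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += windowed[t] * Math.cos(angle);
      im += windowed[t] * Math.sin(angle);
    }
    const magnitude = Math.hypot(re, im) / n;
    // Blend with the previous frame's smoothed magnitude (zeros on the first frame).
    const previous = previousMagnitudes ? previousMagnitudes[k] : 0;
    magnitudes.push(smoothingTimeConstant * previous + (1 - smoothingTimeConstant) * magnitude);
  }
  const db = magnitudes.map((m) => 20 * Math.log10(m));
  return { db, magnitudes };
}

// Usage: a sine that completes exactly 8 cycles in a 64-sample frame, no smoothing.
const frameSize = 64;
const tone = Array.from({ length: frameSize }, (_, t) => Math.sin((2 * Math.PI * 8 * t) / frameSize));
const frame = frequencyDataDb(tone, null, 0);
const peakBin = frame.db.indexOf(Math.max(...frame.db));
console.log(peakBin); // 8
```

    Passing the returned magnitudes back in as previousMagnitudes on the next frame reproduces the time smoothing.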

  • 2020-12-29 07:57

    getByteTimeDomainData (and the newer getFloatTimeDomainData) returns an array of the size you requested: its frequencyBinCount, which is calculated as half of the requested fftSize. That array is, of course, at the current sampleRate exposed on the AudioContext, so with the default fftSize of 2048, frequencyBinCount will be 1024, and if your device is running at 44.1 kHz, that equates to around 23 ms of data.

    The byte values do range between 0 and 255, and yes, that maps to -1 to +1, so 128 is zero. (They're not volts, but full-range unitless values.)
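    The byte mapping described above, as a sketch. It assumes the specification's formula for byte time-domain data, floor(128 * (1 + x)) clamped to 0..255, as I understand it; sampleToByte and byteToSample are hypothetical names. Note that +1.0 clips to 255, so the top byte decodes to 127/128 rather than exactly +1.

```typescript
// Float sample in [-1, 1] -> byte, per floor(128 * (1 + x)) clamped to 0..255.
function sampleToByte(x: number): number {
  return Math.min(255, Math.max(0, Math.floor(128 * (1 + x))));
}

// Approximate inverse: recover the signed sample from a byte (128 is silence).
function byteToSample(b: number): number {
  return (b - 128) / 128;
}

const bytes = new Uint8Array([0, 64, 128, 192, 255]);
const floats = Array.from(bytes, byteToSample);
console.log(floats); // [ -1, -0.5, 0, 0.5, 0.9921875 ]
```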

    If you use getFloatFrequencyData, the values returned are in dB; if you use the Byte version, the dB values are mapped linearly onto 0-255 between minDecibels and maxDecibels (see the minDecibels/maxDecibels description).
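    A sketch of that minDecibels/maxDecibels mapping, assuming the specification's formula floor(255 / (maxDecibels - minDecibels) * (dB - minDecibels)) with the result clamped to 0..255; dbToByte and byteToDb are hypothetical helpers. The defaults of -100 and -30 dB match AnalyserNode's default minDecibels and maxDecibels.

```typescript
// dB value -> byte, scaled linearly between minDecibels and maxDecibels and clamped.
function dbToByte(db: number, minDecibels = -100, maxDecibels = -30): number {
  const scaled = Math.floor((255 / (maxDecibels - minDecibels)) * (db - minDecibels));
  return Math.min(255, Math.max(0, scaled));
}

// Byte -> approximate dB level (approximate because floor() drops the fraction).
function byteToDb(b: number, minDecibels = -100, maxDecibels = -30): number {
  return minDecibels + (b / 255) * (maxDecibels - minDecibels);
}

console.log(dbToByte(-120), dbToByte(-65), dbToByte(-10)); // 0 127 255
```

    Anything quieter than minDecibels reads as 0 and anything louder than maxDecibels reads as 255, which is why tuning those two properties changes how "full" a byte-based visualizer looks.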
