Algorithm to mix sound

囚心锁ツ 2020-11-29 16:55

I have two raw sound streams that I need to add together. For the purposes of this question, we can assume they are the same bitrate and bit depth (say 16 bit sample, 44.1k

20 answers
  • 2020-11-29 17:06

    I can't believe nobody has given the correct answer. Everyone comes close, but it all stays pure philosophy. The nearest, i.e. the best, was (s1 + s2) - (s1 * s2). It's an excellent approach, especially for MCUs.

    So, the algorithm goes:

    1. Decide on the volume you want the output sound to have. It can be the average or the maximum of one of the signals.
      factor = average(s1)
      This assumes both signals are already OK, i.e. neither overflows 32767.0.
    2. Normalize both signals with this factor:
      s1 = (s1/max(s1))*factor
      s2 = (s2/max(s2))*factor
    3. Add them together and normalize the result with the same factor:
      output = ((s1+s2)/max(s1+s2))*factor

    Note that after step 1 you don't really need to go back to integers; you can work with floats in the -1.0 to 1.0 interval and convert back to integers at the end, using the previously chosen factor. I hope I haven't made a mistake, because I'm in a hurry.
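
    The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the whole signal is available up front (so it is not a streaming mixer), samples are floats in -1.0 to 1.0, the target level is passed in as a peak value rather than computed as average(s1), and the code is built as C++17 for std::clamp. The names peak, normalize, mix_normalized and to_int16 are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest absolute sample value in a buffer (0 for an empty buffer).
double peak(const std::vector<double>& s) {
    double p = 0.0;
    for (double v : s) p = std::max(p, std::abs(v));
    return p;
}

// Scale a buffer so that its peak equals `target`; pure silence is left as is.
std::vector<double> normalize(std::vector<double> s, double target) {
    const double p = peak(s);
    if (p > 0.0) {
        for (double& v : s) v *= target / p;
    }
    return s;
}

// Hypothetical helper: normalize both inputs, add them, normalize the sum.
std::vector<double> mix_normalized(const std::vector<double>& s1,
                                   const std::vector<double>& s2,
                                   double target) {
    assert(s1.size() == s2.size());
    const std::vector<double> a = normalize(s1, target);
    const std::vector<double> b = normalize(s2, target);
    std::vector<double> sum(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) sum[i] = a[i] + b[i];
    return normalize(sum, target);
}

// Convert floats in [-1.0, 1.0] back to 16-bit samples at the very end.
std::vector<std::int16_t> to_int16(const std::vector<double>& s) {
    std::vector<std::int16_t> out(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        const double v = std::clamp(s[i], -1.0, 1.0) * 32767.0;
        out[i] = static_cast<std::int16_t>(std::lround(v));
    }
    return out;
}
```

    Because the final normalization looks at the peak of the whole sum, the output never exceeds the chosen target, at the cost of needing two passes over the data.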

  • 2020-11-29 17:10

    You can also buy yourself some headroom with a curve like y = 1.1x - 0.2x^3, with a cap on the top and bottom. I used this in Hexaphone when the player is playing multiple notes together (up to 6).

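    // Cubic waveshaper: y = 1.1x - 0.2x^3, capped outside [-1.25, 1.25].
    float waveshape_distort( float in ) {
      if(in <= -1.25f) {
        return -0.984375f;  // value of the curve at in = -1.25
      } else if(in >= 1.25f) {
        return 0.984375f;   // value of the curve at in = 1.25
      } else {
        return 1.1f * in - 0.2f * in * in * in;
      }
    }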
    

    It's not bulletproof, but it lets you get up to a level of 1.25 and smooths the clip into a nice curve. It produces harmonic distortion, which sounds better than clipping and may be desirable in some circumstances.

  • 2020-11-29 17:11

    I think that, so long as the streams are uncorrelated, you shouldn't have too much to worry about; you should be able to get by with plain clipping. If you're really concerned about distortion at the clip points, a soft limiter would probably work OK.
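
    For what it's worth, one possible shape for such a soft limiter is sketched below. It is an assumption on my part, not a standard: samples are floats where full scale is 1.0, everything below a threshold (0.8 here, picked arbitrarily) passes through unchanged, and anything above it is squeezed with tanh so the output stays within full scale. The name soft_limit is made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Soft limiter sketch: linear up to `threshold`, then a tanh knee that
// approaches full scale (1.0) instead of clipping abruptly.
double soft_limit(double x, double threshold = 0.8) {
    assert(threshold > 0.0 && threshold < 1.0);
    const double mag = std::abs(x);
    if (mag <= threshold) {
        return x;  // quiet samples are passed through untouched
    }
    const double room = 1.0 - threshold;  // space left above the threshold
    const double limited =
        threshold + room * std::tanh((mag - threshold) / room);
    return std::copysign(limited, x);  // restore the original sign
}
```

    You would apply it per sample to the float sum, e.g. soft_limit(a + b), before converting back to 16 bit. The curve has slope 1 at the threshold, so there is no kink where the limiting starts.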

  • 2020-11-29 17:11
    // #include <algorithm>
    // short ileft, nleft; ...
    // short iright, nright; ...
    
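    // Mix: the shorts are promoted to int before adding, so the sum cannot overflow
    float hiL = ileft + nleft;
    float hiR = iright + nright;
    
    // Clipping: clamp to the 16-bit range, then convert back explicitly
    short left = static_cast<short>(std::max(-32768.0f, std::min(hiL, 32767.0f)));
    short right = static_cast<short>(std::max(-32768.0f, std::min(hiR, 32767.0f)));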
    
  • 2020-11-29 17:12

    If you need to do this right, I would suggest looking at open source software mixer implementations, at least for the theory.

    Some links:

    Audacity

    GStreamer

    Actually you should probably be using a library.

  • 2020-11-29 17:15

    You're right about adding them together. You could always scan the sum of the two files for peak points, and scale the entire file down if they hit some kind of threshold (or if the average of a point and its surrounding samples hits a threshold).
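
    A rough sketch of the simple variant (peak only, not the averaged one), assuming both files are already loaded as equal-length 16-bit buffers. The name mix_and_scale and the default threshold of 32767 are just choices for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: add two 16-bit streams in 32-bit precision, find the
// peak of the sum, and scale everything down only if it exceeds `threshold`.
std::vector<std::int16_t> mix_and_scale(const std::vector<std::int16_t>& a,
                                        const std::vector<std::int16_t>& b,
                                        std::int32_t threshold = 32767) {
    assert(a.size() == b.size());
    assert(threshold > 0 && threshold <= 32767);

    // First pass: sum into a wider type and track the largest magnitude.
    std::vector<std::int32_t> sum(a.size());
    std::int32_t peak = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum[i] = static_cast<std::int32_t>(a[i]) + b[i];
        peak = std::max(peak, std::abs(sum[i]));
    }

    // One gain for the whole file, so the relative dynamics are preserved.
    const double gain =
        peak > threshold ? static_cast<double>(threshold) / peak : 1.0;

    // Second pass: apply the gain and go back to 16 bit.
    std::vector<std::int16_t> out(sum.size());
    for (std::size_t i = 0; i < sum.size(); ++i) {
        out[i] = static_cast<std::int16_t>(std::lround(sum[i] * gain));
    }
    return out;
}
```

    If the sum never reaches the threshold the gain stays at 1.0 and the result is just the plain sum.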
