I have two raw sound streams that I need to add together. For the purposes of this question, we can assume they have the same bitrate and bit depth (say, 16-bit samples at 44.1 kHz).
I can't believe that nobody has given the correct answer. Everyone is close enough, but still, it's pure philosophy. The nearest, i.e. the best, was: (s1 + s2) - (s1 * s2). It's an excellent approach, especially for MCUs.
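A minimal sketch of that formula, assuming both samples have already been normalized to the [0, 1] range (for signed samples the sign of the product term needs extra handling; `mix_unsigned` is my name, not from the answer):

```cpp
#include <cassert>
#include <cmath>

// Mix two samples normalized to [0, 1] using (s1 + s2) - (s1 * s2).
// The product term is small when either input is quiet, so quiet passages
// add almost linearly, while the result can never exceed 1.0.
float mix_unsigned(float s1, float s2) {
    return s1 + s2 - s1 * s2;
}
```

Note the trade-off: the louder both inputs get, the more the product term compresses the sum, which is what keeps the result in range without a hard clip.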
So, the algorithm goes:

1. factor = average(s1)
2. s1 = (s1 / max(s1)) * factor
3. s2 = (s2 / max(s2)) * factor
4. output = ((s1 + s2) / max(s1 + s2)) * factor

This assumes that both signals are already OK, i.e. not overflowing 32767. Note that after step 1 you don't really need to convert back to integers; you can work with floats in the -1.0 to 1.0 interval and convert back to integers at the end, using the previously chosen power factor.
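The steps above could be sketched like this, working on float buffers throughout. All names are mine, and `factor` (the target level from step 1) is simply passed in:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Peak (max absolute value) of a buffer; returns 1 on silence to avoid
// dividing by zero.
static float peak(const std::vector<float>& v) {
    float p = 0.0f;
    for (float x : v) p = std::max(p, std::fabs(x));
    return p > 0.0f ? p : 1.0f;
}

// Normalize each stream by its own peak, scale to the common target level,
// sum, then renormalize the sum. The output's peak is exactly `factor`.
std::vector<float> mix_normalized(const std::vector<float>& s1,
                                  const std::vector<float>& s2,
                                  float factor) {
    float p1 = peak(s1), p2 = peak(s2);
    std::vector<float> out(s1.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = s1[i] / p1 * factor + s2[i] / p2 * factor;
    float ps = peak(out);
    for (float& x : out) x = x / ps * factor;
    return out;
}
```

Because the whole sum is renormalized in the last pass, nothing can clip, but this requires knowing the complete buffers up front; it is not a streaming algorithm.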
You can also buy yourself some headroom with a curve like y = 1.1x - 0.2x^3, capped at the top and bottom. I used this in Hexaphone when the player plays multiple notes together (up to 6).
float waveshape_distort(float in) {
    if (in <= -1.25f) {
        return -0.984375f;   // value of the curve at in = -1.25
    } else if (in >= 1.25f) {
        return 0.984375f;    // value of the curve at in = +1.25
    } else {
        return 1.1f * in - 0.2f * in * in * in;
    }
}
It's not bullet-proof, but it will accept input up to the 1.25 level and smooth the clip into a nice curve. It produces harmonic distortion, which sounds better than hard clipping and may even be desirable in some circumstances.
I think that, as long as the streams are uncorrelated, you shouldn't have too much to worry about: you should be able to get by with clipping. If you're really concerned about distortion at the clip points, a soft limiter would probably work OK.
#include <algorithm>

// short ileft, nleft;   // left-channel samples of the two streams
// short iright, nright; // right-channel samples

// Mix in floats so the sum cannot overflow a 16-bit value
float hiL = ileft + nleft;
float hiR = iright + nright;

// Clip back into the 16-bit range
short left  = static_cast<short>(std::max(-32768.0f, std::min(hiL, 32767.0f)));
short right = static_cast<short>(std::max(-32768.0f, std::min(hiR, 32767.0f)));
If you need to do this right, I would suggest looking at open source software mixer implementations, at least for the theory.
Some links:

- Audacity
- GStreamer
Actually you should probably be using a library.
You're right about adding them together. You could also scan the summed stream for peaks and scale the entire file down if any peak exceeds some threshold (or if the average of a peak and its surrounding samples exceeds a threshold).
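A sketch of that scan-then-scale idea, under the assumption that both complete streams are available up front (`mix_with_peak_scan` is my name): sum into floats, find the peak of the sum, and attenuate the whole mix only if the peak exceeds full scale.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum two 16-bit streams, scan for the peak of the sum, and scale the whole
// mix down only if that peak exceeds the 16-bit range. No sample ever clips,
// at the cost of a second pass over the data.
std::vector<int16_t> mix_with_peak_scan(const std::vector<int16_t>& a,
                                        const std::vector<int16_t>& b) {
    std::vector<float> sum(a.size());
    float peak = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum[i] = static_cast<float>(a[i]) + static_cast<float>(b[i]);
        peak = std::max(peak, std::fabs(sum[i]));
    }
    float gain = peak > 32767.0f ? 32767.0f / peak : 1.0f;
    std::vector<int16_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = static_cast<int16_t>(sum[i] * gain);
    return out;
}
```

Unlike clipping, this preserves the waveform's shape, but it makes the overall level depend on the single loudest moment in the whole file, which is why the answer suggests averaging a peak with its neighbors before deciding to scale.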