I have two raw sound streams that I need to add together. For the purposes of this question, we can assume they are the same bitrate and bit depth (say 16 bit sample, 44.1k
convert the samples to floating point values ranging from -1.0 to +1.0, then:
out = (s1 + s2) - (s1 * s2);
Since your profile says you work in embedded systems, I will assume that floating point operations are not always an option.
> So what's the correct method to add these sounds together in my software mixer?
As you guessed, adding and clipping is the correct way to go if you do not want to lose volume on the sources. With samples that are int16_t, you need the sum to be int32_t, then limit and convert back to int16_t.
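As a minimal sketch of that add-and-clip step (the function name mix_clip is just made up for illustration), it could look something like this:

```c
#include <stdint.h>

/* Mix two 16-bit samples: widen to 32 bits, add, then saturate
 * back into the int16_t range. mix_clip is a hypothetical name. */
int16_t mix_clip(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;  /* cannot overflow in 32 bits */

    if (sum > INT16_MAX) sum = INT16_MAX;   /* limit positive peaks */
    if (sum < INT16_MIN) sum = INT16_MIN;   /* limit negative peaks */

    return (int16_t)sum;
}
```

Quiet passages come through at full level; only the peaks that exceed the 16-bit range get flattened.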
> Am I wrong and the correct method is to lower the volume of each by half?
Yes. Halving of volume is somewhat subjective, but what you can see here and there is that halving the volume (loudness) is a decrease of about 10 dB (dividing the power by 10, or the sample values by 3.16). But you obviously mean lowering the sample values by half. This is a 6 dB decrease, a noticeable reduction, but not quite as much as halving the volume (the loudness table there is very useful).
With this 6 dB reduction you will avoid all clipping. But what happens when you want more input channels? For four channels, you would need to divide the input values by 4, that is lowering by 12 dB, thus going to less than half the loudness for each channel.
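For reference, the dB figures above follow from the usual conversion of a sample-value (amplitude) ratio to decibels, 20 log10(ratio), or 10 log10(ratio) for a power ratio:

```latex
\begin{align*}
20\log_{10}\tfrac{1}{2}    &\approx -6.02\ \text{dB}  && \text{(sample values divided by 2)} \\
20\log_{10}\tfrac{1}{4}    &\approx -12.04\ \text{dB} && \text{(sample values divided by 4)} \\
20\log_{10}\tfrac{1}{3.16} &\approx -10\ \text{dB}    && \text{(sample values divided by 3.16)} \\
10\log_{10}\tfrac{1}{10}   &= -10\ \text{dB}          && \text{(power divided by 10)}
\end{align*}
```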
> Do I need to add a compressor/limiter or some other processing stage to get the volume and mixing effect I'm trying for?
You want to mix, not clip, and not lose loudness on the input signals. This is not possible, not without some kind of distortion.
As suggested by Mark Ransom, a solution to avoid clipping while not losing as much as 6 dB per channel is to hit somewhere in between "adding and clipping" and "averaging".
That is, for two sources: adding, dividing by somewhere between 1 and 2 (reducing the range from [-65536, 65534] to something smaller), then limiting.
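A rough integer-only sketch of this in-between approach is below. The divisor of 1.5 (applied as multiply by 2, divide by 3) is an arbitrary value picked for illustration, and mix_attenuate_limit is a hypothetical name; you would tune the factor by ear.

```c
#include <stdint.h>

/* Mix two 16-bit samples with partial attenuation, then hard-limit.
 * The divisor 1.5 is an assumed, illustrative choice between 1 and 2. */
int16_t mix_attenuate_limit(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;  /* range [-65536, 65534] */

    sum = sum * 2 / 3;                      /* divide by 1.5 without floats */

    if (sum > INT16_MAX) sum = INT16_MAX;   /* limit whatever still exceeds */
    if (sum < INT16_MIN) sum = INT16_MIN;   /* the 16-bit range */

    return (int16_t)sum;
}
```

With this factor, two samples of 20000 mix to 26666 instead of clipping, while each source loses about 3.5 dB rather than 6 dB.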
If you often clip with this solution and it sounds too harsh, then you might want to soften the limit knee with a compressor. This is a bit more complex, since you need to make the dividing factor dependent on the input power. Try the limiter alone first, and consider the compressor only if you are not happy with the result.
There is an article about mixing here. I'd be interested to know what others think about this.
I did it this way once: I used floats (samples between -1 and 1), and I initialized an "autoGain" variable with a value of 1. Then I would add all the samples together (could also be more than 2). Then I would multiply the outgoing signal by autoGain. If the absolute value of the sum of the signals before multiplication was higher than 1, I would assign 1/this sum value to autoGain. This would effectively make autoGain smaller than 1, let's say 0.7, and would be equivalent to some operator quickly turning down the main volume as soon as he sees that the overall sound is getting too loud. Then, over an adjustable period of time, I would add to autoGain until it was finally back at 1 (our operator has recovered from shock and is slowly cranking up the volume :-)).
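Here is roughly how that autoGain idea might be sketched in C. The struct, the function name and the per-sample recovery step are assumptions made for the example, and it only lowers the gain when the new value is smaller than the current one (also an assumption, to avoid the gain jumping back up mid-recovery).

```c
/* State for a hypothetical auto-gain mixer. */
typedef struct {
    float gain;      /* current master gain, 1.0f means unity */
    float recovery;  /* amount added back to gain per output sample */
} AutoGain;

/* Mix n float samples (each nominally in [-1, 1]) into one output sample. */
float autogain_mix(AutoGain *ag, const float *samples, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += samples[i];

    /* If the raw sum would exceed full scale, turn the gain down at once. */
    float mag = (sum < 0.0f) ? -sum : sum;
    if (mag > 1.0f) {
        float needed = 1.0f / mag;
        if (needed < ag->gain)      /* assumption: only ever lower it here */
            ag->gain = needed;
    }

    float out = sum * ag->gain;

    /* Slowly bring the gain back up toward 1.0 for the following samples. */
    if (ag->gain < 1.0f) {
        ag->gain += ag->recovery;
        if (ag->gain > 1.0f)
            ag->gain = 1.0f;
    }
    return out;
}
```

Start with gain = 1.0f and a small recovery value; a larger recovery makes the "operator" bring the volume back faster, at the cost of more audible pumping.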
Most audio mixing applications will do their mixing with floating point numbers (32 bit is plenty good enough for mixing a small number of streams). Translate the 16 bit samples into floating point numbers with the range -1.0 to 1.0 representing full scale in the 16 bit world. Then sum the samples together - you now have plenty of headroom. Finally, if you end up with any samples whose value goes over full scale, you can either attenuate the whole signal or use hard limiting (clipping values to 1.0).
This will give much better sounding results than adding 16 bit samples together and letting them overflow. Here's a very simple code example showing how you might sum two 16 bit samples together:
short sample1 = ...;
short sample2 = ...;
// scale to floating point, roughly -1.0 to 1.0
float samplef1 = sample1 / 32768.0f;
float samplef2 = sample2 / 32768.0f;
float mixed = samplef1 + samplef2;
// reduce the volume a bit:
mixed *= 0.8f;
// hard clipping
if (mixed > 1.0f) mixed = 1.0f;
if (mixed < -1.0f) mixed = -1.0f;
// scale by 32767 so that +1.0 does not overflow a 16 bit short
short outputSample = (short)(mixed * 32767.0f);
I did the following thing:
MAX_VAL = Full 8 or 16 or whatever value
dst_val = your base audio sample
src_val = sample to add to base
Res = (((MAX_VAL - dst_val) * src_val) / MAX_VAL) + dst_val
Multiply the remaining headroom of dst by the src value normalized by MAX_VAL, then add dst. It will never clip, never be less loud and sound absolutely natural.
Example:
250.5882 = (((255 - 180) * 240) / 255) + 180
And this sounds good :)
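A small sketch of the formula for unsigned 8-bit values (MAX_VAL = 255) is below. The function name is made up, and it assumes the values are non-negative magnitudes as in the example above; signed samples centered on zero would need different handling.

```c
#include <stdint.h>

/* Combine two unsigned 8-bit values using the headroom formula:
 * Res = (((MAX_VAL - dst) * src) / MAX_VAL) + dst
 * mix_headroom_u8 is a hypothetical name; inputs are assumed unsigned. */
uint8_t mix_headroom_u8(uint8_t dst, uint8_t src)
{
    const uint32_t max_val = 255;
    uint32_t headroom = max_val - dst;      /* room left above dst */

    /* Integer division truncates, so the result is rounded down. */
    return (uint8_t)(headroom * src / max_val + dst);
}
```

With integer arithmetic, mix_headroom_u8(180, 240) gives 250 rather than 250.5882, and the result can never exceed 255 because the added term is at most the remaining headroom.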