I've been trying to mix two 16-bit linear PCM audio streams together and I can't seem to overcome the noise issues. I think they come from overflow when the samples are added.
The best solution I have found is given by Viktor Toth. He provides a solution for 8-bit unsigned PCM; adapting it for 16-bit signed PCM produces this:
int a = 111; // first sample (-32768..32767)
int b = 222; // second sample
int m; // mixed result will go here

// Make both samples unsigned (0..65535)
a += 32768;
b += 32768;

// Pick the equation
if ((a < 32768) && (b < 32768)) {
    // Viktor's first equation, for when both sources are "quiet"
    // (i.e. below the middle of the dynamic range)
    m = a * b / 32768;
} else {
    // Viktor's second equation, for when one or both sources are loud;
    // widen a * b to 64 bits, since it can exceed the range of a 32-bit int here
    m = 2 * (a + b) - (int)((long long)a * b / 32768) - 65536;
}

// Output is unsigned (0..65536) so convert back to signed (-32768..32767)
if (m == 65536) m = 65535;
m -= 32768;
Using this algorithm, the output almost never needs clipping: only the single value 65536 can fall out of range, hence the one-line clamp. Unlike straight averaging, the volume of one source is not reduced even when the other source is silent.
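Wrapped into a reusable function, the same algorithm looks like this (a sketch; the function name mix_sample_viktor is mine, and 64-bit intermediates are used throughout so a * b cannot overflow):

#include <cstdint>

// Mix two signed 16-bit samples using Viktor Toth's two equations.
int16_t mix_sample_viktor(int16_t s1, int16_t s2) {
    int64_t a = (int64_t)s1 + 32768; // shift to unsigned range 0..65535
    int64_t b = (int64_t)s2 + 32768;
    int64_t m;
    if (a < 32768 && b < 32768)
        m = a * b / 32768;                       // both sources quiet
    else
        m = 2 * (a + b) - a * b / 32768 - 65536; // one or both loud
    if (m == 65536) m = 65535;                   // clamp the single out-of-range value
    return (int16_t)(m - 32768);                 // shift back to signed
}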
here's a descriptive implementation:
#include <cstdint>
#include <limits>

short int mix_sample(short int sample1, short int sample2) {
    // widen before adding so the sum cannot overflow
    const int32_t result(static_cast<int32_t>(sample1) + static_cast<int32_t>(sample2));
    typedef std::numeric_limits<short int> Range;
    if (Range::max() < result)
        return Range::max();
    else if (Range::min() > result)
        return Range::min();
    else
        return static_cast<short int>(result);
}
to mix, it's just add and clip!
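for example, mixing two equal-length buffers with the mix_sample function above (a sketch; the buffer names are mine):

#include <cstddef>

// mix two equal-length buffers of 16-bit samples, add-and-clip per sample
void mix_buffers(const short int *in1, const short int *in2,
                 short int *out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        out[i] = mix_sample(in1[i], in2[i]);
}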
to avoid clipping artifacts, you will want to use saturation or a limiter. ideally, you will have a small int32_t buffer with a small amount of lookahead. this will introduce latency.
more common than limiting everywhere is to leave a few bits' worth of 'headroom' in your signal, as in the sketch below.
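a minimal sketch of the headroom idea for two sources: attenuate each input by one bit (6 dB) before summing, so the sum can never leave the 16-bit range and no limiter is needed:

// one bit of headroom per source: two half-scale signals always fit in 16 bits
short int mix_with_headroom(short int s1, short int s2) {
    return (short int)(s1 / 2 + s2 / 2);
}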
Since you are in the time domain, the frequency information lives in the differences between successive samples; when you divide by two, you damage that information. That's why adding and clipping works better. Clipping will of course add very high-frequency noise, which is probably filtered out.
There's a discussion at https://dsp.stackexchange.com/questions/3581/algorithms-to-mix-audio-signals-without-clipping about why the A+B - A*B solution is not ideal. Hidden in one of the comments on that discussion is the suggestion to sum the values and divide by the square root of the number of signals, with an additional check for clipping just in case. This seems like a reasonable (simple and fast) middle ground.
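A minimal sketch of that suggestion, assuming n equal-weight input signals (the function name and signature are mine):

#include <cmath>
#include <cstdint>

// Sum n samples and scale by 1/sqrt(n) rather than 1/n, preserving loudness.
int16_t mix_sqrt(const int16_t *samples, int n) {
    int32_t sum = 0;
    for (int i = 0; i < n; ++i)
        sum += samples[i]; // plain sum of all signals
    int32_t scaled = (int32_t)(sum / std::sqrt((double)n));
    // sqrt scaling does not guarantee the result fits, so clip as a safety net
    if (scaled > 32767) scaled = 32767;
    if (scaled < -32768) scaled = -32768;
    return (int16_t)scaled;
}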
Here is what I did on my recent synthesizer project.
int *unfiltered = (int *)malloc(lengthOfLongPcmInShorts * sizeof(int)); // needs <stdlib.h> and <limits.h>
int i;
for (i = 0; i < lengthOfShortPcmInShorts; i++) {
    unfiltered[i] = shortPcm[i] + longPcm[i]; // overlap: both streams play
}
for (; i < lengthOfLongPcmInShorts; i++) {
    unfiltered[i] = longPcm[i]; // tail: only the longer stream remains
}

int max = 0;
for (int i = 0; i < lengthOfLongPcmInShorts; i++) {
    int val = unfiltered[i];
    if (abs(val) > max)
        max = abs(val); // track the largest magnitude, not the signed value
}
if (max == 0) max = 1; // all-silent input: avoid dividing by zero

short int *newPcm = (short int *)malloc(lengthOfLongPcmInShorts * sizeof(short int));
for (int i = 0; i < lengthOfLongPcmInShorts; i++) {
    // multiply before dividing: (unfiltered[i] / max) * SHRT_MAX would truncate to 0 or +/-1
    newPcm[i] = (short int)(((long long)unfiltered[i] * SHRT_MAX) / max);
}
I added all the PCM data into an integer array, so that I keep all the data, unclipped.
After that, I looked for the absolute maximum value in the integer array.
Finally, I converted the integer array into a short int array by dividing each element by that max value and multiplying by the maximum short int value.
This way you get the minimum amount of 'headroom' needed to fit the data.
You might be able to do some statistics on the integer array and accept some clipping, but for what I needed the minimum amount of headroom was good enough; one way to do that is sketched below.
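One way to read that suggestion (my interpretation, not code from the project): normalize to a high-percentile magnitude instead of the absolute max, and clip the few outliers above it, trading a little clipping for more loudness:

#include <algorithm>
#include <cstdlib>
#include <vector>

// Normalize to the 99th-percentile magnitude rather than the absolute max,
// clipping the outliers that land outside the 16-bit range.
void normalize_with_clipping(const std::vector<int> &unfiltered,
                             std::vector<short> &out) {
    if (unfiltered.empty()) return;
    std::vector<int> mags(unfiltered.size());
    for (std::size_t i = 0; i < unfiltered.size(); ++i)
        mags[i] = std::abs(unfiltered[i]);
    std::size_t k = mags.size() * 99 / 100; // 99th-percentile index
    std::nth_element(mags.begin(), mags.begin() + k, mags.end());
    int ref = mags[k] > 0 ? mags[k] : 1; // avoid dividing by zero on silence
    out.resize(unfiltered.size());
    for (std::size_t i = 0; i < unfiltered.size(); ++i) {
        long long v = (long long)unfiltered[i] * 32767 / ref;
        if (v > 32767) v = 32767;   // the outliers we chose to sacrifice
        if (v < -32768) v = -32768;
        out[i] = (short)v;
    }
}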
I think these should be functions mapping [SHRT_MIN, SHRT_MAX] -> [SHRT_MIN, SHRT_MAX], and they clearly are not (apart from the first one), so overflow occurs.
If unwind's suggestion doesn't work, you can also try:
((long int)(sample1) + sample2) / 2
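Spelled out as a complete function (a sketch of the same idea):

// Average the two samples; widening to long int before the add prevents
// 16-bit overflow, and halving guarantees the result is back in range.
short int mix_average(short int sample1, short int sample2) {
    return (short int)(((long int)sample1 + sample2) / 2);
}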