Algorithm to mix sound

后端未结

I have two raw sound streams that I need to add together. For the purposes of this question, we can assume they are the same bitrate and bit depth (say 16 bit sample, 44.1k

20 Answers

没有蜡笔的小新

2020-11-29 17:23
Convert the samples to floating point values ranging from -1.0 to +1.0, then:
```
out = (s1 + s2) - (s1 * s2);
```
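A minimal C sketch of that recipe, assuming 16-bit signed input. The helper names `to_float`, `mix_formula`, and `to_int16` are made up for illustration. Note that with two negative inputs the expression as written can fall below -1.0 (for example, -0.5 and -0.5 give -1.25), so this sketch clamps before converting back.

```c
#include <stdint.h>

/* Convert a 16-bit sample to a float in [-1.0, 1.0). */
static float to_float(int16_t s) {
    return (float)s / 32768.0f;
}

/* The formula from the answer, applied exactly as written. */
static float mix_formula(float s1, float s2) {
    return (s1 + s2) - (s1 * s2);
}

/* Clamp to [-1.0, 1.0] and convert back to 16 bits.
   Scaling by 32767 keeps +1.0 inside the int16_t range. */
static int16_t to_int16(float x) {
    if (x > 1.0f) x = 1.0f;
    if (x < -1.0f) x = -1.0f;
    return (int16_t)(x * 32767.0f);
}
```

Usage would look like `to_int16(mix_formula(to_float(a), to_float(b)))` for each pair of samples.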
逝去的感伤

2020-11-29 17:25
Since your profile says you work in embedded systems, I will assume that floating point operations are not always an option.

> So what's the correct method to add these sounds together in my software mixer?

As you guessed, adding and clipping is the correct way to go if you do not want to lose volume on the sources. With samples that are int16_t, you need the sum to be int32_t, then limit it and convert back to int16_t.
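A minimal sketch of add-and-clip in integer-only C. The function name `mix_clip` is hypothetical.

```c
#include <stdint.h>

/* Add two 16-bit samples in 32-bit arithmetic, then limit to the int16_t range. */
static int16_t mix_clip(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + (int32_t)b;   /* cannot overflow in 32 bits */
    if (sum > INT16_MAX) sum = INT16_MAX;    /* hard limit on the positive side */
    if (sum < INT16_MIN) sum = INT16_MIN;    /* hard limit on the negative side */
    return (int16_t)sum;
}
```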

> Am I wrong and the correct method is to lower the volume of each by half?

Yes. Halving the volume is somewhat subjective, but the figure you will see here and there is that halving the perceived volume (loudness) corresponds to a decrease of about 10 dB (dividing the power by 10, or the sample values by 3.16). You obviously mean lowering the sample values by half, though. That is a 6 dB decrease: a noticeable reduction, but not quite as much as halving the loudness (the loudness table there is very useful).
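For reference, these decibel figures follow from the standard amplitude and power definitions (a worked restatement, not taken from the linked pages):

$$
20\log_{10}\tfrac{1}{2} \approx -6.02\ \text{dB}, \qquad
10\log_{10}\tfrac{1}{10} = -10\ \text{dB}, \qquad
10^{10/20} = \sqrt{10} \approx 3.16
$$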

With this 6 dB reduction you will avoid all clipping. But what happens when you want more input channels? For four channels, you would need to divide the input values by 4, that is, lower them by 12 dB, which takes each channel to less than half its loudness.
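The divide-by-N approach can be sketched as follows, assuming one 16-bit sample per input channel. The function name `mix_average` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Average n simultaneous samples, one per input channel.
   Dividing by n costs 20*log10(n) dB per channel but can never clip. */
static int16_t mix_average(const int16_t *samples, size_t n) {
    if (n == 0) return 0;
    int32_t sum = 0;                 /* 32-bit accumulator for the 16-bit inputs */
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    return (int16_t)(sum / (int32_t)n);
}
```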

> Do I need to add a compressor/limiter or some other processing stage to get the volume and mixing effect I'm trying for?

You want to mix, not clip, and not lose loudness on the input signals. This is not possible, not without some kind of distortion.

As suggested by Mark Ransom, a solution to avoid clipping while not losing as much as 6 dB per channel is to hit somewhere in between "adding and clipping" and "averaging".

That is for two sources: adding, dividing by somewhere between 1 and 2 (reduce the range from [-65536, 65534] to something smaller), then limiting.
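As a rough integer-only sketch of "add, scale, then limit": the function name `mix_scaled` and the `num`/`den` parameters are assumptions for illustration. The gain is expressed as the fraction `num/den`, so dividing by 1.5 becomes `num = 2, den = 3`.

```c
#include <stdint.h>

/* Add two samples, scale the sum by num/den, then limit to int16_t.
   Assumes 0 < num <= den <= 32767 so the product fits in 32 bits. */
static int16_t mix_scaled(int16_t a, int16_t b, int32_t num, int32_t den) {
    int32_t sum = (int32_t)a + (int32_t)b;   /* in [-65536, 65534] */
    sum = sum * num / den;                   /* reduce the range */
    if (sum > INT16_MAX) sum = INT16_MAX;    /* limit what is still too large */
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}
```

With a factor of 2/3, two samples of 15000 mix to 20000 without clipping, while two samples of 30000 still hit the limiter.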

If you often clip with this solution and it sounds too harsh, then you might want to soften the limit knee with a compressor. This is a bit more complex, since you need to make the dividing factor dependent on the input power. Try the limiter alone first, and consider the compressor only if you are not happy with the result.
暖寄归人

2020-11-29 17:26

There is an article about mixing here. I'd be interested to know what others think about this.

野趣味

2020-11-29 17:26

I did it this way once: I used floats (samples between -1 and 1) and initialized an "autoGain" variable to 1. I would add all the samples together (there could be more than two) and multiply the outgoing signal by autoGain. If the absolute value of the sum before multiplication was higher than 1, I would set autoGain to 1 divided by that absolute value. This effectively makes autoGain smaller than 1, say 0.7, and is equivalent to an operator quickly turning down the main volume as soon as he sees that the overall sound is getting too loud. Then, over an adjustable period of time, I would add to autoGain until it was finally back at 1 (our operator has recovered from the shock and is slowly cranking the volume back up :-)).
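One way this could look in C, as a sketch only. The `AutoGain` struct, the `recovery` step size, and the function name `auto_gain_mix` are my own naming for illustration; the original code was not posted.

```c
#include <stddef.h>

/* Hypothetical state for the auto-gain mixer described above. */
typedef struct {
    float gain;      /* current master gain, starts at 1.0 */
    float recovery;  /* amount added back to gain after each sample */
} AutoGain;

/* Mix n float samples (each in [-1, 1]) into one output sample. */
static float auto_gain_mix(AutoGain *ag, const float *in, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) sum += in[i];

    float mag = sum < 0.0f ? -sum : sum;
    if (mag > 1.0f) ag->gain = 1.0f / mag;   /* turn down just enough to reach full scale */

    float out = sum * ag->gain;

    ag->gain += ag->recovery;                /* slowly bring the gain back up */
    if (ag->gain > 1.0f) ag->gain = 1.0f;    /* never amplify above unity */
    return out;
}
```

A smaller `recovery` value makes the gain return to 1 more slowly, which corresponds to the adjustable period of time mentioned above.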

悲&欢浪女

2020-11-29 17:27
Most audio mixing applications will do their mixing with floating point numbers (32 bit is plenty good enough for mixing a small number of streams). Translate the 16 bit samples into floating point numbers with the range -1.0 to 1.0 representing full scale in the 16 bit world. Then sum the samples together - you now have plenty of headroom. Finally, if you end up with any samples whose value goes over full scale, you can either attenuate the whole signal or use hard limiting (clipping values to 1.0).

This will give much better sounding results than adding 16 bit samples together and letting them overflow. Here's a very simple code example showing how you might sum two 16 bit samples together:
```
short sample1 = ...;
short sample2 = ...;
// convert to floats in the range -1.0 to 1.0
float samplef1 = sample1 / 32768.0f;
float samplef2 = sample2 / 32768.0f;
float mixed = samplef1 + samplef2;
// reduce the volume a bit:
mixed *= 0.8f;
// hard clipping
if (mixed > 1.0f) mixed = 1.0f;
if (mixed < -1.0f) mixed = -1.0f;
// scale by 32767 so that +1.0 still fits in a short
short outputSample = (short)(mixed * 32767.0f);
```
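The other option mentioned above, attenuating the whole signal instead of clipping, might be sketched like this for a complete buffer. The function name `mix_attenuate` and the two-pass structure are assumptions; a real-time mixer would not usually have the whole signal available in advance.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix two buffers; if the summed peak exceeds full scale,
   scale the entire result down so the peak lands exactly at full scale. */
static void mix_attenuate(const int16_t *a, const int16_t *b,
                          int16_t *out, size_t n) {
    float peak = 0.0f;
    /* First pass: find the largest absolute value of the float sum. */
    for (size_t i = 0; i < n; i++) {
        float m = a[i] / 32768.0f + b[i] / 32768.0f;
        if (m < 0.0f) m = -m;
        if (m > peak) peak = m;
    }
    float gain = peak > 1.0f ? 1.0f / peak : 1.0f;
    /* Second pass: apply the same gain to every sample. */
    for (size_t i = 0; i < n; i++) {
        float m = (a[i] / 32768.0f + b[i] / 32768.0f) * gain;
        out[i] = (int16_t)(m * 32767.0f);
    }
}
```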
予麋鹿

2020-11-29 17:30
I did the following thing:
```
MAX_VAL = Full 8 or 16 or whatever value
dst_val = your base audio sample
src_val = sample to add to base

Res = (((MAX_VAL - dst_val) * src_val) / MAX_VAL) + dst_val
```
Multiply the headroom left in the destination (MAX_VAL - dst_val) by the source value normalized by MAX_VAL, and add the result to the destination. It will never clip, never be quieter than the base sample, and sounds absolutely natural.

Example:
```
250.5882 = (((255 - 180) * 240) / 255) + 180
```
And this sounds good :)
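For unsigned 8-bit samples, the formula above could be written in C roughly like this. The function name `mix_headroom_u8` is hypothetical, and the sketch assumes unsigned samples in the range 0 to 255, as in the example.

```c
#include <stdint.h>

/* Res = (((MAX_VAL - dst) * src) / MAX_VAL) + dst, for unsigned 8-bit samples. */
static uint8_t mix_headroom_u8(uint8_t dst, uint8_t src) {
    const uint32_t max_val = 255;
    /* Scale the destination's remaining headroom by the normalized source. */
    uint32_t added = ((max_val - dst) * src) / max_val;
    return (uint8_t)(added + dst);
}
```

Because this uses integer division, the worked example yields 250 rather than 250.5882.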