Question
I am developing a karaoke app in which you can record your voice while listening to the music. When the user uses headphones, everything is great: they can hear the music and themselves in the headphones while singing, and we end up with their pure voice recorded, which we can mix with the playback.
The problem occurs when the user does not use headphones. Then we play the music through the speakers (using the AVAudioSessionCategoryPlayAndRecord category) and record simultaneously. In the final recording we get the user's voice and the playback from the speakers mixed together. The problem is that the playback is very loud and it "covers" the user's voice. At first I thought this was normal behaviour, since the speakers are close to the microphone, so there was nothing I could do.
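For context, the session is currently set up roughly like this (a minimal sketch, not my exact code; the function name is just illustrative):

```swift
import AVFoundation

// Minimal sketch of the current setup: play and record at the same time,
// routing playback to the built-in speaker when no headphones are attached.
func configureSession() throws {
    let session = AVAudioSession.sharedInstance()
    // AVAudioSessionCategoryPlayAndRecord in the modern Swift spelling
    try session.setCategory(.playAndRecord, options: [.defaultToSpeaker])
    try session.setActive(true)
}
```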
However, when I tried the same thing in GarageBand, it somehow lowers the playback from the speakers, making the voice more audible.
I also tried it with Instagram (you can record while playing music, e.g. from Spotify) and noticed that after ~1 sec. the playback's volume decreases and the voice can be heard more clearly.
I don't think it's post-processing, because that would be very complicated, so maybe there is an option to let "iOS handle it".
To be clear - it does not lower the playback during recording - the effect is only audible when listening to the final video.
I use AVCaptureSession for recording and AudioKit Player for playback.
Thanks in advance for any thoughts/tips/advice!
Regards
Answer 1:
OK, so I asked Apple Technical Support, and the response was exactly what I wanted: https://developer.apple.com/documentation/avfoundation/avaudiosession/mode/1616455-voicechat. You just have to set this mode on the AVAudioSession and the system will handle it; with it, the device's tonal equalization is optimized for voice.
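In practice that looks something like the following (a minimal sketch, assuming the Swift 4+ spelling of the API; the function name is just illustrative):

```swift
import AVFoundation

// Sketch of the suggested fix: the .voiceChat mode enables the system's
// voice processing for the play-and-record session (echo cancellation of
// the played-back audio), which keeps the speaker output from swamping
// the recorded voice.
func configureForKaraoke() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .voiceChat,
                            options: [.defaultToSpeaker, .allowBluetooth])
    try session.setActive(true)
}
```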
Answer 2:
iOS cannot "just handle" that; there is no "filter out the music" function. The fact that it doesn't do it live, but later or with a delay, strongly implies they are doing some post-processing. I'm not a machine-learning expert, but I think if you just used an equalizer and a noise gate you could get this effect. It would be hard to extract an a cappella, but you could certainly improve it. Instagram likely takes that second to identify where the voice frequencies are so it knows how to EQ the signal.
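If you wanted to experiment with the EQ half of that idea, a rough sketch with AVFoundation's built-in equalizer might look like this (the band frequencies and gains are illustrative guesses, and the noise gate would need an extra effect, e.g. a dynamics-processor audio unit, so it's omitted here):

```swift
import AVFoundation

// Sketch: wire a player through a 3-band EQ that favours the voice range.
// You would still need to schedule a file on the player and start the engine.
let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
let eq = AVAudioUnitEQ(numberOfBands: 3)

// High-pass below typical voice fundamentals.
eq.bands[0].filterType = .highPass
eq.bands[0].frequency = 120
eq.bands[0].bypass = false

// Gentle boost across the main vocal band.
eq.bands[1].filterType = .parametric
eq.bands[1].frequency = 2000
eq.bands[1].bandwidth = 2.0   // in octaves
eq.bands[1].gain = 6          // in dB
eq.bands[1].bypass = false

// Low-pass above the sibilance range.
eq.bands[2].filterType = .lowPass
eq.bands[2].frequency = 8000
eq.bands[2].bypass = false

engine.attach(player)
engine.attach(eq)
engine.connect(player, to: eq, format: nil)
engine.connect(eq, to: engine.mainMixerNode, format: nil)
```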
Source: https://stackoverflow.com/questions/52442132/recording-voice-while-playing-music-filter-speakers-input-ios