I'm an experienced software engineer with some minor college DSP knowledge. I'm working on a smartphone application to process signal data, such as from the microphone (sample
To put it simply: calculating FFT(x) does not introduce aliasing.
Aliasing can be introduced any time a signal is sampled. I think the root of your confusion is that the audio signal goes through two sampling processes: the first turns continuous sound into a 44.1 kHz sample stream, and the second is the downsampling step you want to add.
Say there was a spurious tone at 30 kHz (for instance): it must be rejected by the smartphone's hardware. Once you have those 44.1 kHz samples, you're stuck with whatever alias products got through the sampler. You cannot undo aliasing after sampling (this is not strictly true, but it's true for baseband signals, which is what you are dealing with). You should go ahead and assume that the phone designers got this right, and you won't have to worry about alias products from signal content higher than ~20 kHz.
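To make the 30 kHz example concrete, here is a minimal sketch of where such a tone would land if nothing rejected it before the 44.1 kHz sampler. The helper name `alias_frequency` is made up for illustration, and it assumes an ideal sampler and a real-valued tone.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a real tone at f_hz after ideal sampling at fs_hz."""
    f_folded = f_hz % fs_hz                  # fold into [0, fs)
    return min(f_folded, fs_hz - f_folded)   # reflect into [0, fs/2]

print(alias_frequency(30_000, 44_100))  # 14100
```

In the sampled data, that 14.1 kHz product is indistinguishable from a genuine 14.1 kHz tone, which is why it has to be removed before the sampler rather than after.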
Which brings us to the second sampling step. You are quite correct that you need to apply another anti-alias filter before you downsample. Any signal content below 20 kHz but above half your downsampled rate (the new Nyquist frequency) will alias into the output unless you attenuate it first. The key is the order of operations: you calculate FFT(x) BEFORE downsampling, then apply the filter, then downsample. This is what gives you alias-protected outputs.
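The order of operations can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the function names, the decimation factor of 4, and the two test tones (1 kHz and 9 kHz) are all made up, and the filter is a crude brick-wall that simply zeroes FFT bins at or above the new Nyquist frequency.

```python
import numpy as np

def fft_lowpass_decimate(x, factor):
    """Brick-wall low-pass in the FFT domain, then keep every `factor`-th sample."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    # rfft bin k sits at k * fs / n, so the new Nyquist (fs / (2 * factor))
    # falls at bin n / (2 * factor). Zero everything from there upward.
    cutoff_bin = n // (2 * factor)
    spectrum[cutoff_bin:] = 0.0
    filtered = np.fft.irfft(spectrum, n=n)
    return filtered[::factor]

def tone_amplitude(y, freq_hz, rate_hz):
    """Amplitude of the FFT bin nearest freq_hz (exact for bin-centered sines)."""
    magnitudes = np.abs(np.fft.rfft(y)) / (len(y) / 2)
    k = int(round(freq_hz * len(y) / rate_hz))
    return magnitudes[k]

fs = 44_100          # original sample rate in Hz
factor = 4           # assumed decimation factor: new rate 11025 Hz, new Nyquist 5512.5 Hz
t = np.arange(fs) / fs                      # one second of samples
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 9000 * t)

naive = x[::factor]                         # no filter: 9 kHz folds to 11025 - 9000 = 2025 Hz
protected = fft_lowpass_decimate(x, factor)

new_fs = fs / factor
print("alias at 2025 Hz, naive:    ", tone_amplitude(naive, 2025, new_fs))
print("alias at 2025 Hz, protected:", tone_amplitude(protected, 2025, new_fs))
print("1 kHz tone, protected:      ", tone_amplitude(protected, 1000, new_fs))
```

With these particular tones, the naive version shows a strong spurious component at 2025 Hz, while the filtered version keeps the 1 kHz tone and suppresses the alias. A brick-wall FFT filter applied block by block can ring on real signals that are not bin-centered, so a windowed FIR low-pass ahead of the decimation is a common alternative.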
Most likely the smartphone has a delta-sigma ADC, which uses a relatively gentle analog anti-alias filter (one or two poles), samples at an extremely high rate (64 * 44.1 kHz or higher), and then applies digital filters as it downsamples. MEMS accelerometers similarly have intrinsic anti-alias protection. If you want to test this, use a sine wave source hooked up to an electrodynamic shaker (or a beefy subwoofer cone) and shake your phone at a few kHz. You should see no output on the accelerometer signal. Then drive a tweeter at 30 kHz and see if the mic shows anything.