I want to make a program that detects the note being played in front of the microphone. I am testing the FFT function of NAudio, but in the tests I did in Audacity the highest FFT peak does not always match the note I played.
The highest peak in an audio spectrum is not necessarily the musical pitch a human would perceive, especially for a sound with strong overtones. That's because pitch is a psycho-perceptual phenomenon: the brain will often deduce a frequency that isn't even present in the waveform.
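To see this concretely, here is a minimal sketch in plain C# (no NAudio calls; the class name and signal are just for illustration). It synthesizes a 220 Hz tone whose 2nd and 3rd harmonics are louder than the fundamental, computes a naive DFT, and reports the strongest bin, which lands on a harmonic rather than on the 220 Hz pitch a listener would hear:

```csharp
// Minimal sketch in plain C# (no NAudio): a tone whose overtones are stronger
// than its fundamental. A listener would call the pitch 220 Hz, but the
// largest spectral magnitude lands on a harmonic instead.
using System;

class SpectrumPeakDemo
{
    static void Main()
    {
        const int sampleRate = 8000;
        const int n = 2000;          // window holds a whole number of cycles, so no leakage
        const double f0 = 220.0;     // fundamental = the perceived pitch

        // Synthesize: weak fundamental, strong 2nd and 3rd harmonics.
        var x = new double[n];
        for (int i = 0; i < n; i++)
        {
            double t = (double)i / sampleRate;
            x[i] = 0.2 * Math.Sin(2 * Math.PI * f0 * t)
                 + 1.0 * Math.Sin(2 * Math.PI * 2 * f0 * t)
                 + 0.8 * Math.Sin(2 * Math.PI * 3 * f0 * t);
        }

        // Naive DFT magnitude per bin (slow but dependency-free;
        // an FFT of the same length gives the same magnitudes).
        double bestMag = 0;
        int bestBin = 0;
        for (int k = 1; k < n / 2; k++)
        {
            double re = 0, im = 0;
            for (int i = 0; i < n; i++)
            {
                double angle = 2 * Math.PI * k * i / n;
                re += x[i] * Math.Cos(angle);
                im -= x[i] * Math.Sin(angle);
            }
            double mag = Math.Sqrt(re * re + im * im);
            if (mag > bestMag) { bestMag = mag; bestBin = k; }
        }

        double peakHz = (double)bestBin * sampleRate / n;
        Console.WriteLine($"Strongest bin ~ {peakHz:F1} Hz, perceived pitch ~ {f0} Hz");
    }
}
```

On this signal the strongest bin comes out at 440 Hz (the 2nd harmonic), even though the note being played is 220 Hz.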
Auto-correlation methods of frequency or pitch estimation (roughly, measuring how far apart in time a waveform repeats, even a funny-looking or non-sinusoidal one) are usually a better match for what a human would call pitch. The reason for the various enhancements to the basic autocorrelation algorithm is that simple autocorrelation finds a near-infinite number of repeating lags (if the waveform repeats every 1 second, it also repeats twice every 2 seconds, and so on), so the trick is to weight the correlation so that it statistically better matches what a human would guess about the same waveform.
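As an illustration of the basic idea (not of any particular enhanced algorithm), here is a minimal C# sketch of autocorrelation pitch estimation. The function and parameter names are mine, and the "first strong local maximum" rule is just one simple heuristic for preferring the shortest strong period over the longer, equally valid lags:

```csharp
// Minimal sketch (plain C#): pitch estimation by autocorrelation. Instead of
// taking the global maximum (which can land on a multiple of the true period),
// this picks the first lag whose normalized correlation clears a threshold and
// is a local maximum -- one simple heuristic, not a production algorithm.
using System;

class AutocorrelationPitch
{
    // Returns an estimated fundamental frequency in Hz, or 0 if none is found.
    static double EstimatePitch(double[] x, int sampleRate,
                                double minHz = 60, double maxHz = 1000,
                                double threshold = 0.8)
    {
        int minLag = sampleRate / (int)maxHz;   // shortest period of interest
        int maxLag = sampleRate / (int)minHz;   // longest period of interest

        // Zero-lag energy, used to roughly normalize the correlation to [-1, 1].
        double energy = 0;
        for (int i = 0; i < x.Length; i++) energy += x[i] * x[i];
        if (energy == 0) return 0;

        double prev = 0;
        for (int lag = minLag; lag <= maxLag && lag < x.Length; lag++)
        {
            double sum = 0;
            for (int i = 0; i + lag < x.Length; i++) sum += x[i] * x[i + lag];
            double r = sum / energy;

            // First local maximum above the threshold ~ the shortest strong period.
            if (prev > threshold && r < prev)
                return (double)sampleRate / (lag - 1);
            prev = r;
        }
        return 0;
    }

    static void Main()
    {
        const int sampleRate = 8000;
        const double f0 = 220.0;
        var x = new double[4000];
        for (int i = 0; i < x.Length; i++)
        {
            double t = (double)i / sampleRate;
            // Same "strong overtones" signal as before: the pitch is still 220 Hz.
            x[i] = 0.2 * Math.Sin(2 * Math.PI * f0 * t)
                 + 1.0 * Math.Sin(2 * Math.PI * 2 * f0 * t)
                 + 0.8 * Math.Sin(2 * Math.PI * 3 * f0 * t);
        }
        Console.WriteLine($"Estimated pitch ~ {EstimatePitch(x, sampleRate):F1} Hz (expected ~{f0})");
    }
}
```

On the harmonic-heavy test signal this reports roughly 222 Hz (the lag resolution is one sample), whereas the spectral peak was 440 Hz. Real pitch trackers such as YIN or MPM build on the same autocorrelation idea with better normalization, peak interpolation, and thresholding.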