Do you remember how, on old cellphones, you could set up a voice shortcut to call a person?
I am trying to make an Android app with that function. The user records a wor
You need to convert both the reference sounds and the recorded sound to features. To do that, split the sound into frames and extract either the FFT spectrum or the mel-cepstrum (MFCC) directly from each frame. Any MFCC library out there will do for that.
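To make the framing step concrete, here is a rough sketch in plain Python (standard library only, so it is easy to port to Java on Android). The function names, the frame length of 256 samples, the hop of 128 samples and the Hamming window are all my own assumptions, not values from any particular library, and the spectrum is a plain magnitude spectrum rather than full MFCC.

```python
import cmath
import math


def split_into_frames(samples, frame_len=256, hop=128):
    """Cut a list of samples into overlapping frames.

    A trailing partial frame is dropped. frame_len and hop are
    illustrative defaults, not recommended values.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Apply a Hamming window to one frame and return the DFT
    magnitudes for bins 0 .. N/2."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
                for k, s in enumerate(frame)]
    # Naive O(N^2) DFT, kept for readability; a real app would call
    # an FFT routine here instead.
    return [abs(sum(w * cmath.exp(-2j * math.pi * b * k / n)
                    for k, w in enumerate(windowed)))
            for b in range(n // 2 + 1)]


def extract_features(samples, frame_len=256, hop=128):
    """Turn raw samples into a sequence of per-frame feature vectors."""
    return [magnitude_spectrum(f)
            for f in split_into_frames(samples, frame_len, hop)]
```

Each recording then becomes a list of feature vectors, one per frame, and that list is what gets compared in the next step. An MFCC library would replace `magnitude_spectrum` and usually gives better results for speech.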
Once you have the features, you can compare them with the dynamic time warping (DTW) algorithm. You can find some details here:
http://en.wikipedia.org/wiki/Dynamic_time_warping
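DTW returns a distance between two feature sequences: the smaller the distance, the more alike the recordings are. Pick the reference with the smallest distance, and apply a threshold to that distance to reject recordings that match none of the references. That gives you the right person to call.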
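The comparison step could look roughly like this. It is the textbook DTW recurrence with Euclidean distance between frames, written in plain Python so it ports easily to Java. `dtw_distance`, `best_match` and the `threshold` parameter are names I made up for illustration, and the threshold value has to be tuned on real recordings.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of seq_b
                                 cost[i][j - 1],      # skip a frame of seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def best_match(recorded, references, threshold):
    """Return the name of the closest reference, or None if even the
    closest one is farther away than `threshold` (a hypothetical
    tuning parameter).

    `references` maps a contact name to its feature sequence.
    """
    if not references:
        return None
    name, dist = min(((k, dtw_distance(recorded, v))
                      for k, v in references.items()),
                     key=lambda pair: pair[1])
    return name if dist <= threshold else None
```

Note that the raw accumulated cost grows with the length of the recordings, so if the reference words differ a lot in length you may want to divide it by the number of frames before comparing against the threshold.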
A similar question is:
Simplest algorithm of measuring how similar of two short audio