I'm writing an Android app that lets the user record their voice through the microphone, save it to storage, and link it to specific content (like a Contact). Later, user cal
If you are aiming to compare an old recording of a user with a new call as it comes in, audio fingerprinting solutions like Dejavu (Python, on a server) or Echoprint (C++) won't help you. They are built for recognition and retrieval of a previously recorded audio segment, possibly with added noise. They cannot deal with the variability of the human voice, where the same person never says the same thing exactly the same way twice. See an explanation here.
If that's the case, what you are referring to is speaker recognition, which is much harder and involves quite a bit of machine learning. It would be tough to do this for a large corpus of users (especially offline on a phone), but for distinguishing between a couple of users, it might be doable.
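To give a rough idea of what the "couple of users" case could look like, here is a minimal sketch that picks the closest enrolled speaker by cosine similarity. It assumes you already have a fixed-length feature vector per recording (for example averaged MFCCs or the output of a speaker-embedding model); that extraction step is the hard part and is not shown. The names `enrolled` and `identify_speaker`, the toy vectors, and the `0.8` threshold are all made up for illustration and would need tuning on real data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def identify_speaker(enrolled, sample, threshold=0.8):
    """Return the enrolled name closest to `sample`, or None if no match.

    `enrolled` maps a speaker name to a reference feature vector.
    `threshold` is an arbitrary placeholder value, not a recommendation.
    """
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(reference, sample)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Toy vectors standing in for real voice features (hypothetical data).
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}

print(identify_speaker(enrolled, [0.85, 0.15, 0.25]))
print(identify_speaker(enrolled, [-1.0, -1.0, -1.0]))
```

With the toy data, the first sample lands close to the "alice" reference and the second matches nobody, so it falls below the threshold. The same comparison is cheap enough to run on a phone; whether it works in practice depends almost entirely on how good the feature vectors are.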