Yesterday I finalised the code for detecting the audio energy of a track displayed over time, which I will eventually use as part of my audio thumbnailing project.
Howev
Determining pitch is a lot more complicated than just applying a DFT. It also depends on the nature of the source - different algorithms are appropriate for speech versus a musical instrument, for example. If this is a music track, as your question seems to imply, then you're probably out of luck as there is no obvious way of determining a single pitch value for multiple musical instruments playing together (how would you even define pitch in this context ?). Maybe you could be more specific about what you are trying to do - perhaps a power spectrum would be more useful than trying to determine an arbitrary pitch ?