I have 15 audio tapes, one of which I believe contains an old recording of my grandmother and myself talking. A quick attempt to find the right place didn\'t turn it up. I don
if you are familiar with java you could try to feed the audio files throu minim and calculate some FFT-spectrums. Silence could be detected by defining a minimum level for the amplitude of the samples (to rule out noise). To seperate speech from music the FFT spectrum of a time-window can be used. Speech uses some very distinct frequencybands called formants - especially for vovels - music is more evenly distributed among the frequency spectrum.
You propably won't get a 100% separation of the speech/music blocks but it should be good enought to tag the files and only listen to the interesting parts.
http://code.compartmental.net/tools/minim/
http://en.wikipedia.org/wiki/Formant