Update this question was previously titled as \"Give me the name of a simple algorithm for signal(sound) pattern detection\"
Admittedly this is not my area of expertise but my first thought is a recursive least squares filter - it performs autocorrelation. It's similar to the convolution filter you're using now but a bit more advanced. Kalman filtering is an extension of this - it's used to regenerate a signal from multiple noisy measurements so it's probably not useful in this case. I would not reject offhand neural networks - they're very useful at this sort of thing (provided you train them properly).
Thinking about this more in depth I would probably recommend using an FFT. Chances are the signal you're looking for is very band-limited, and you'd probably have more luck using a bandpass filter on the data then an FFT and finally using your simple convolution filter on that data instead of the time-domain data points. Or do both and have twice the data. I'm not heavy into math so I cant' tell you if you'll get significant (not linearly-dependent) results using this method but the only thing you're losing is time.
You can try a Matched Filter. Although I've never actually used one, I've heard good things.
Also, although not simple, I think a Hidden Markov Model (HMM, I know you said no speech recognition, but hear me out!) would provide the best results for you. Again, I've never actually used one but there are open source implementations available all over the place. You would just need to train it using your exisiting "clean" insect recording. Here is one open source implementation: General Hidden Markov Model Library.
Google: FastICA algorithm. Some use ICA and Blind-Source Signal Separation Interchangeably. The author of the algorithm wrote a fantastic book on ICA that is around $40-$60 used on amazon.
Sound like a typical one class classification problem i.e. you want to search one thing in a large pool of other things you don't care about.
What you want to do is find a set of features or descriptors that you can calculate for every short piece of your raw recording that you can then match against the features your clean recording produces. I don't think convolution is neccessarily bad, though it is rather sensitive to noise so it might not be optimal for your case. What might actually work in your case is pattern matching on a binned fourier transform. You take the fourier transform of your signal, giving you a power vs frequency graph (rather than a power vs time graph) then you divide the frequency in bands and you take the average power for each band as a feature. If your data contains mostly white noise the patern you get from a raw insect sound of similar length will very closely match the pattern of your reference sound. This last trick has been used succesfully (with some windowing) to crack audio captcha's as used by google et al to make their sites accessible to the blind.
By the way, because your raw audio signal is digital (otherwise processing with a computer will not work ;-)) convolution is appropriate. You should perform the convolution between your reference signal and a sample of equal length from the raw input starting from each sample. So, if your reference signal has length N, and your raw sample has length M where M>=N then you should perform M-N+1=P convolutions between your reference signal and P samples from your raw input starting at 1..P. The best possibility for the location of the reference sound in the raw sample is the sample with the highest convolution score. Note that this becomes insanely time consuming very quickly.
Fourier transform based matching as I explained above using 50% overlapping samples from your raw data of twice the length of your reference sample would at least be faster (though not neccessarily better)
You may want a Wiener filter approach.