There's a pretty cool algorithm called Bitstream Autocorrelation. It doesn't take too many CPU cycles, and it's very accurate. You basically find all the zero cross points, and then save it as a binary string. Then you use Auto-correlation on the string. It's fast because you can use XOR instead of floating point multiplication.