I\'m looking to create a base table of images and then compare any new images against that to determine if the new image is an exact (or close) duplicate of the base.
<Picking 100 random points could mean that similar (or occasionally even dissimilar) images would be marked as the same, which I assume is not what you want. MD5 hashes wouldn't work if the images were different formats (png, jpeg, etc), had different sizes, or had different metadata. Reducing all images to a smaller size is a good bet, doing a pixel-for- pixel comparison shouldn't take too long as long as you're using a good image library / fast language, and the size is small enough.
You could try making them tiny, then if they are the same perform another comparison on a larger size - could be a good combination of speed and accuracy...
I have an idea, which can work and it most likely to be very fast. You can sub-sample an image to say 80x60 resolution or comparable, and convert it to grey scale (after subsampling it will be faster). Process both images you want to compare. Then run normalised sum of squared differences between two images (the query image and each from the db), or even better Normalised Cross Correlation, which gives response closer to 1, if both images are similar. Then if images are similar you can proceed to more sophisticated techniques to verify that it is the same images. Obviously this algorithm is linear in terms of number of images in your database so even though it is going to be very fast up to 10000 images per second on the modern hardware. If you need invariance to rotation, then a dominant gradient can be computed for this small image, and then the whole coordinate system can be rotated to canonical orientation, this though, will be slower. And no, there is no invariance to scale here.
If you want something more general or using big databases (million of images), then you need to look into image retrieval theory (loads of papers appeared in the last 5 years). There are some pointers in other answers. But It might be overkill, and the suggest histogram approach will do the job. Though I would think combination of many different fast approaches will be even better.
As cartman pointed out, you can use any kind of hash value for finding exact duplicates.
One starting point for finding close images could be here. This is a tool used by CG companies to check if revamped images are still showing essentially the same scene.