The problem
I have a collection of digital photos of a mountain in Japan. However, the mountain is often obscured by clouds or fog.
What techniqu
I think you are working on too low a level. A quick pass through an edge detection filter partitioned the image set very distinctly into (1, 3) and (2, 4). Especially if these images come from a fixed camera viewpoint, finding a match against the prototypical shape in (1) would be relatively easy algorithmically. Even your case (4) could give you a partial match, and you could then decide heuristically whether there is enough mountain there to consider it a match.
The answer depends on how specific the problem is. If it's the same mountain from the same POV, run an edge detection pass on a known good image, and use the result as a baseline for convolving against edge-detected images from the corpus. If it's only the edge of the mountain that you're interested in, manually remove other features from the baseline.
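To make the baseline idea concrete, here is a rough sketch in plain Python, not a tested pipeline. Everything in it is my own assumption for illustration: the images are small synthetic grids built by a hypothetical `make_scene` helper, a central-difference gradient stands in for whatever edge filter you actually use, and because the POV is assumed fixed, the edge maps are compared in place with a cosine similarity instead of being slid around.

```python
import math

def edge_map(img):
    """Gradient-magnitude edge map via central differences (border left at 0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            out[y][x] = math.hypot(gx, gy)
    return out

def edge_similarity(a, b):
    """Cosine similarity between two same-sized edge maps (0.0 if either is blank)."""
    dot = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    na = math.sqrt(sum(p * p for row in a for p in row))
    nb = math.sqrt(sum(q * q for row in b for q in row))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def make_scene(h, w, peak_x, visible=True):
    """Hypothetical test image: bright sky (200) with a dark triangular mountain (50)."""
    img = [[200.0] * w for _ in range(h)]
    if visible:
        for y in range(h):
            # The triangle widens by one pixel per row on each side.
            for x in range(max(0, peak_x - y), min(w, peak_x + y + 1)):
                img[y][x] = 50.0
    return img

baseline = edge_map(make_scene(12, 24, 12))

clear = make_scene(12, 24, 12)
foggy = make_scene(12, 24, 12, visible=False)
partial = make_scene(12, 24, 12)
for y in range(6):                 # fog hides the upper half of the mountain
    partial[y] = [200.0] * 24

score_clear = edge_similarity(baseline, edge_map(clear))
score_foggy = edge_similarity(baseline, edge_map(foggy))
score_partial = edge_similarity(baseline, edge_map(partial))
```

The clear shot scores essentially 1, the fully fogged one scores 0, and the half-hidden one lands in between. Where you put the cut-off for "enough mountain" is something you would have to tune on your real photos.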
A few specific recommendations, building upon what you've got already:
Use the Convolve function from PerlMagick (you seem already comfortable with Perl and ImageMagick) to convolve the kernel with a few images. On the resulting image you should see a sharp spike corresponding to the "correct" position of the kernel (coinciding with the mountain in the image).
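For what the "sharp spike" looks like in practice, here is a small plain-Python stand-in. It is not the PerlMagick call, and the function names `correlate` and `peak`, the inverted-V kernel, and the toy image are all made up for illustration. Strictly speaking it computes a cross-correlation (the kernel is not flipped), which is the operation you want for template matching.

```python
def correlate(image, kernel):
    """Slide the kernel over the image; sum of products at each valid offset."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    result = []
    for oy in range(ih - kh + 1):
        row = []
        for ox in range(iw - kw + 1):
            s = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    s += image[oy + ky][ox + kx] * kernel[ky][kx]
            row.append(s)
        result.append(row)
    return result

def peak(response):
    """Return (value, row, col) of the strongest response."""
    return max((v, y, x) for y, row in enumerate(response) for x, v in enumerate(row))

# Hypothetical kernel: a tiny inverted-V "ridge line" as it might look after edge detection.
kernel = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]

# Toy edge image: all zeros except one copy of the ridge pasted at row 2, column 5.
image = [[0] * 12 for _ in range(8)]
for ky, krow in enumerate(kernel):
    for kx, v in enumerate(krow):
        image[2 + ky][5 + kx] = v

response = correlate(image, kernel)
best_value, best_row, best_col = peak(response)
```

Here the response reaches its maximum only at the offset where the ridge was pasted. On real photos the spike will be noisier, and a foggy image should produce no clear peak at all, so the height of the peak relative to the rest of the response is a candidate signal for "mountain visible".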