I've created an iPhone app that can scan an image of a page of graph paper and can then tell me which squares have been blacked out and which squares are blank.
I do th
To start with, this problem reminded me a bit of these demos, which might be useful to learn from:
Personally, I think the simplest approach would be to detect the squares in your image.
1) Remove the background and small cruft
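% Threshold each 128x128 block at its own median/1.3 (assumes im is scaled to [0, 1])
f_makebw = @(I) im2bw(I.data, double(median(I.data(:)))/1.3);
% Invert so the dark marks become the foreground (true) pixels
bw = ~blockproc(im, [128 128], f_makebw);
% Drop connected components smaller than 30 pixels
bw = bwareaopen(bw, 30);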
2) Remove everything but the squares and circles.
se = strel('disk', 5);
bw = imerode(bw, se);
% Detect the squares and circles via morphology
[B, L] = bwboundaries(bw, 'noholes');
3) Detect the squares using the 'Extent' property from regionprops. The 'Extent' metric measures what proportion of the bounding box is filled, which makes it a nice measure to distinguish between circles and squares.
stats = regionprops(L, 'Extent');
extent = [stats.Extent];
idx1 = find(extent > 0.8);
bw = ismember(L, idx1);
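To see why a cutoff of 0.8 separates the two shapes: an axis-aligned filled square fills its bounding box completely, while an ideal circle fills only π/4 ≈ 0.785 of it. Below is a minimal sketch in plain Python (not MATLAB; `extent`, `disk` and `square` are hypothetical helper names) that computes the same ratio for one synthetic blob of each kind. Note that a rotated square would score lower, so this assumes the page is roughly upright.

```python
def extent(mask):
    """Filled pixels divided by bounding-box area, for one binary blob."""
    filled = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    rows = [r for r, _ in filled]
    cols = [c for _, c in filled]
    box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return len(filled) / box_area


def disk(radius):
    """Rasterized filled circle of the given radius."""
    size = 2 * radius + 1
    return [[(r - radius) ** 2 + (c - radius) ** 2 <= radius ** 2
             for c in range(size)] for r in range(size)]


def square(side):
    """Axis-aligned filled square."""
    return [[True] * side for _ in range(side)]


square_extent = extent(square(21))
disk_extent = extent(disk(10))

# The square passes the 0.8 cutoff used above, the disk does not.
print(square_extent > 0.8, disk_extent > 0.8)
```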
4) This leaves you with your features, which you can use to synchronize or rectify the image. An easy and robust way to do this is via the autocorrelation function (ACF).
This gives nice peaks, which are easily detected. These peaks can be matched against the ACF peaks from a template image via the Hungarian algorithm. Once matched, you can correct rotation and scaling as you now have a linear system which you can solve:
x = Ax'
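As a rough illustration of solving that system, here is a minimal least-squares sketch in plain Python. It assumes x holds the peak positions measured in your image and x' the matched template peaks (the equation above does not say which is which), and that the matching has already been done. The function names are hypothetical. Because the ACF is unaffected by translation, there is no offset term to fit.

```python
import math


def fit_linear_map(template_pts, image_pts):
    """Least-squares 2x2 matrix A such that image_pt ~= A * template_pt."""
    sxx = sxy = syy = 0.0          # entries of sum(p p^T)
    qxx = qxy = qyx = qyy = 0.0    # entries of sum(q p^T)
    for (px, py), (qx, qy) in zip(template_pts, image_pts):
        sxx += px * px
        sxy += px * py
        syy += py * py
        qxx += qx * px
        qxy += qx * py
        qyx += qy * px
        qyy += qy * py
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("template peaks are collinear; A is not determined")
    # A = sum(q p^T) * inverse(sum(p p^T))
    a = (qxx * syy - qxy * sxy) / det
    b = (qxy * sxx - qxx * sxy) / det
    c = (qyx * syy - qyy * sxy) / det
    d = (qyy * sxx - qyx * sxy) / det
    return (a, b), (c, d)


def rotation_and_scale(A):
    """Angle (radians) and scale, assuming A is close to scale * rotation."""
    (a, b), (c, d) = A
    angle = math.atan2(c - b, a + d)
    scale = math.sqrt(abs(a * d - b * c))
    return angle, scale
```

With at least two non-collinear matched peaks the fit is determined; extra peaks simply average out the noise.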
Translation can then be corrected using run-of-the-mill cross-correlation against the same predefined template.
If all goes well, you now have an aligned or synchronized image, which should help considerably in determining the position of the dots.
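A brute-force version of that translation search might look like the sketch below. It is plain Python on small binary images, `best_shift` is a hypothetical name, and a real implementation would more likely use an FFT-based correlation. It assumes rotation and scale have already been corrected.

```python
def best_shift(image, template, max_shift):
    """Return the (dy, dx) shift of template that best overlaps image."""
    rows, cols = len(template), len(template[0])
    height, width = len(image), len(image[0])
    best, best_score = (0, 0), -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Correlation score: template pixels landing on image pixels
            score = 0
            for r in range(rows):
                for c in range(cols):
                    rr, cc = r + dy, c + dx
                    if template[r][c] and 0 <= rr < height and 0 <= cc < width:
                        score += image[rr][cc]
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best


# Tiny example: three marks, moved 2 rows down and 1 column left
marks = [(2, 3), (4, 5), (5, 2)]
template = [[0] * 8 for _ in range(8)]
image = [[0] * 8 for _ in range(8)]
for r, c in marks:
    template[r][c] = 1
    image[r + 2][c - 1] = 1

shift = best_shift(image, template, 3)
```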
I've been starting to do something similar using my GPUImage iOS framework, so that might be an alternative to doing all of this in OpenCV or something else. As its name indicates, GPUImage is entirely GPU-based, so it can have some tremendous performance benefits over CPU-bound processing (up to 180X faster for doing things like processing live video).
As a first stage, I took your images and ran them through a simple luminance thresholding filter with a threshold of 0.5 and arrived at the following for your two images:
I just added an adaptive thresholding filter, which attempts to correct for local illumination variances, and works really well for picking out text. However, in your images it uses too small an averaging radius to handle your blobs well, and it seems to bring out your grid lines, which it sounds like you wish to ignore.
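The effect of the averaging radius can be reproduced on the CPU with a small sketch. This is plain Python, not the GPUImage shader. The `adaptive_threshold` function, the box-shaped window and the 0.05 offset are all assumptions made for illustration. A pixel is marked dark when it is noticeably below the mean of its neighbourhood, so inside a blob wider than the window the local mean is itself dark and the interior is missed.

```python
def adaptive_threshold(gray, radius, offset=0.05):
    """Mark a pixel dark (1) when it is below its local mean minus offset."""
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            count = 0
            # Box average over a (2*radius+1)^2 window, clipped at the edges
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += gray[rr][cc]
                    count += 1
            if gray[r][c] < total / count - offset:
                out[r][c] = 1
    return out


# Bright 21x21 page with a dark 9x9 blob in the middle
page = [[0.9] * 21 for _ in range(21)]
for r in range(6, 15):
    for c in range(6, 15):
        page[r][c] = 0.1

small = adaptive_threshold(page, radius=2)    # window smaller than the blob
large = adaptive_threshold(page, radius=10)   # window larger than the blob
```

With the small radius only the rim of the blob is marked and its centre is lost; with the large radius the whole blob is marked.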
Maurits provides a more comprehensive description of what you could do, but there might be a way to implement these processing operations as high-performance GPU-based filters instead of relying on slower OpenCV versions of the same calculations. If you could grab rotation and scaling information from this thresholded image, you could construct a transform that could also be applied as a filter to your thresholded image to produce your final aligned image, which could then be downsampled and read out by your application to determine which grid locations were filled in.
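The final read-out step could be as simple as the following sketch, again in plain Python with a hypothetical `read_grid` function. It assumes the image is already aligned so that the grid exactly spans it, that dark pixels are 1, and that a square counts as filled when more than half of its pixels are dark (the 0.5 cutoff is an assumption). Averaging over each cell is what lets thin grid lines be ignored.

```python
def read_grid(dark, n_rows, n_cols, fill_fraction=0.5):
    """Return an n_rows x n_cols table of booleans: True where a cell is filled."""
    height, width = len(dark), len(dark[0])
    grid = []
    for i in range(n_rows):
        r0, r1 = i * height // n_rows, (i + 1) * height // n_rows
        row = []
        for j in range(n_cols):
            c0, c1 = j * width // n_cols, (j + 1) * width // n_cols
            total = sum(dark[r][c]
                        for r in range(r0, r1) for c in range(c0, c1))
            row.append(total > fill_fraction * (r1 - r0) * (c1 - c0))
        grid.append(row)
    return grid


# 8x8 image holding a 2x2 grid: the top-right cell is blacked out,
# and the bottom-left cell contains only a one-pixel-wide line.
aligned = [[0] * 8 for _ in range(8)]
for r in range(0, 4):
    for c in range(4, 8):
        aligned[r][c] = 1
for r in range(4, 8):
    aligned[r][0] = 1

cells = read_grid(aligned, 2, 2)
```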
These GPU-based thresholding operations run in less than 2 ms for 640x480 frames on an iPhone 4, so it might be possible to chain filters together to analyze incoming video frames as fast as the device's video camera can provide them.