Accurately detect color regions in an image using K-means clustering

I'm using K-means clustering for color-based image segmentation. I have a 2D image with three colors: black, white, and green. Here is the image:

I want K-means to produce three clusters: one for the green region, one for the white region, and one for the black region.

Here is the code I used:

%Clustering color regions in an image. 

%Step 1: read the image using imread, and show it using imshow. 

img =  (imread('img.jpg'));

figure, imshow(img), title('X axis rock cut'); %figure is for creating a figure window.
text(size(img,2),size(img,1)+15,...
     'Unconventional shale x axis cut', ...
     'FontSize',7,'HorizontalAlignment','right');

%Step 2: Convert Image from RGB Color Space to L*a*b* Color Space
conversionform = makecform('srgb2lab'); %the form of the conversion is defined as from rgb to l a b
lab_img = applycform(img,conversionform); %converting the rgb image to l a b image using the conversion form defined above.

%Step 3: Classify the Colors in 'a*b*' Space Using K-Means Clustering
ab = double(lab_img(:,:,2:3));
nrows = size(ab,1);
ncols = size(ab,2);
ab = reshape(ab,nrows*ncols,2);

nColors = 3;
% repeat the clustering 3 times to avoid local minima
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean', ...
                                      'Replicates',3);
%Step 4: Label Every Pixel in the Image Using the Results from KMEANS

%For every object in your input, kmeans returns an index corresponding to a
%cluster. The cluster_center output from kmeans will be used later in the
%example. Label every pixel in the image with its cluster_index.

pixel_labels = reshape(cluster_idx,nrows,ncols);
figure, imshow(pixel_labels,[]), title('image labeled by cluster index');

segmented_images = cell(1,3);
rgb_label = repmat(pixel_labels,[1 1 3]);

for k = 1:nColors
    color = img;
    color(rgb_label ~= k) = 0;
    segmented_images{k} = color;
end

figure, imshow(segmented_images{1}), title('objects in cluster 1');
figure, imshow(segmented_images{2}), title('objects in cluster 2');
figure, imshow(segmented_images{3}), title('objects in cluster 3');

But I'm not getting the results as required. I get one cluster with green regions, one cluster with green region boundaries, and one with gray, black, and white colors. Here are the resulting clusters.

The aim of doing this is that after getting the correct clustering results, I want to count the number of pixels in every region using the concept of connected components.

So, my aim is to know how many pixels there are in every color region. I first tried a simpler way: reading the matrix of the 2D image and counting the pixels of each color. However, I found more than 3 RGB colors in the matrix, probably because pixels of the same nominal color have slightly different color levels. That's why I turned to image segmentation.

Can anyone please tell me how to fix the code above in order to get the required results?

I would also appreciate any hints on how to do this in an easier way, if there is one.

EDIT: Here is the code I wrote to iterate over every pixel in the image. Please note that I use 4 colors (red, yellow, blue, and white) instead of green, white, and black, but the idea is the same. rgb2name is a function that returns the color name for a given RGB color.

im= imread ('img.jpg'); 

[a b c] = size (im); 
%disp ([a b]);
yellow=0; 
blue=0; 
white=0; 
red=0; 


for i=1:a
    for j=1:b
        x= impixel(im, j, i)/255 ; % impixel expects (column, row), so j comes first
        color= rgb2name (x);
        if (~isempty (strfind (color, 'yellow')))
            yellow= yellow+1; 
        elseif (~isempty (strfind(color, 'red')))
            red= red+1; 
        elseif (~isempty (strfind (color, 'blue')))
            blue= blue+1; 
        elseif (~isempty (strfind (color, 'white')))
           white= white+1; 
        else 
            %disp ('warning'); break; 
        end            
        disp (color);
        disp (i);
    end
end
disp (yellow)
disp (red)
disp (blue)
disp (white)

Thank You.

This is my approach to counting the number of pixels in every region, given that (as discussed in the comments):

- the values (RGB) and the number (K) of colors are known a priori;
- compression artifacts and anti-aliasing generate additional colors, each of which must be assigned to its nearest neighbor among the K known colors.

Since you know the colors a priori, you don't need k-means. It could actually lead to bad results, as in your question. The approach of @crowdedComputeeer takes care of this aspect.

You can compute the nearest neighbor with pdist2 directly on the pixel values. There's no need to use the really slow function that looks up the color name.

Here is the code. You can change the number and values of the colors simply by modifying the variable colors. It computes the number of pixels of each color and outputs the masks.

img =  (imread('path_to_image'));

colors = [  0 0 0;    % black
            0 1 0;    % green
            1 1 1];   % white


% % You can change the colors        
% colors = [  1 0 0;    % red
%             1 1 0;    % yellow
%             0 0 1;    % blue
%             1 1 1];   % white


% Find nearest neighbour color
list = double(reshape(img, [], 3)) / 255;
[~, IDX] = pdist2(colors, list, 'euclidean', 'Smallest', 1);
% IDX contains the indices to the nearest element


N = zeros(size(colors, 1), 1);
for i = 1 : size(colors, 1)
    % Count the number of pixels for each color
    N(i) = sum( IDX == i );
end

% This will display the number of pixels for each color
disp(N);



% Eventually build the masks
indices = reshape(IDX, [size(img,1), size(img,2)]);

figure();
szc = size(colors,1);
for i = 1 : szc
    subplot(1,szc,i);
    imagesc(indices == i);
end

Resulting counts:

97554     % black
16894     % green
31852     % white

Resulting masks:
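
Since the question mentions connected components, one possible follow-up is to split each color mask into connected regions and count the pixels per region. The sketch below assumes the Image Processing Toolbox (bwconncomp) is available. It uses a tiny hand-made label matrix in place of indices from the code above, and the names labels, nColors, and regionSizes are illustrative only. The 8-connectivity is also an assumption; use 4 if diagonal neighbors should not be merged.

```matlab
% Toy label matrix standing in for `indices` (1 = black, 2 = green, 3 = white).
labels = [1 1 2 2;
          1 3 2 2;
          3 3 1 1];
nColors = 3;

regionSizes = cell(nColors, 1);
for k = 1:nColors
    mask = (labels == k);            % binary mask for color k
    cc = bwconncomp(mask, 8);        % 8-connected components (assumed connectivity)
    % Number of pixels in each connected region of this color
    regionSizes{k} = cellfun(@numel, cc.PixelIdxList);
    fprintf('color %d: %d region(s), %d pixel(s) in total\n', ...
            k, cc.NumObjects, sum(regionSizes{k}));
end
```

With the real image, replace labels by indices and nColors by size(colors, 1); regionSizes{k} then lists the pixel count of every separate region of color k.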

I thought this problem was very interesting, so I apologize ahead of time if the answer goes a little overboard. In short, k-means is generally the right strategy for problems where you want to segment an image into a discrete color space. But your example image, which contains essentially only three colors that are well separated in color space, is easily segmented using only a histogram. See below for segmentation using thresholds.

You can easily get the pixel counts by summing each mask, e.g., bCount = sum(blackPixels(:))

filename = '379NJ.png';
x = imread(filename); 
x = double(x); % cast to floating point
x = x/max(max(max(x))); % normalize

% take histogram of green dimension
g = x(:, :, 2);
c = hist(g(:), 2^8);

% smooth the hist count with a centered moving average over N+1 bins;
% the zero padding lets peaks at either end of the histogram be detected
c = [zeros(1, 10), c, zeros(1, 10)];
N = 4;
d = conv(c, ones(1, N+1)/(N+1), 'same');

% as seen in histogram, the three colors fall nicely into 3 peaks
figure, plot(c, '.-');
[~, clusterCenters] = findpeaks(d, 'MinPeakHeight', 1e3);
clusterCenters = clusterCenters - 10; % undo the 10-bin zero padding added before smoothing

% set the threshold halfway between peaks 
boundaries = [clusterCenters(1) + floor((clusterCenters(2) - clusterCenters(1))/2), ...
              clusterCenters(2) + floor((clusterCenters(3) - clusterCenters(2))/2)];
thresh1 = boundaries(1)/255; % bin index -> approximate intensity in [0, 1]
thresh2 = boundaries(2)/255;

% categorize based on threshold
blackPixels = g < thresh1;
greenPixels = g >= thresh1 & g < thresh2;
whitePixels = g >= thresh2;
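
The pixel counts then follow directly from the logical masks. Here is a minimal sketch, using a small made-up green channel and hand-picked thresholds; both are assumptions for illustration, not values taken from the real image.

```matlab
% Made-up green channel in [0, 1] and hand-picked thresholds (illustrative only).
g = [0.02 0.05 0.55;
     0.60 0.98 1.00];
thresh1 = 0.3;
thresh2 = 0.8;

blackPixels = g < thresh1;
greenPixels = g >= thresh1 & g < thresh2;
whitePixels = g >= thresh2;

% Summing a logical mask counts its true entries.
bCount = sum(blackPixels(:));
gCount = sum(greenPixels(:));
wCount = sum(whitePixels(:));

% Every pixel falls into exactly one class, so the counts add up to numel(g).
assert(bCount + gCount + wCount == numel(g));
fprintf('black: %d, green: %d, white: %d\n', bCount, gCount, wCount);
```

The final assert is a cheap sanity check that the two thresholds partition the image without gaps or overlaps.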

Maybe this project could help; please give it a try.

Source: https://stackoverflow.com/questions/32034344/accurately-detect-color-regions-in-an-image-using-k-means-clustering

Tags: matlab, image-processing, k-means, image-segmentation