Image segmentation with maxflow

前端 未结 1 1240
伪装坚强ぢ
伪装坚强ぢ 2021-02-02 04:50

I have to do a foreground/background segmentation using maxflow algorithm in C++. (http://wiki.icub.org/iCub/contrib/dox/html/poeticon_2src_2objSeg_2src_2maxflow-v3_802_2maxflow

相关标签:
1条回答
  • 2021-02-02 05:29

    I recognize that source very well. That's the Boykov-Kolmogorov Graph Cuts library. What I would recommend you do first is read their paper.

    Graph Cuts is an interactive image segmentation algorithm. You mark pixels in your image on what you believe belong to the object (a.k.a. foreground) and what don't belong to the object (a.k.a the background). That's what you need first. Once you do this, the Graph Cuts algorithm best guesses what the labels of the other pixels are in the image. It basically goes through each of the other pixels that are not labeled and figures out whether or not they belong to foreground and background.

    The whole premise behind Graph Cuts is that image segmentation is akin to energy minimization. Image segmentation can be formulated as a cost function with a summation of two terms:

    1. Self-Penalty: This is the cost of assigning each pixel as either foreground or background. This is also known as a data cost.
    2. Neighbouring Penalties: This enforces that neighbouring pixels more or less should share the same classification label. This is also known as a smoothness cost.

    This kind of formulation is well known as the Maximum A Posteriori Markov Random Field classification problem (MAP-MRF). The goal is to minimize that cost function so that you achieve the best image segmentation possible. This is actually an NP-Hard problem, and is actually one of the problems that is up for money from the Clay Math Institute.

    Boykov and Kolmogorov theoretically proved that the MAP-MRF problem can be translated into graph theory, and solving the MAP-MRF problem is akin to taking your image and forming it into a graph with source and sink links, as well as links that connect neighbouring pixels together. To solve the MAP-MRF, you perform the maximum-flow/minimum-cut algorithm. There are many ways to do this, but Boykov / Kolmogorov find a more efficient way that is much faster than more established algorithms, such as Push-Relabel, Ford-Fulkenson, etc.

    The self penalties are what are known as t links, while the neighbouring penalties are what are known as n links. You should read up the paper to figure out how these are computed, but the t links describe the classification penalty. Basically, how much it would cost to classify each pixel as belonging to the foreground or the background. These are usually based on the negative log probability distributions of the image. What you do is you create a histogram of the distribution of what was classified as foreground and a histogram of what was classified as background.

    Usually, a uniform quanitization of each colour channel for both foreground and background suffices. You then turn these into PDFs but dividing by the total number of elements in each histogram, then when you calculate the t-links for each pixel, you access the colour, then see where it lies in the histogram, then take the negative log. This will tell you how much it will cost to classify that pixel to be either foreground or background.

    The neighbouring pixel costs are more intuitive. People usually just take the Euclidean distance between one pixel and a neighbouring pixel and apply this distance to a Gaussian. To make things simple, a 4 pixel neighbourhood is what is usually used (North, South, East and West).

    Once you figure out how to compute the cost, you follow this procedure:

    1. Mark pixels as foreground or background.
    2. Create a graph structure using their library
    3. Compute the histograms of the foreground and background pixels
    4. Calculate t-links and add to the graph
    5. Calculate n-links and add to the graph
    6. Invoke the maxflow routine on the graph to segment the image
    7. Go through each pixel and figure out whether or not the pixel belongs to foreground or background.
    8. Create a binary map that reflects this, then copy over image pixels where the binary map is true, and don't do this when it's false.

    The original source of maxflow can be found here: http://pub.ist.ac.at/~vnk/software/maxflow-v3.03.src.zip

    It also has a README so you can see how the library is supposed to work given some example images.

    You have a lot to digest, but Graph Cuts is one of the most powerful interactive segmentation tools out there.

    Good luck!

    0 讨论(0)
提交回复
热议问题