Segmentation for connected characters

北城以北 submitted on 2019-11-29 01:14:00

Question


How can I segment characters that are connected to each other? I tried watershed with a distance transform (http://opencv-code.com/tutorials/count-and-segment-overlapping-objects-with-watershed-and-distance-transform/) to find the number of components, but it does not perform well.

  1. It requires the objects to be separated after thresholding in order to perform well.

That said, how can I segment the characters effectively? Any help or ideas would be appreciated.

Attached is an example of the binary image.

An example of heavily connected characters.

Ans:

@mmgp this is my output


Answer 1:


I believe there are two approaches here: 1) redo the binarization step that led to these images you have right now; 2) consider different possibilities based on image size. Let us focus on the second approach given the question.

In your smallest image, only two digits are connected, and that happens only when considering 8-connectivity. If you handle your image as 4-connected, then there is nothing to do, because no two digits that should be separated are connected. This is shown below. The right image can be obtained simply by finding the points that are connected to another one only when considering 8-connectivity. In this case, there are only two such points, and by removing them we disconnect the two digits '1'.
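
A minimal sketch of that idea, assuming the binary image is a NumPy array with nonzero foreground. The function name `diagonal_only_points` and the local test it uses (a diagonal neighbor whose two shared 4-neighbors are both empty) are assumptions made for illustration, not necessarily the exact procedure behind the figures:

```python
import numpy as np

def diagonal_only_points(img):
    """Mask of foreground pixels that touch another foreground pixel
    diagonally while both pixels shared by the pair are background."""
    fg = np.asarray(img, dtype=bool)
    h, w = fg.shape
    p = np.pad(fg, 1, constant_values=False)  # outside counts as background
    mask = np.zeros_like(fg)
    for dy, dx in ((-1, -1), (-1, 1), (1, -1), (1, 1)):
        diag = p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]  # neighbour at (dy, dx)
        vert = p[1 + dy:1 + dy + h, 1:1 + w]            # neighbour at (dy, 0)
        horz = p[1:1 + h, 1 + dx:1 + dx + w]            # neighbour at (0, dx)
        mask |= fg & diag & ~vert & ~horz
    return mask

# Toy image: two 2x2 blocks that touch only at one corner.
toy = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=bool)
mask = diagonal_only_points(toy)
cleaned = toy & ~mask  # drop the two corner pixels that form the diagonal link
```

The test is purely local, so it can also flag pixels whose components are joined somewhere else by a 4-connected path. If only the component count matters, labeling the image with 4-connectivity would keep every pixel and still separate the two digits.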

   

In your other image this is no longer the case, and I don't have a simple method for it that can also be applied to the smaller image without making it worse. But we could consider upscaling both images to some common size, using nearest-neighbor interpolation so we don't move away from the binary representation. By resizing both of your images so their width equals 200, keeping the aspect ratio, we can apply the following morphological method to both of them. The first step is a thinning.
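
The resizing step described above (not the thinning) could be sketched as follows, assuming a 2-D NumPy array. The helper name `resize_nearest` is hypothetical; a library routine such as OpenCV's `cv2.resize` with `cv2.INTER_NEAREST` should serve the same purpose:

```python
import numpy as np

def resize_nearest(img, new_width=200):
    """Resize a 2-D array to new_width, keeping the aspect ratio and
    copying source pixels so that no new gray levels appear."""
    img = np.asarray(img)
    h, w = img.shape
    new_height = max(1, round(h * new_width / w))
    # For each output row/column, pick the source index it falls into.
    rows = np.arange(new_height) * h // new_height
    cols = np.arange(new_width) * w // new_width
    return img[rows[:, None], cols]

src = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1]], dtype=np.uint8)
out = resize_nearest(src, new_width=8)  # shape (4, 8), still only 0s and 1s
```

Because every output pixel is a copy of one input pixel, the result stays strictly binary, which is the reason for preferring nearest neighbor over a smoothing interpolation here.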

Now, as can be seen, the morphological branch points are the ones connecting your digits (there is another one in the left-most digit, the six, which will be handled too). We can extract these branch points and apply a morphological closing with a vertical line of length 2*height+1 (where height is the height of your image), so that no matter where a point is, its closing produces a full vertical line. Since your image is not so small anymore, this line doesn't need to be 1 point wide; in fact, I used a line that is 6 points wide. Since some of the branch points are horizontally close, this closing operation joins them into the same vertical line. If a branch point is not close to another one, then performing an erosion removes its vertical line, and this eliminates the branch point related to the digit six at the left. After applying these steps, we obtain the following image on the left. Subtracting the original image from it gives the image on the right.
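
As a rough illustration of the branch-point step only, the sketch below assumes a one-pixel-wide skeleton is already available (for example from a thinning routine such as scikit-image's `skeletonize`). It swaps in a crossing-number test, counting background-to-foreground transitions around each pixel, in place of a true morphological hit-or-miss detection, and the name `branch_points` is hypothetical. The closing and erosion with the vertical line are not covered:

```python
import numpy as np

# 8-neighbour offsets in clockwise ring order, starting at north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def branch_points(skel):
    """Skeleton pixels whose neighbour ring has 3 or more separate arms."""
    sk = np.asarray(skel, dtype=bool)
    h, w = sk.shape
    p = np.pad(sk, 1, constant_values=False)
    ring = [p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] for dy, dx in RING]
    crossings = np.zeros(sk.shape, dtype=int)
    # Count 0 -> 1 transitions while walking once around the ring.
    for a, b in zip(ring, ring[1:] + ring[:1]):
        crossings += (~a & b)
    return sk & (crossings >= 3)

# Toy T-shaped skeleton: a horizontal stroke with a stem hanging from it.
t_shape = np.array([[0, 0, 0, 0, 0],
                    [1, 1, 1, 1, 1],
                    [0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 0]], dtype=bool)
bp = branch_points(t_shape)
```

Counting transitions rather than raw neighbors avoids flagging the pixels next to a junction, which a plain "three or more neighbors" rule tends to do.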

   

If we apply these same steps to the '8011' image, we end up with exactly the same image we started with. But this is still fine, because applying the simple method that removes points connected only under 8-connectivity gives the separated components as before.




Answer 2:


It is common to use "smearing algorithms" for this, also known as the Run Length Smoothing Algorithm (RLSA). It is a method that segments black-and-white images into blocks. You can find some information here or look around on the internet for an implementation of the algorithm.
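
For reference, a horizontal RLSA pass might look like the sketch below. It assumes nonzero pixels are foreground and fills only background runs that are bounded by foreground on both sides; the name `rlsa_horizontal` and the parameter `max_gap` are illustrative choices, and implementations differ on how runs touching the image border are treated:

```python
import numpy as np

def rlsa_horizontal(img, max_gap):
    """Fill horizontal background runs of length <= max_gap that lie
    between two foreground pixels of the same row."""
    out = np.asarray(img, dtype=bool).copy()
    for row in out:  # each row is a view, so edits land in `out`
        cols = np.flatnonzero(row)
        for left, right in zip(cols[:-1], cols[1:]):
            # right - left - 1 is the number of background pixels between them.
            if 1 < right - left <= max_gap + 1:
                row[left + 1:right] = True
    return out

line = np.array([[1, 0, 0, 1, 0, 0, 0, 0, 1]], dtype=bool)
smeared = rlsa_horizontal(line, max_gap=2)
# The 2-pixel gap is filled; the 4-pixel gap is left open.
```

A vertical pass is the same operation applied to the transposed image, and the two results are commonly combined with a logical AND to obtain blocks.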




Answer 3:


Not sure if I want to help you solve captchas, but one idea would be to use erosion. Depending on how many pixels you have to work with, it might be able to separate the characters sufficiently without destroying them. This would likely be best used as a pre-processing step for some other segmentation algorithm.
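
A small sketch of the erosion idea, assuming a boolean NumPy image and a 3x3 cross-shaped structuring element. The function name `erode_cross` is made up for this example; a library routine such as OpenCV's `cv2.erode` would normally be used instead:

```python
import numpy as np

def erode_cross(img, iterations=1):
    """Binary erosion with a 3x3 cross: a pixel survives only if it and
    its four edge neighbours are all foreground."""
    out = np.asarray(img, dtype=bool)
    for _ in range(iterations):
        p = np.pad(out, 1, constant_values=False)  # outside counts as background
        out = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
               & p[1:-1, :-2] & p[1:-1, 2:])
    return out

# Two 3x3 blobs joined by a one-pixel bridge in the middle row.
blobs = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
                  [0, 1, 1, 1, 0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 1, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=bool)
separated = erode_cross(blobs)
```

One pass removes the bridge but also shrinks each blob to two pixels, which shows the trade-off mentioned above: thin strokes can vanish before thick joins are cut.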



Source: https://stackoverflow.com/questions/14211413/segmentation-for-connected-characters
