Understanding Freeman chain codes for OCR

醉酒成梦 2021-01-30 17:34

Note that I'm really looking for an answer to my question. I am not looking for a link to some source code or to some academic paper: I've already used the s

4 answers
  •  南方客 (original poster)
    2021-01-30 18:02

    What you need is a function d that measures the distance between chain codes. With that, finding the letter for a given chain code is straightforward:

    Input:

    • normalized chain codes S for the set of possible letters (generally the chain codes for A-Z, a-z, 0-9, ...)
    • chain code x of the letter to be identified, which may be slightly deformed (so x need not match any chain code in the set S exactly)

    The algorithm would iterate through the set of possible chain codes and calculate the distance d(x,si) for each element. The letter with the smallest distance would be the output of the algorithm (the identified letter).

    I would suggest the following distance function: for two chain codes, add up the absolute length differences for each direction, i.e. the difference in how many times that direction occurs: d(x,si) = |x0-si0| + |x1-si1| + ... + |x7-si7|. Here x0 is the number of 0s in the chain code x, si0 is the number of 0s in the chain code si, and so on.
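The distance function described above can be sketched in a few lines of Python. This is a minimal sketch, assuming chain codes are stored as strings of the digits 0-7; the function names `direction_counts` and `chain_code_distance` are chosen here for illustration and are not from the original answer.

```python
def direction_counts(chain_code: str) -> list:
    """Count how often each of the 8 Freeman directions (0-7) occurs."""
    counts = [0] * 8
    for symbol in chain_code:
        counts[int(symbol)] += 1
    return counts


def chain_code_distance(x: str, s: str) -> int:
    """Sum of absolute per-direction count differences: |x0-s0| + ... + |x7-s7|."""
    return sum(abs(a - b) for a, b in zip(direction_counts(x), direction_counts(s)))


# Toy input: "0012" has two 0s and one 1, "0112" has one 0 and two 1s.
print(chain_code_distance("0012", "0112"))  # 2
```

Because only the counts per direction are compared, two chain codes that contain the same directions in a different order get distance 0. That keeps the measure tolerant of small deformations, at the cost of ignoring where along the contour each direction occurs.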

    An example will explain better what I have in mind. The following image shows the characters 8, B and D; the fourth character is a slightly deformed 8, which needs to be identified. The characters are set in Arial at font size 8. The second row of the image is enlarged 10 times so that the pixels are easier to see.

    [Image: the characters 8, B, D and a deformed 8, at original size and enlarged 10 times]

    I calculated the normalized chain codes by hand (hopefully correctly):

    8:  0011223123344556756677
    B:  0000011222223344444666666666
    D:  00001112223334444666666666
    8': 000011222223344556756666 (deformed 8)
    

    The absolute length differences are:

    
    direction | length         | difference to 8'
              | 8 | B | D |  8'|   8 |  B |  D |
    ----------+---+---+---+----+-----+----+-----
            0 | 2 | 5 | 4 |  4 |   2 |  1 |  0 |
            1 | 3 | 2 | 3 |  2 |   1 |  0 |  1 |
            2 | 3 | 5 | 3 |  5 |   2 |  0 |  2 |
            3 | 3 | 2 | 3 |  2 |   1 |  0 |  1 |
            4 | 2 | 5 | 4 |  2 |   0 |  3 |  2 |
            5 | 3 | 0 | 0 |  3 |   0 |  3 |  3 |
            6 | 3 | 9 | 9 |  5 |   2 |  4 |  4 |
            7 | 3 | 0 | 0 |  1 |   2 |  1 |  1 |
    ----------+---+---+---+----+-----+----+-----
                            sum   10 | 12 | 14 |
    

    8' has the smallest distance to the chain code of 8, so the algorithm would identify the character as 8. The distance to B is not much bigger, but that is because the deformed 8 looks almost like a B.
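The whole lookup for this example can be reproduced with a short script. This is only a sketch: the names `TEMPLATES`, `histogram_distance` and `classify` are illustrative, and the template set holds just the three chain codes listed above rather than a full alphabet.

```python
from collections import Counter

# Normalized chain codes copied from the example above.
TEMPLATES = {
    "8": "0011223123344556756677",
    "B": "0000011222223344444666666666",
    "D": "00001112223334444666666666",
}


def histogram_distance(x, s):
    """Sum over directions 0-7 of the absolute difference in occurrence counts."""
    cx, cs = Counter(x), Counter(s)
    return sum(abs(cx[d] - cs[d]) for d in "01234567")


def classify(x, templates):
    """Return the label with the smallest distance, plus all distances."""
    distances = {label: histogram_distance(x, code) for label, code in templates.items()}
    best = min(distances, key=distances.get)
    return best, distances


label, distances = classify("000011222223344556756666", TEMPLATES)  # deformed 8
print(label, distances)  # 8 {'8': 10, 'B': 12, 'D': 14}
```

The printed distances match the sums in the table. Returning all distances rather than only the winner makes it easy to see how narrow the margin between 8 and B is.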

    This method is not scale-invariant. I think there are two options to overcome this:

    • For different font sizes, having different sets of normalized chain codes
    • One set of normalized chain codes at a large size (e.g. 35x46 pixels), with the input letter (the one to be identified) scaled up to this size before its chain code is computed.
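For the second option, the input glyph has to be resized before its chain code is extracted. The following is a rough sketch using nearest-neighbour sampling on a binary bitmap; the function name `scale_bitmap`, the list-of-rows representation and the reading of 35x46 as width x height are assumptions made for illustration. Extracting the chain code from the scaled bitmap is not shown.

```python
def scale_bitmap(bitmap, new_width, new_height):
    """Resize a binary glyph (list of equal-length rows of 0/1) by nearest-neighbour sampling."""
    old_height = len(bitmap)
    old_width = len(bitmap[0])
    return [
        [
            # Map each target pixel back to the source pixel it falls into.
            bitmap[row * old_height // new_height][col * old_width // new_width]
            for col in range(new_width)
        ]
        for row in range(new_height)
    ]


small = [
    [1, 0],
    [0, 1],
]
for row in scale_bitmap(small, 4, 4):
    print(row)
# [1, 1, 0, 0]
# [1, 1, 0, 0]
# [0, 0, 1, 1]
# [0, 0, 1, 1]

big = scale_bitmap(small, 35, 46)  # assumed target size: 35 wide, 46 high
```

Nearest-neighbour scaling turns diagonal edges into staircases, which may shift the direction counts, so a smoother interpolation followed by thresholding might work better in practice.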

    I'm not quite sure whether this distance function is good enough for the full set of alphanumeric characters, but I hope so. To reduce identification errors you could include other features (not only chain codes) in the classification step. And again, you would need a distance measure -- this time for feature vectors.
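One possible distance for such feature vectors is a weighted sum of absolute differences, which reduces to the chain code distance above when all weights are 1 and the vector holds only the eight direction counts. This is a sketch under assumptions: the function name `feature_distance`, the extra "number of holes" feature and its values are hypothetical and only serve to show the idea.

```python
def feature_distance(u, v, weights=None):
    """Weighted sum of absolute differences between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(u)
    return sum(w * abs(a - b) for w, a, b in zip(weights, u, v))


# Direction counts from the table, plus a hypothetical ninth feature
# (number of enclosed holes; the values 2 and 1 are assumed, not measured).
deformed_8 = [4, 2, 5, 2, 2, 3, 5, 1, 2]
letter_d = [4, 3, 3, 3, 4, 0, 9, 0, 1]

print(feature_distance(deformed_8, letter_d))                         # 15.0
print(feature_distance(deformed_8, letter_d, [1] * 8 + [5]))          # 19
```

The weights let a feature that separates similar-looking characters count for more than a single direction count; suitable values would have to be tuned on real data.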
