Understanding Freeman chain codes for OCR

醉酒成梦 2021-01-30 17:34

Note that I'm really looking for an answer to my question. I am not looking for a link to some source code or to some academic paper: I've already used the s

4 answers
  • 2021-01-30 18:02

    What you need is a function d that measures the distance between chain codes. Once you have it, finding the letter for a given chain code is straightforward:

    Input:

    • normalized chain codes S for the set of possible letters (generally the chain codes for A-Z, a-z, 0-9, ...)
    • chain code x of a letter which needs to be detected and which could be slightly deformed (the chain code wouldn't match any chain code in the set S)

    The algorithm would iterate through the set of possible chain codes and calculate the distance d(x,si) for each element. The letter with the smallest distance would be the output of the algorithm (the identified letter).

    I would suggest the following distance function: for two chain codes, add up the absolute differences of the per-direction counts: d(x,si) = |x0-si0| + |x1-si1| + ... + |x7-si7|. Here x0 is the number of 0s in the chain code x, si0 is the number of 0s in the chain code si, and so on.

    An example will better explain what I'm thinking about. The following image shows the characters 8, B and D; the fourth character is a slightly deformed 8 that needs to be identified. The characters are set in Arial at font size 8. The second row of the image is enlarged 10 times to show the pixels more clearly.

    [Image: the characters 8, B, D and a deformed 8 in Arial, font size 8; second row enlarged 10 times]

    I manually calculated (hopefully correct) the normalized chain codes which are:

    8:  0011223123344556756677
    B:  0000011222223344444666666666
    D:  00001112223334444666666666
    8': 000011222223344556756666 (deformed 8)
    

    The length differences (absolute values) are:

    
    direction | length         | difference to 8'
              | 8 | B | D |  8'|   8 |  B |  D |
    ----------+---+---+---+----+-----+----+-----
            0 | 2 | 5 | 4 |  4 |   2 |  1 |  0 |
            1 | 3 | 2 | 3 |  2 |   1 |  0 |  1 |
            2 | 3 | 5 | 3 |  5 |   2 |  0 |  2 |
            3 | 3 | 2 | 3 |  2 |   1 |  0 |  1 |
            4 | 2 | 5 | 4 |  2 |   0 |  3 |  2 |
            5 | 3 | 0 | 0 |  3 |   0 |  3 |  3 |
            6 | 3 | 9 | 9 |  5 |   2 |  4 |  4 |
            7 | 3 | 0 | 0 |  1 |   2 |  1 |  1 |
    ----------+---+---+---+----+-----+----+-----
                            sum   10 | 12 | 14 |
    

    8' has the smallest distance to the chain code of 8, thus the algorithm would identify the letter 8. The distance to the letter B is not much bigger, but this is because the deformed 8 looks almost like the B.
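    The worked example above can be sketched in a few lines of Python. This is only a minimal sketch of the histogram distance described here; the function names (`direction_histogram`, `distance`, `classify`) are made up for illustration, and the chain codes are the hand-calculated ones from the listing above.

```python
from collections import Counter
from typing import Dict, List

DIRECTIONS = "01234567"  # the eight Freeman directions


def direction_histogram(chain: str) -> List[int]:
    """Count how often each direction 0..7 occurs in a chain code."""
    counts = Counter(chain)
    return [counts[d] for d in DIRECTIONS]


def distance(x: str, s: str) -> int:
    """d(x, s) = |x0 - s0| + |x1 - s1| + ... + |x7 - s7|."""
    hx, hs = direction_histogram(x), direction_histogram(s)
    return sum(abs(a - b) for a, b in zip(hx, hs))


def classify(x: str, templates: Dict[str, str]) -> str:
    """Return the label of the template with the smallest distance to x."""
    return min(templates, key=lambda label: distance(x, templates[label]))


# Hand-calculated chain codes from the example (Arial, font size 8).
TEMPLATES = {
    "8": "0011223123344556756677",
    "B": "0000011222223344444666666666",
    "D": "00001112223334444666666666",
}
DEFORMED_8 = "000011222223344556756666"

if __name__ == "__main__":
    for label, code in TEMPLATES.items():
        print(label, distance(DEFORMED_8, code))  # 8 10, B 12, D 14
    print(classify(DEFORMED_8, TEMPLATES))        # 8
```

    Note that this distance only looks at how often each direction occurs, not at the order of the directions, so two different shapes with the same direction counts would get distance 0.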

    This method is not scale invariant. I think there are two options to overcome this:

    • For different font sizes, having different sets of normalized chain codes
    • One set of normalized chain codes at a big size (e.g. 35x46 pixels), with the input letter (which needs to be identified) scaled up to this bigger size.
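    For the second option, the input glyph has to be resampled before its chain code is extracted. Here is a rough sketch, assuming the glyph is available as a binary bitmap (a list of rows of 0/1 values) and using nearest-neighbour resampling, which is my own choice and not the only possibility. The name `scale_nearest` is hypothetical.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixels (assumed representation)


def scale_nearest(glyph: Bitmap, width: int, height: int) -> Bitmap:
    """Resample a binary glyph to width x height with nearest-neighbour lookup."""
    src_h, src_w = len(glyph), len(glyph[0])
    return [
        [glyph[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    tiny = [[1, 0],
            [0, 1]]
    for row in scale_nearest(tiny, 4, 4):
        print(row)
```

    Nearest-neighbour scaling turns diagonal edges into staircases, which shifts the direction counts, so the reference chain codes should probably be produced with the same scaling step.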

    I'm not quite sure if the distance function is good enough for the set of all alphanumeric letters but I hope so. To minimize the error in identifying a letter you could include other features (not only chain codes) into the classification step. And again, you would need a distance measure -- this time for feature vectors.

  • 2021-01-30 18:09

    You could convert the chain code into an even simpler model that conveys the topology and then run machine learning code (which one would probably write in Prolog).

    But I wouldn't endorse it. People have done/tried this for years and we still have no good results.

    Instead of wasting your time with this non-linear/threshold based approach, why don't you just use a robust technique based on correlation? The easiest thing would be to convolve with templates.
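    A very reduced sketch of the template idea, under the assumption that the glyph has already been cropped and scaled to the same size as the templates, so the correlation is evaluated at a single offset instead of sliding the template over the image. The function names and the tiny 3x3 bitmaps are invented for illustration.

```python
from math import sqrt
from typing import Dict, List

Bitmap = List[List[int]]  # rows of pixel values (assumed representation)


def normalized_correlation(a: Bitmap, b: Bitmap) -> float:
    """Zero-mean normalized correlation of two equally sized bitmaps, in [-1, 1]."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("bitmaps must have the same number of pixels")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0  # flat images carry no shape information


def best_template(glyph: Bitmap, templates: Dict[str, Bitmap]) -> str:
    """Return the label of the template that correlates best with the glyph."""
    return max(templates, key=lambda k: normalized_correlation(glyph, templates[k]))


if __name__ == "__main__":
    templates = {
        "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    }
    noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
    print(best_template(noisy_i, templates))
```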

    But I would develop Gabor wavelets on the letters and sort the coefficients into a vector space. Train a support vector machine with some examples and then use it as a classifier.

    This is pretty much how our brain does it and I'm sure it's possible in the computer.

    Some random chit chat (ignore):

    I wouldn't use neural networks because I don't understand them and therefore don't like them. However, I'm always impressed by the work of Geoff Hinton's group: http://www.youtube.com/watch?v=VdIURAu1-aU.

    He works on networks that can propagate information backward (deep learning). In one of his talks he lets a trained digit-recognition network "dream": he sets one of the output neurons to "2", and the network generates pictures of what it thinks a two looks like on the input neurons.

    I found this very cool.

  • 2021-01-30 18:22

    As your question is not specific enough (whether you want the full algorithm based on the chain code or just some probabilistic classifying), I'll tell you what I know about the problem.

    Using the chain code, you can count some properties of the symbol, e.g. the number of rotations of the form 344445, 244445, 2555556, 344446 (with an arbitrary number of 4s), i.e. the "spikes" on the letter. Say there are 3 sections in the chain code that look like this: then it is almost certainly a "W"! But this is a good case. In general, you can count the numbers of different kinds of rotations and compare them to previously saved values for every letter (which you determine by hand). This is quite a good classifier, but on its own it is not sufficient, of course. It cannot differentiate "D" from "O", or "V" from "U". And much depends on your imagination.
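    Counting such spikes can be sketched with a regular expression over the chain code string. The pattern below is only my guess at a generalization of the four forms listed above (a 2 or 3, a run of 4s, then a 5 or 6; or a 2, a run of 5s, then a 6); a real set of patterns would have to be tuned by hand, and `count_spikes` is a hypothetical name.

```python
import re

# Assumed generalization of 344445, 244445, 344446 and 2555556.
SPIKE_PATTERN = re.compile(r"[23]4+[56]|25+6")


def count_spikes(chain: str) -> int:
    """Count non-overlapping spike-like sections in a chain code string."""
    return len(SPIKE_PATTERN.findall(chain))


if __name__ == "__main__":
    # Made-up chain containing three spike-like sections.
    print(count_spikes("0034445001244445007725556"))  # 3
```

    The count would then be compared against the hand-collected values for each letter, as described above.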

    You should start by creating a test set of letter images with reference labels, and re-check your algorithm against it after each change and each new criterion you invent.

    Hope this answers your question at least partially.

    Update: One bright idea just came into my mind :) You can count the number of monotonic sequences in the chain, for example, for chain 000111222233334443333222444455544443333 (a quick dumb example, doesn't really correspond to any letter) we have
    00011122223333444 3333222444455544443333,
    00011122223333444 3333222 444455544443333,
    000111222233334443333222 4444555 44443333,
    0001112222333344433332224444555 44443333,

    i.e. four monotonic subsequences.

    This should be a good generalization: just count the number of these changes for real letters and compare it to the number obtained from the detected chain. It is worth a try.
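    A small sketch of the counting, assuming the chain is treated as a plain (non-cyclic) sequence of digits, that repeated digits do not interrupt a run, and that short interruptions are not filtered out. The name `count_monotonic_runs` is invented here.

```python
def count_monotonic_runs(chain: str) -> int:
    """Count maximal rising or falling runs in a chain code string."""
    digits = [int(c) for c in chain]
    if not digits:
        return 0
    runs = 0
    trend = 0  # +1 while rising, -1 while falling, 0 before the first change
    for prev, cur in zip(digits, digits[1:]):
        if cur == prev:
            continue  # plateaus belong to the current run
        step = 1 if cur > prev else -1
        if step != trend:
            runs += 1  # the direction of change flipped: a new run starts
            trend = step
    return max(runs, 1)  # a constant chain counts as a single run


if __name__ == "__main__":
    print(count_monotonic_runs("000111222233334443333222444455544443333"))  # 4
```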

    Some problems and ideas:

    1. The chain is cyclic, so monotonic runs that wrap around the ends of the chain need to be handled (to avoid off-by-one errors).
    2. Some artifacts should be accounted for. For example, if you know that the letter is big enough (say, 20 pixels in height), you may want to ignore interruptions of monotonicity shorter than 3 items :)
  • 2021-01-30 18:24

    Last month, I was dealing with the same problem. I have since solved it using the vertex chain code.

    The vertex chain code is a binary chain code. I cut it into 5 parts. Each of the digits 0-9 has its own characteristics in the different parts.
