Optimized OCR black/white pixel algorithm

被撕碎了的回忆 2021-02-06 06:30

I am writing a simple OCR solution for a finite set of characters. That is, I know exactly what all 26 letters of the alphabet will look like. I am using C# and am able to easi

7 answers

爱一瞬间的悲伤 (original poster)

2021-02-06 07:08
I am going down a similar track trying to invent an algorithm that will give me a minimal number of tests I can use to match an image to one I've seen previously. My application is OCR but in a limited domain of recognising an image from a fixed set of images as fast as possible.

My basic assumption (which I think is the same as yours, or was the same) is that if we can identify one unique pixel (where a pixel is defined as a point within an image plus a color) then we have found the perfect (fastest) test for that image. In your case you want to find letters.

If we cannot find one such pixel then we (grudgingly) look for two pixels that in combination are unique. Or three. And so on, until we have a minimal test for each of the images.

I should note that I have a strong feeling that in my particular domain I will be able to find such unique pixels. It might not be the same for your application where you seem to have a lot of "overlap".
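As a rough illustration of the "unique pixel" idea, here is a minimal Python sketch. The `IMAGES` data, the `unique_pixels` helper, and the representation of an image as a list of strings are all assumptions made for this example; they are not from my application.

```python
from collections import defaultdict

# Hypothetical 2x2 black/white images: '#' is black, '.' is white.
IMAGES = {
    "I": ["#.", "#."],
    "L": ["#.", "##"],
    "T": ["##", "#."],
}

def unique_pixels(images):
    """Map each image name to the (x, y, color) pixels found in no other image."""
    owners = defaultdict(set)  # pixel -> names of the images containing it
    for name, rows in images.items():
        for y, row in enumerate(rows):
            for x, color in enumerate(row):
                owners[(x, y, color)].add(name)
    unique = {name: [] for name in images}
    for pixel in sorted(owners):
        if len(owners[pixel]) == 1:
            (only_owner,) = owners[pixel]
            unique[only_owner].append(pixel)
    return unique

print(unique_pixels(IMAGES))
```

In this toy data "L" and "T" each have one unique pixel, while "I" has none, so "I" would need a combination of two pixels.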

After considering comments in this other question (where I'm just starting to get a feel for the problem) and comments here I think I might have come up with a workable algorithm.

Here is what I've got so far. The method I describe below is written in the abstract but in my application each "test" is a pixel identified by a point plus a color, and a "result" represents the identity of an image. Identification of these images is my end goal.

Consider the following tests numbered T1 to T4.
- T1: A B C
- T2: B
- T3: A C D
- T4: A D
This list of tests can be interpreted as follows:
- If test T1 is true we conclude that we have a result of A or B or C.
- If test T2 is true we conclude that we have a result of B.
- If test T3 is true we conclude that we have a result of A or C or D.
- If test T4 is true we conclude that we have a result of A or D.
For each individual result A, B, C, D, we want to find a combination of tests (ideally just one test) that will allow us to test for an unambiguous result.
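Before searching for combinations it may help to check that an unambiguous combination can exist at all. The sketch below (Python; the names `TESTS`, `signatures`, and `all_distinguishable` are made up for illustration) computes, for each result, the set of tests that are true for it. If two results have the same set, no combination of tests and negated tests can tell them apart.

```python
# The four tests from the list above, as test name -> set of results.
TESTS = {
    "T1": {"A", "B", "C"},
    "T2": {"B"},
    "T3": {"A", "C", "D"},
    "T4": {"A", "D"},
}

def signatures(tests):
    """For each result, the set of tests that evaluate to true for it."""
    results = sorted(set().union(*tests.values()))
    return {r: frozenset(n for n in tests if r in tests[n]) for r in results}

def all_distinguishable(tests):
    """True if no two results share the same signature."""
    sigs = signatures(tests)
    return len(set(sigs.values())) == len(sigs)

for result, sig in signatures(TESTS).items():
    print(result, sorted(sig))
print(all_distinguishable(TESTS))
```

For the four tests above every result has a different signature, so each of A, B, C, D can be isolated.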

Applying intuition and with a bit of squinting at the screen we can fumble our way to the following arrangement of tests.

For A we can test for a combination of T4 (either A or D) AND T1 (A but not D)

B is easy since there is a test T2 that gives result B and nothing else.

C is a bit harder, but eventually we can see that a combination of T3 (A or C or D) and NOT T4 (not A and not D) gives the desired result.

And similarly, D can be found with a combination of T4 and (not T1).

In summary
```
A <- T4 && T1
B <- T2
C <- T3 && ¬T4
D <- T4 && ¬T1
```
(where <- should be read as 'can be found if the following tests evaluate to true')
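The summary can be checked mechanically. This is a small Python sketch, assuming each rule is stored as a pair of lists (tests that must be true, tests that must be false); `RULES` and `matching_results` are names invented for the example.

```python
TESTS = {
    "T1": {"A", "B", "C"},
    "T2": {"B"},
    "T3": {"A", "C", "D"},
    "T4": {"A", "D"},
}

# result -> (tests that must be true, tests that must be false)
RULES = {
    "A": (["T4", "T1"], []),
    "B": (["T2"], []),
    "C": (["T3"], ["T4"]),
    "D": (["T4"], ["T1"]),
}

def matching_results(positives, negatives, tests):
    """All results consistent with the given true and false tests."""
    universe = set().union(*tests.values())
    return {
        r for r in universe
        if all(r in tests[t] for t in positives)
        and not any(r in tests[t] for t in negatives)
    }

for result, (positives, negatives) in RULES.items():
    print(result, matching_results(positives, negatives, TESTS))
```

Each rule should leave exactly one result, which is what "unambiguous" means here.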

Intuition and squinting are fine, but we probably won't get these techniques built into the language until at least C# 5.0, so here is an attempt at formalising the method for implementation in lesser languages.

To find a result R,
1. Find the test Tr that gives the desired result R and the fewest unwanted results (ideally no others).
2. If the test gives the result R and nothing else we are finished. We can match for R where Tr is true.
3. For every unwanted result X in the test Tr:
  - (a) Find the shortest test Tn (the one with the fewest results) that gives R but not X. If we find such a test we can then match for R where (Tr && Tn).
  - (b) If no test matches condition (a) then find the shortest test Tx that includes X but does not include R. (Such a test would eliminate X as a result from test Tr.) We can then test for R where (Tr && ¬Tx).
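The rules above could be sketched in Python roughly as follows. This is an untested-in-production sketch, not a finished implementation. All function names are invented, and there is one deliberate deviation that I am assuming is acceptable: instead of committing to a single starting test in step 1, it tries every test that contains R and keeps the shortest resulting rule.

```python
TESTS = {
    "T1": {"A", "B", "C"},
    "T2": {"B"},
    "T3": {"A", "C", "D"},
    "T4": {"A", "D"},
}

def by_length(names, tests):
    """Sort test names by number of results, then by name for determinism."""
    return sorted(names, key=lambda n: (len(tests[n]), n))

def rule_from(start, target, tests):
    """Apply steps 2 and 3 with `start` as Tr; return (positives, negatives)."""
    with_target = by_length([n for n in tests if target in tests[n]], tests)
    without_target = by_length([n for n in tests if target not in tests[n]], tests)
    positives, negatives = [start], []
    remaining = set(tests[start]) - {target}  # unwanted results still possible
    while remaining:
        x = min(remaining)
        step_a = [n for n in with_target if x not in tests[n]]  # gives R, not X
        step_b = [n for n in without_target if x in tests[n]]   # gives X, not R
        if step_a:
            positives.append(step_a[0])
            remaining &= tests[step_a[0]]
        elif step_b:
            negatives.append(step_b[0])
            remaining -= tests[step_b[0]]
        else:
            return None  # X cannot be separated from R
    return positives, negatives

def find_rule(target, tests):
    """Try every starting test containing `target`; keep the shortest rule."""
    starts = by_length([n for n in tests if target in tests[n]], tests)
    rules = [r for r in (rule_from(s, target, tests) for s in starts) if r]
    return min(rules, key=lambda r: len(r[0]) + len(r[1]), default=None)

for result in "ABCD":
    print(result, find_rule(result, TESTS))
```

A returned pair such as `(['T4'], ['T1'])` reads as "T4 && ¬T1".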
Now I will try to follow these rules for each of the desired results, A, B, C, D.

Here are the tests again for reference:
- T1: A B C
- T2: B
- T3: A C D
- T4: A D
For A

According to rule (1) we start with T4 since it is the shortest test that gives result A. But it also gives result 'D', which is an unwanted result. According to rule (3a) we can use test T1 since it includes 'A' but does not include 'D'.

Therefore we can test for A with
```
A <- T4 && T1
```
For B

To find 'B' we quickly find test T2 which is the shortest test for 'B' and since it gives only result 'B' we are finished.
```
B <- T2
```
For C

To find 'C' we start with T1 and T3, which are equally short. T1 carries the unwanted results 'A' and 'B'; T3 carries the unwanted results 'A' and 'D'.

Now according to (3a) we need to find a test that includes 'C' but not 'A'. No test satisfies this condition, because both tests that give 'C' also give 'A'. This is true whether we start from T1 or from T3.

Being unable to find a test that satisfies (3a) we now look for a test that satisfies condition (3b). We look for a test that gives 'A' but not 'C'. We can see that test T4 satisfies this condition. If we start from T3, T4 also eliminates the other unwanted result 'D', so we can test for C with
```
C <- T3 && ¬T4
```
(Starting from T1 instead would need one more test, for example ¬T2, because T1 && ¬T4 still matches 'B'.)
For D

To find D we start with T4. T4 includes unwanted result A. There are no other tests that give the result D but not A so we look for a test that gives A but not D. Test T1 satisfies this condition so therefore we can test for D with
```
D <- T4 && ¬T1
```
These results are good but I don't think I've quite debugged this algorithm enough to have 100% confidence. I'm going to think about it a bit more and maybe code up some tests to see how it holds up. Unfortunately the algorithm is just complex enough that it will take more than a few minutes to implement carefully. It might be days before I conclude anything further.

Update

I found that it is optimal to simultaneously look for tests that satisfy (a) OR (b) rather than look for (a) and then (b). If we look first for (a) we might get a long list of tests when we might have got a shorter list by allowing some (b) tests.
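One way to read this update is as a greedy choice: at each step pick whichever test, of kind (a) or kind (b), eliminates the most remaining unwanted results. The Python sketch below follows that reading, which is my interpretation and not something I have proven optimal. The `EXAMPLE` tests Tr, Ta, Tb are hypothetical, chosen so that checking (a) first from Tr would give Tr && Ta && ¬Tb, while the combined search needs only Tr && ¬Tb.

```python
# Hypothetical tests: Tr is the shortest test that gives the target R.
EXAMPLE = {
    "Tr": {"R", "X", "Y"},
    "Ta": {"R", "Y", "Z", "W"},
    "Tb": {"X", "Y"},
}

def find_rule_combined(target, tests):
    """Greedy search that considers (a) and (b) tests together at each step."""
    starts = [n for n in sorted(tests) if target in tests[n]]
    if not starts:
        return None
    start = min(starts, key=lambda n: len(tests[n]))
    positives, negatives = [start], []
    remaining = set(tests[start]) - {target}  # unwanted results still possible
    while remaining:
        best = None  # (eliminated results, test name, is an (a) test)
        for name in sorted(tests):
            results = tests[name]
            if target in results:
                eliminated = remaining - results  # (a): require the test true
            else:
                eliminated = remaining & results  # (b): require the test false
            if best is None or len(eliminated) > len(best[0]):
                best = (eliminated, name, target in results)
        eliminated, name, is_positive = best
        if not eliminated:
            return None  # no test separates the remaining results
        (positives if is_positive else negatives).append(name)
        remaining -= eliminated
    return positives, negatives

print(find_rule_combined("R", EXAMPLE))
```

The returned pair lists the tests that must be true and the tests that must be false.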