Question
I'm doing simple recognition of letters and digits with neural networks. Until now I have used every pixel of a letter's image as an input to the network. Needless to say, this approach produces very large networks, so I'd like to extract features from my images and use them as inputs instead. My first question is which properties of the letters are good for recognizing them. My second question is how to represent these features as inputs to a neural network. For example, I may have detected all the corners in a letter and have them as a vector of (x,y) points. How do I transform this vector into something suitable for an NN, given that the vector sizes may differ from letter to letter?
Answer 1:
People have used a wide variety of features for OCR. The simplest, of course, is to pass the pixel values directly.
The OpenCV samples include letter recognition data extracted from the UCI data set. It employs about 16 different features. See this Stack Overflow question: How to create data fom image like "Letter Image Recognition Dataset" from UCI
One of the answers there points to the paper that explains these features. You can also find it by searching online.
Also you might be interested in this PPT. It gives a concise explanation of different feature extraction techniques used nowadays.
Answer 2:
The article "Introduction to Artificial Intelligence. OCR using Artificial Neural Networks" by Kluever (2008) surveys four feature extraction techniques for OCR using neural networks. He describes the following methods:
- Run Length Encoding (RLE): You need a binary image for this (i.e., only white or black). The binary string can be encoded into a smaller representation.
- Edge detection: Find the edges. You can be quite coarse with this, so instead of returning the exact (x,y) coordinates you can reduce the matrix by only recording whether an edge occurs at a few reduced locations (i.e., at 20%, 40%, 60% and 80% of the image).
- Count 'True Pixels': This reduces the dimensionality from width * height of the image matrix to width + height. You use the width vector and the height vector as separate input.
- Basic matrix input: You already tried this; inputting the whole matrix gives good results, but as you noticed can result in high dimensionality and training times. You can experiment with reducing the size of your images (e.g., from 200x200 to 50x50).
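As a rough sketch of the 'count true pixels' idea and the coarse edge-location idea from the list above, here is some plain Python. The function names, the toy glyph and the corner coordinates are all made up for illustration and are not taken from Kluever's article. The grid in grid_occupancy is only one assumed way of turning a variable-length list of (x,y) points into a fixed-size input vector.

```python
def projection_profiles(image):
    """Count foreground pixels per row and per column of a binary image.

    `image` is a list of equal-length rows holding 0 (background) or 1 (ink).
    Returns a flat list of length height + width.
    """
    row_counts = [sum(row) for row in image]
    col_counts = [sum(col) for col in zip(*image)]
    return row_counts + col_counts


def grid_occupancy(points, width, height, cells=4):
    """Map a variable-length list of (x, y) points to a fixed-size vector.

    The image is divided into cells x cells regions; an entry is 1.0 if at
    least one point falls inside that region, else 0.0.
    """
    features = [0.0] * (cells * cells)
    for x, y in points:
        col = min(int(x * cells / width), cells - 1)
        row = min(int(y * cells / height), cells - 1)
        features[row * cells + col] = 1.0
    return features


# A toy glyph, 4 pixels wide and 5 pixels high, resembling the letter "L".
glyph = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

# 5 row counts followed by 4 column counts: 9 inputs instead of 20 pixels.
profile = projection_profiles(glyph)

# Hypothetical corner detections; the list length may vary per letter,
# but the resulting feature vector always has cells * cells entries.
corners = [(0, 0), (0, 4), (3, 4)]
occupancy = grid_occupancy(corners, width=4, height=5, cells=2)
```

Whatever the number of detected corners, the network always sees cells * cells inputs. The price is that exact positions inside a cell are lost, so the grid size is something to tune.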
Answer 3:
If you have a very high-dimensional input vector, then I suggest you apply principal component analysis (PCA) to remove redundant features and reduce the dimensionality of the feature vector.
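A minimal sketch of that PCA step, assuming NumPy is available and that each image has already been flattened into one row of a matrix. The helper name pca_reduce, the random stand-in data and the choice of 10 components are illustrative assumptions only.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto their top principal components.

    Returns the reduced data, the component directions and the mean,
    so that new samples can be transformed the same way later.
    """
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, components, mean


rng = np.random.default_rng(0)
X = rng.random((100, 64))  # stand-in data: 100 flattened 8x8 images
reduced, components, mean = pca_reduce(X, n_components=10)
```

At prediction time a new flattened image x would be transformed with (x - mean) @ components.T before being fed to the network, so the network input size drops from 64 to 10 in this toy setup.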
Source: https://stackoverflow.com/questions/11427411/feature-extraction-from-neural-networks