training-data

How do I train tesseract 4 with image data instead of a font file?

Submitted by 被刻印的时光 ゝ on 2019-12-20 10:46:20
Question: I'm trying to train Tesseract 4 with images instead of fonts. The docs explain only the font-based approach, not the image-based one. I know how this works in prior versions of Tesseract, but I haven't figured out how to use the box/tiff files for LSTM training in Tesseract 4. I looked into tesstrain.sh, which is used to generate LSTM training data, but couldn't find anything helpful. Any ideas? Source: https://stackoverflow.com/questions/43352918/how-do-i-train-tesseract-4-with-image-data
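For reference, a minimal sketch of the image-based LSTM workflow, driving the Tesseract 4 command-line training tools from Python via subprocess. It assumes a train/ directory of line images with matching .box files, and that eng.traineddata plus an eng.lstm extracted with combine_tessdata -e are already available; all paths and the iteration count are placeholders, not a definitive recipe.

    import pathlib
    import subprocess

    train_dir = pathlib.Path("train")          # placeholder: .tif files with matching .box files
    pathlib.Path("output").mkdir(exist_ok=True)

    # 1) Let Tesseract turn each tif/box pair into an .lstmf training file
    #    (the "lstm.train" config ships with Tesseract 4).
    for tif in sorted(train_dir.glob("*.tif")):
        base = tif.with_suffix("")             # output base name, next to the tif
        subprocess.run(
            ["tesseract", str(tif), str(base), "--psm", "6", "lstm.train"],
            check=True,
        )

    # 2) Collect the generated .lstmf files into a list file for lstmtraining.
    lstmf_files = sorted(str(p) for p in train_dir.glob("*.lstmf"))
    pathlib.Path("train.list").write_text("\n".join(lstmf_files) + "\n")

    # 3) Fine-tune from an existing model with lstmtraining.
    subprocess.run(
        ["lstmtraining",
         "--model_output", "output/finetuned",
         "--continue_from", "eng.lstm",        # extracted beforehand with combine_tessdata -e
         "--traineddata", "eng.traineddata",
         "--train_listfile", "train.list",
         "--max_iterations", "2000"],
        check=True,
    )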

About image backgrounds while preparing training dataset for cascaded classifier

Submitted by 江枫思渺然 on 2019-12-20 04:16:17
Question: I have a question about preparing the positive-sample dataset for a cascaded classifier that will be used for object detection. As positive samples, I have been given 3 sets of images: (1) a set of colored, full-size images (about 1200x600) with a white background, with the object displayed at a different angle in each image; (2) the same images in grayscale, with a white background, scaled down to the detection window size (60x60); (3) the same images in …

Why does my model predict the same label?

Submitted by 我怕爱的太早我们不能终老 on 2019-12-20 03:07:02
Question: I am training a small network and training seems to go fine: the validation loss decreases, I reach a validation accuracy of around 80, and training actually stops once there is no more improvement (patience=10). It trained for 40 epochs. However, the model keeps predicting only one class for every test image! I tried initializing the conv layers randomly, adding regularizers, switching from Adam to SGD, adding clipvalue, and adding dropout. I also switched to softmax (I have only two labels, but I saw …
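The question is cut off before any answer; below is only a small, hedged sketch (Keras, with made-up shapes and random placeholder data) of one thing worth checking in this situation: whether one class dominates the training set, and how class weights can be passed to fit() to counteract that.

    from collections import Counter

    import numpy as np
    from tensorflow.keras import layers, models

    num_classes = 2

    # Placeholder data standing in for the real images and labels.
    x_train = np.random.rand(100, 64, 64, 3).astype("float32")
    y_train = np.random.randint(0, num_classes, size=100)

    model = models.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # If one class dominates, always predicting it already scores well;
    # inverse-frequency class weights push back against that.
    counts = Counter(int(c) for c in y_train)
    total = sum(counts.values())
    class_weight = {c: total / (num_classes * n) for c, n in counts.items()}

    model.fit(x_train, y_train, epochs=5, class_weight=class_weight, verbose=0)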

Questions about creating Stanford CoreNLP training models

Submitted by 狂风中的少年 on 2019-12-19 11:33:33
Question: I've been working with Stanford CoreNLP to perform sentiment analysis on some data I have, and I'm working on creating a training model. I know a training model can be created with the following command: java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz I know what goes into the train.txt file: you score sentences and put them in train.txt, something like this: (0 (2 Today) (0 (0 (2 is) (0 (2 a) (0 (0 bad) (2 day) …

Number of parameters must be always be even : opennlp

Submitted by 给你一囗甜甜゛ on 2019-12-19 10:32:08
Question: I've been trying to use the command-line interface to train my model like this: opennlp TokenNameFinderTrainer -model en-ner-pincode.bin -iterations 500 \ -lang en -data en-ner-pincode.train -encoding UTF-8 The console output is: Number of parameters must be always be even Usage: opennlp TokenNameFinderTrainer[.evalita|.ad|.conll03|.bionlp2004|.conll02|.muc6|.ontonotes|.brat] [-factory factoryName] [-resources resourcesDir] [-type modelType] [-featuregen featuregenFile] [-nameTypes types] [ …
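The question is cut off before any answer. As a hedged aside, that message appears to come from OpenNLP's argument parser expecting flag/value pairs, so building the call with an explicit argument list (sketched below via Python's subprocess, reusing the exact flags from the question) makes it easy to confirm that every flag actually has a value and that, for example, a line continuation did not swallow one.

    import subprocess

    # Same flags as in the question; every option is followed by exactly one value.
    args = [
        "opennlp", "TokenNameFinderTrainer",
        "-model", "en-ner-pincode.bin",
        "-iterations", "500",
        "-lang", "en",
        "-data", "en-ner-pincode.train",
        "-encoding", "UTF-8",
    ]

    # Sanity check: the options after the tool name must pair up as (flag, value).
    assert len(args[2:]) % 2 == 0

    subprocess.run(args, check=True)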

R: How to split a data frame into training, validation, and test sets?

Submitted by 有些话、适合烂在心里 on 2019-12-17 10:42:32
Question: I'm using R to do machine learning. Following standard machine learning methodology, I would like to randomly split my data into training, validation, and test data sets. How do I do that in R? I know there are some related questions on how to split into 2 data sets (e.g. this post), but it is not obvious how to do it for 3 data sets. By the way, the correct approach is to use 3 data sets (including a validation set for tuning your hyperparameters). Answer 1: The linked approach for two groups …
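The question asks specifically for R, but the underlying recipe is language-agnostic: shuffle the row indices once, then cut them into three blocks. A minimal sketch of that idea in Python, where the 60/20/20 fractions and the toy data frame are placeholders:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"x": range(100), "y": np.random.rand(100)})  # placeholder data

    rng = np.random.default_rng(seed=42)
    idx = rng.permutation(len(df))

    n_train = int(0.6 * len(df))   # 60% train
    n_val = int(0.2 * len(df))     # 20% validation, remainder is test

    train = df.iloc[idx[:n_train]]
    val = df.iloc[idx[n_train:n_train + n_val]]
    test = df.iloc[idx[n_train + n_val:]]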

tensorflow: batches of variable-sized images

Submitted by 二次信任 on 2019-12-13 15:00:47
Question: When one passes tensors to tf.train.batch, it looks like the shape of each element has to be strictly defined, or else it complains that "All shapes must be fully defined" if there exist Tensors with shape Dimension(None). How, then, does one train on images of different sizes? Answer 1: You could set dynamic_pad=True in the arguments of tf.train.batch. dynamic_pad: Boolean. Allow variable dimensions in input shapes. The given dimensions are padded upon dequeue so that tensors within a batch have the same …
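Below is a minimal sketch of the dynamic_pad=True idea from the answer, written against the TF 1.x queue-based input pipeline that tf.train.batch belongs to (tf.compat.v1 in TF 2); the PNG filenames are placeholders. Images of different sizes are padded per batch so that every tensor within a batch ends up with the same shape.

    import tensorflow.compat.v1 as tf

    tf.disable_eager_execution()

    # Placeholder filenames; each image may have a different height/width.
    filename_queue = tf.train.string_input_producer(["a.png", "b.png"])
    _, contents = tf.WholeFileReader().read(filename_queue)
    image = tf.image.decode_png(contents, channels=3)   # static shape (None, None, 3)

    # dynamic_pad=True pads the variable dimensions up to the per-batch maximum.
    batch = tf.train.batch([image], batch_size=4, dynamic_pad=True)

    with tf.Session() as sess:
        sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        print(sess.run(batch).shape)   # (4, max_height, max_width, 3) for this batch
        coord.request_stop()
        coord.join(threads)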

Matlab predict function not working

Submitted by 狂风中的少年 on 2019-12-13 09:07:36
Question: I am trying to train a linear SVM on data which has 100 dimensions. I have 80 instances for training. I train the SVM using the fitcsvm function in MATLAB and check it using predict on the training data. When I classify the training data with the SVM, all the data points are classified into only one class. SVM = fitcsvm(votes,b,'ClassNames',unique(b)'); predict(SVM,votes); This gives all 0's as output, which corresponds to the 0th class. b contains 1's and 0's indicating the class to …

Stanford CoreNLP sentiment training set

Submitted by 陌路散爱 on 2019-12-13 08:33:17
Question: I am new to NLP and to sentiment analysis in particular. My goal is to train the Stanford CoreNLP sentiment model. I am aware that the sentences provided as training data should be in the following format: (3 (2 (2 The) (2 Rock)) (4 (3 (2 is) (4 (2 destined) (2 (2 (2 (2 (2 to) (2 (2 be) (2 (2 the) (2 (2 21st) (2 (2 (2 Century) (2 's)) (2 (3 new) (2 (2 ``) (2 Conan)))))))) (2 '')) (2 and)) (3 (2 that) (3 (2 he) (3 (2 's) (3 (2 going) (3 (2 to) (4 (3 (2 make) (3 (3 (2 a) (3 splash)) (2 …
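As an aside, this bracketed format is an ordinary s-expression tree with a sentiment label (0-4) at every node, so it can be inspected with any PTB-style tree reader. A small sketch using NLTK (an assumption; the question does not mention it) on a tiny made-up sentence in the same format:

    from nltk import Tree

    # A hand-written example in the same (label token ...) format, not from the real training set.
    tree = Tree.fromstring("(3 (2 It) (4 (2 works) (2 well)))")

    print(tree.label())                                    # sentiment of the whole sentence: 3
    print([(t.label(), t.leaves()) for t in tree.subtrees()])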

OpenCV haartraining never finishes

Submitted by 萝らか妹 on 2019-12-13 03:58:59
Question: This is the first time I am using OpenCV's haartraining. Just for practice, I used 35 positive images and 45 negative images. But when I try to train on the data, it never finishes, even when the parameters are adjusted to extremes (min hit rate = 0.001, max false alarm rate = 0.999; I don't think it should take a lot of time with such extreme values). What must be wrong in my experiment? Here is my command and its parameters: $opencv_haartraining -data Training -vec samples.vec -bg negatives …