Question
While running CRF++ on my training data (train.txt), I got the following error:
C:\Users\2012\Desktop\CRF_Software_Package\CRF++-0.58>crf_learn template train.data model
CRF++: Yet Another CRF Tool Kit
Copyright (C) 2005-2013 Taku Kudo, All rights reserved.
reading training data: tagger.cpp(393) [feature_index_->buildFeatures(this)]
0.00 s
My training data contains Unicode characters, and the file was saved using Notepad (encoding = Unicode big endian).
I am not sure whether the problem is with the template or with the format of the training data. How can I check the format of the training data?
Answer 1:
I think this is caused by your template file. Check whether the template uses the last column, which holds the gold-standard labels, as a training feature. Column indices start from 0, so if your BIO file has 6 columns, the label column is index 5 and the template should not contain anything like %x[0,5].
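One way to run this check automatically is a small Python script. This is only a sketch: the function name label_column_references and the sample template are made up for illustration, and the regular expression assumes macros are written exactly as %x[row,col] with no spaces.

```python
import re

# Matches CRF++ style macros such as %x[-1,0]; assumes no spaces inside the brackets.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

def label_column_references(template_text, n_columns):
    """Return (line_number, macro) pairs that point at the label column or beyond.

    n_columns is the number of columns in the training file, label included,
    so the label column has index n_columns - 1.
    """
    label_col = n_columns - 1
    hits = []
    for lineno, line in enumerate(template_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        for match in MACRO.finditer(line):
            if int(match.group(2)) >= label_col:
                hits.append((lineno, match.group(0)))
    return hits

# Hypothetical template for a 6-column training file.
template = "U00:%x[-1,0]\nU01:%x[0,4]\nU02:%x[0,5]\n"
print(label_column_references(template, 6))
```

Any pair it reports is a template line to fix or remove before running crf_learn again.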
Answer 2:
The problem is with the template file. Check your features for incorrect syntax, e.g. U10:%x[-1,0]/% [0,0]
Notice that the 'x' is missing after the second %. The corrected line should look like this: U10:%x[-1,0]/%x[0,0]
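Typos like this are easy to miss by eye, so a quick scan can help. The following Python sketch flags any template line where a % is left over after all well-formed %x[row,col] macros are removed. The function name malformed_template_lines is hypothetical, and the strict macro pattern is an assumption, so treat a hit as a line worth inspecting rather than a definite error.

```python
import re

# A well-formed macro, assumed here to be exactly %x[row,col] with no spaces.
VALID_MACRO = re.compile(r"%x\[-?\d+,\d+\]")

def malformed_template_lines(template_text):
    """Return (line_number, line) pairs that contain a '%' outside a valid macro."""
    bad = []
    for lineno, line in enumerate(template_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        remainder = VALID_MACRO.sub("", stripped)
        if "%" in remainder:
            bad.append((lineno, stripped))
    return bad

# The broken and corrected lines from this answer.
sample = "U10:%x[-1,0]/% [0,0]\nU11:%x[-1,0]/%x[0,0]\n"
print(malformed_template_lines(sample))
```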
Answer 3:
I had the same issue. My files are in UTF-8, and the template file and training file are definitely in the correct format. The cause was that CRF++ expects at most 1024 columns in the input files. It would be great if it printed an appropriate error message in such a case.
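To check the column layout of a training file yourself, something like the Python sketch below may help. It assumes columns are separated by whitespace and that blank lines separate sentences. The function name column_report is made up for illustration, and the default limit of 1024 is simply the number mentioned in this answer.

```python
def column_report(lines, max_columns=1024):
    """Group non-blank lines by their column count.

    Returns (counts, too_wide): counts maps a column count to the line numbers
    that have it, and too_wide lists the column counts above max_columns.
    """
    counts = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # split on any run of spaces or tabs
        if not fields:
            continue  # blank line, i.e. a sentence boundary
        counts.setdefault(len(fields), []).append(lineno)
    too_wide = sorted(n for n in counts if n > max_columns)
    return counts, too_wide

# Hypothetical data: line 4 is missing a column.
sample = ["He PRP B-NP", "reckons VBZ B-VP", "", "the DT"]
counts, too_wide = column_report(sample)
print(counts)
print(too_wide)
```

A clean file should produce exactly one key in counts and an empty too_wide list. For a real file, pass an open file object instead of the list, opened with the encoding the file was actually saved in.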
Answer 4:
The problem is not the Unicode encoding but the template file.
Have a look at this similar question: The failure in using CRF+0.58 train NE Model
Source: https://stackoverflow.com/questions/16886251/faliure-in-reading-training-data-tagger-cpp-393-crf