Failure in reading training data: tagger.cpp (393) CRF++

Posted by 依然范特西╮ on 2019-12-12 02:13:54

Question


When I run CRF++ on my training data (train.txt), I get the following error:

C:\Users\2012\Desktop\CRF_Software_Package\CRF++-0.58>crf_learn template train.data model
CRF++: Yet Another CRF Tool Kit
Copyright (C) 2005-2013 Taku Kudo, All rights reserved.

reading training data: tagger.cpp(393) [feature_index_->buildFeatures(this)]
0.00 s

My training data contains Unicode characters, and the file was saved using Notepad (encoding = Unicode big endian).

I am not sure whether the problem is with the template or with the format of the training data. How can I check the format of the training data?


Answer 1:


I think this is because of your template file. Check whether you have included the last column, which holds the gold-standard labels, as a training feature. Column indices start from 0, so if your BIO file has 6 columns, the template should not contain something like %x[0,5].
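The check described above can be sketched in a few lines of Python. This is only an illustration, not part of CRF++: the function name `label_column_refs` and the sample template are hypothetical, and the regular expression assumes macros are written in the strict form %x[row,col] with no whitespace.

```python
import re

# Matches template macros of the form %x[row,col] (strict form, assumed here).
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def label_column_refs(template_lines, n_columns):
    """Return (line_number, line) pairs whose macros reference the last
    column (index n_columns - 1, the gold-standard label) or beyond."""
    label_col = n_columns - 1
    bad = []
    for lineno, line in enumerate(template_lines, start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        cols = [int(col) for _row, col in MACRO.findall(stripped)]
        if any(col >= label_col for col in cols):
            bad.append((lineno, stripped))
    return bad


if __name__ == "__main__":
    # Hypothetical template for a 6-column training file (columns 0..5).
    template = [
        "# Unigram",
        "U00:%x[0,0]",
        "U01:%x[0,4]",
        "U02:%x[0,5]",
        "B",
    ]
    for lineno, line in label_column_refs(template, n_columns=6):
        print(f"line {lineno}: {line}")
```

With the sample template, only the line containing %x[0,5] is reported, because column 5 is the label column of a 6-column file.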




Answer 2:


The problem is with the template file. Check your features for incorrect "grammar", e.g. U10:%x[-1,0]/% [0,0]

Notice that the 'x' is missing after the second %. The corrected line should look like this: U10:%x[-1,0]/%x[0,0]
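A typo like this is easy to miss by eye, so a small script can help. The following is a minimal sketch, not a CRF++ tool: the function name `malformed_macro_lines` is hypothetical, and it assumes that every '%' in a template line is meant to start a macro of the strict form %x[row,col].

```python
import re

# A well-formed macro in the strict form %x[row,col] (assumed here).
VALID_MACRO = re.compile(r"%x\[-?\d+,\d+\]")


def malformed_macro_lines(template_lines):
    """Return (line_number, line) pairs that still contain a '%' after
    all well-formed %x[row,col] macros have been removed."""
    bad = []
    for lineno, line in enumerate(template_lines, start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        # Any '%' left over is not part of a valid macro.
        remainder = VALID_MACRO.sub("", stripped)
        if "%" in remainder:
            bad.append((lineno, stripped))
    return bad


if __name__ == "__main__":
    template = [
        "U09:%x[0,0]",
        "U10:%x[-1,0]/% [0,0]",   # the broken line from this answer
        "U11:%x[-1,0]/%x[0,0]",   # the corrected form
    ]
    for lineno, line in malformed_macro_lines(template):
        print(f"line {lineno}: {line}")
```

Running it on the sample reports only the line with the missing 'x'.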




Answer 3:


I had the same issue. My files are in UTF-8, and the template file and the training file are definitely in the correct format. The reason was that CRFPP expects at most 1024 columns in the input files. It would be great if it printed an appropriate error message in such a case.
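To check the column layout of a training file before running crf_learn, something like the following sketch may help. It is an illustration under stated assumptions: the function name `column_report` is hypothetical, columns are assumed to be separated by whitespace, blank lines are treated as sentence boundaries, and the 1024 limit is simply the figure quoted in this answer, passed in as a parameter.

```python
def column_report(lines, max_columns=1024):
    """Group non-blank lines by their number of whitespace-separated
    columns and list the lines that exceed max_columns.

    Returns (counts, too_wide): counts maps a column count to the line
    numbers that have it; too_wide lists line numbers over the limit.
    """
    counts = {}
    too_wide = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:  # blank line: sentence boundary, nothing to check
            continue
        counts.setdefault(len(tokens), []).append(lineno)
        if len(tokens) > max_columns:
            too_wide.append(lineno)
    return counts, too_wide


if __name__ == "__main__":
    # Hypothetical 3-column sample; line 4 is missing its label column.
    sample = [
        "He PRP B-NP",
        "reckons VBZ B-VP",
        "",
        "the DT",
        "deficit NN B-NP",
    ]
    counts, too_wide = column_report(sample)
    if len(counts) > 1:
        print("inconsistent column counts:", counts)
    if too_wide:
        print("lines over the column limit:", too_wide)
```

A well-formed file should produce exactly one key in `counts`; more than one key points at lines with missing or extra columns.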




Answer 4:


The problem is not with the Unicode encoding, but with the template file.

Have a look at this similar question: The failure in using CRF+0.58 train NE Model



Source: https://stackoverflow.com/questions/16886251/faliure-in-reading-training-data-tagger-cpp-393-crf
