Failure when training an NE model with CRF++ 0.58

陌路散爱 posted on 2019-12-12 02:37:39

Question


When I use CRF++ 0.58 to train an NE model, the program fails with this message:

"reading training data:tagger.cpp(399) [feature_index_->buildFeatures(this)] 0.00s"

  1. Development environment:
    • Red Hat Linux 6.5, gcc 5.0, CRF++ 0.58
  2. Feature template I wrote:
    • template
  3. Dataset:
    • Boson_train.txt
    • Boson_test.txt
    • The first column is the word, the second column is the POS tag, and the third column is the NER tag.
  4. The problem:
    • To train the NER model, I ran "crf_learn -f 3 -c 4.0 template Boson_train crf_model" and got this message: "reading training data:tagger.cpp(399) [feature_index_->buildFeatures(this)] 0.00s". I don't know C++, so I can't track the problem down in the source.
  5. What I tried:
    • 1. Changing the encoding of the dataset. I used Notepad++ to convert it from "UTF-8 without BOM" to "UTF-8". It didn't work.
    • 2. Changing the delimiter from '\t' to ' ' (space). It didn't work.
    • 3. Suspecting the template, I tested with crf++0.58/example/seg/template. It worked. But that template is simple, so I tried /example/JapaneseNE/template, which is closer to my own feature template. It didn't work. Then I ran the JapaneseNE example itself, and it works well. So I'm confused. Can anyone help?
  6. template

    • U00:%x[-2,0]
    • U01:%x[-1,0]
    • U02:%x[0,0]
    • U03:%x[1,0]
    • U04:%x[2,0]
    • U05:%x[-2,0]/%x[-1,0]/%x[0,0]
    • U06:%x[-1,0]/%x[0,0]/%x[1,0]
    • U07:%x[0,0]/%x[1,0]/%x[2,0]
    • U08:%x[-1,0]/%x[0,0]
    • U09:%x[0,0]/%x[1,0]

    • U10:%x[-2,1]/%x[0,1]

    • U11:%x[-2,1]/%x[1,1]
    • U11:%x[-1,1]/%x[0,1]
    • U12:%x[0,0]/%x[0,1]
    • U13:%x[0,1]/%x[1,1]
    • U14:%x[0,1]/%x[2,1]
    • U15:%x[-1,0]/%x[0,1]
    • U16:%x[-1,0]/%x[-1,1]
    • U17:%x[1,0]/%x[1,1]
    • U18:%x[1,0]/%x[1,1]
    • U19:%x[2,0]/%x[2,1]

    • U20:%x[-1,2]

    • U21:%x[-2,2]
    • U22:%x[0,1]/%x[-1,2]
    • U23:%x[0,1]/%x[-2,2]
    • U24:%x[0,0]/%x[-1,2]
    • U25:%x[0,0]/%x[-2,2]
    • U26:%x[-1,2]/%x[-2,2]/%x[0,1]
    • U27:%x[-2,2]/%x[0,1]/%x[1,1]
    • U28:%x[-1,1]/%x[-1,2]/%x[0,1]
    • U29:%x[-1,2]/%x[0,0]/%x[0,1]
  7. Boson_train
    • 浙江 ns B_product_name
    • 在线 b I_product_name
    • 杭州 ns I_product_name
    • 4 m B_time
    • 月 m I_time
    • 25 m I_time
    • 日 m I_time
    • 讯 ng Out
    • ( x Out
    • 记者 n Out
    • x Out
    • x B_person_name
    • 施宇翔 nr I_person_name
    • x Out
    • 通讯员 n B_person_name
    • x Out
    • 方英 nr B_person_name
    • ) x Out
    • 毒贩 n Out
    • 很 zg Out
    • “ x Out
    • 时髦 nr Out
    • ” x Out
    • , x Out
    • 用 p Out
    • 微信 vn B_product_name
    • 交易 n Out
    • 毒品 n Out
    • 。 x Out
    • 没 v Out
    • 料想 v Out
    • 警方 n B_person_name
    • 也 d Out

Answer 1:


You were debugging in the right direction. The issue is indeed with your template file.

Your training data has 3 columns (column 0: word, column 1: POS tag, column 2: NER tag).
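Before looking at the template, it can be worth confirming that every token line really has those 3 columns, since CRF++ expects the same number of columns on every token line. The following is only a minimal sketch: the helper name `lines_with_wrong_width` is hypothetical, and it assumes columns are separated by spaces or tabs and that blank lines separate sentences.

```python
def lines_with_wrong_width(lines, expected):
    """Return (line_number, text) for token lines whose column count differs from `expected`.

    Blank lines are skipped because they mark sentence boundaries.
    """
    bad = []
    for number, line in enumerate(lines, 1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue
        if len(fields) != expected:
            bad.append((number, line.rstrip("\n")))
    return bad


if __name__ == "__main__":
    # Tiny inline sample; in practice, read the lines from Boson_train instead.
    sample = [
        "浙江 ns B_product_name\n",
        "在线 b I_product_name\n",
        "\n",
        "x Out\n",
    ]
    for number, text in lines_with_wrong_width(sample, expected=3):
        print(f"line {number}: {text!r}")
```

To check the real file, the list could be built with `open("Boson_train", encoding="utf-8").readlines()`, assuming the file is UTF-8 encoded as described in the question.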

CRF++ treats the last column as the answer tag, so you cannot use it as a feature through %x. Your template file nevertheless references it (i.e., column 2) in many feature definitions (see U20 to U29). Training should work after you remove or correct these.
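One way to find the offending template lines is to scan for `%x[row,col]` macros whose column index points at the tag column. This is a rough sketch, not part of CRF++: the names `MACRO` and `bad_template_lines` are made up for illustration, and it assumes the last column of the training data is the tag, as described above.

```python
import re

# Matches CRF++ template macros of the form %x[row,col]
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def bad_template_lines(template_lines, n_columns):
    """Return (line_number, text, offending_columns) for template lines that
    reference the tag column or anything beyond it.

    With n_columns columns in the training data, only indexes
    0 .. n_columns - 2 are feature columns; index n_columns - 1 is the tag.
    """
    max_feature_col = n_columns - 2
    bad = []
    for number, line in enumerate(template_lines, 1):
        text = line.strip()
        if not text or text.startswith("#"):  # skip blanks and comments
            continue
        cols = {int(col) for _row, col in MACRO.findall(text)}
        offending = sorted(c for c in cols if c > max_feature_col)
        if offending:
            bad.append((number, text, offending))
    return bad


if __name__ == "__main__":
    # A few lines taken from the template in the question.
    template = [
        "U00:%x[-2,0]",
        "U12:%x[0,0]/%x[0,1]",
        "U20:%x[-1,2]",
        "U22:%x[0,1]/%x[-1,2]",
    ]
    for number, text, cols in bad_template_lines(template, n_columns=3):
        print(f"line {number}: {text} uses column(s) {cols}")
```

Run against the full template with `n_columns=3`, a check like this should flag the lines from U20 through U29, which are the ones to remove or rewrite so that they use only columns 0 and 1.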

Hope this helps :)

You can also check out these video tutorials for a better understanding of template files and of training NER models with CRF++:

1) https://youtu.be/GJHeTvDkIaE

2) https://youtu.be/Ur5umC4BwN4



Source: https://stackoverflow.com/questions/43487195/the-failure-in-using-crf0-58-train-ne-model
