How to represent gazetteers or dictionaries as features in CRF++?

Submitted by 不羁岁月 on 2019-12-10 09:24:23

Question


How can I use gazetteers or dictionaries as features in CRF++?

To elaborate: suppose I want to do NER on person names, and I have a gazetteer (or dictionary) containing commonly seen person names. I want to use this gazetteer as an input to CRF++. How can I do that?

I am using the conditional random field package CRF++ to perform named entity recognition. I know how to represent some commonly used features in CRF++. For example, to use capitalization as a feature, we can add a separate column to the training data indicating whether each word is capitalized, and refer to that column in the feature template.


Answer 1:


You could add a new feature that indicates whether a token is in the dictionary/gazetteer. Just check for set membership and set the gazetteer feature to 1 or 0, stored as an extra column in the same way as the capitalization feature.
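
As a rough illustration of the set-membership idea, here is a minimal Python sketch that writes CRF++-style data with an extra gazetteer column. The function name add_gazetteer_column, the sample sentences, the name list, and the case-insensitive matching are all assumptions made for this example, not part of CRF++ itself. It relies only on the CRF++ data format: one token per line, whitespace-separated columns, the label in the last column, and a blank line between sentences.

```python
def add_gazetteer_column(sentences, gazetteer):
    """Return CRF++-style training text with a 0/1 gazetteer column.

    sentences: list of sentences, each a list of (token, label) pairs.
    gazetteer: iterable of known person names (hypothetical input).
    """
    # Assumption: matching is case-insensitive; drop .lower() if case matters.
    names = {name.lower() for name in gazetteer}
    lines = []
    for sentence in sentences:
        for token, label in sentence:
            in_gazetteer = "1" if token.lower() in names else "0"
            # Column 0: token, column 1: gazetteer flag, last column: label.
            lines.append(f"{token}\t{in_gazetteer}\t{label}")
        lines.append("")  # a blank line separates sentences
    return "\n".join(lines) + "\n"


# Example feature template that refers to the new column (column index 1).
# %x[row,col] is CRF++ template syntax; the feature IDs are arbitrary.
TEMPLATE = """\
# Unigram
U00:%x[0,0]
U01:%x[0,1]
U02:%x[-1,1]
U03:%x[1,1]
U04:%x[0,0]/%x[0,1]

# Bigram
B
"""

if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [[("John", "B-PER"), ("lives", "O"), ("here", "O")]]
    print(add_gazetteer_column(sample, ["John", "Mary"]))
```

Single-token lookup like this will not match multi-word gazetteer entries; handling those would need a separate matching step before the column is written.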



Source: https://stackoverflow.com/questions/33195322/how-to-represent-gazetteers-or-dictionaries-as-features-in-crf
