How to use gazetteers or dictionaries as features in CRF++?
To elaborate: suppose I want to do NER on person names, and I have a gazetteer (or dictionary) containing commonly seen person names. I want to use this gazetteer as an input to CRF++. How can I do that?
I am using the conditional random field package CRF++ to perform named entity recognition tasks. I know how to represent some commonly used features in CRF++. For example, if we want to use capitalization as a feature, we can add a separate column to the training data indicating whether a word is capitalized, and then reference that column in the feature template.
You could make a new feature that indicates whether a token is in the dictionary/gazetteer. Just check for set membership and set the gazetteer feature to 1 or 0.
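As a rough sketch of that idea, the snippet below adds a 0/1 gazetteer column to data in the CRF++ layout (one token per line, whitespace-separated columns, label in the last column, blank line between sentences). The function name `add_gazetteer_column`, the sample rows, and the case-insensitive single-token matching are assumptions made for illustration, not part of CRF++.

```python
def add_gazetteer_column(lines, gazetteer):
    """Insert a 0/1 gazetteer flag just before the label (last) column.

    Hypothetical helper: matching is per token and case-insensitive,
    which is an assumption, not something CRF++ requires.
    """
    names = {name.lower() for name in gazetteer}
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            out.append("")  # blank line separates sentences; keep it
            continue
        cols = line.split()
        token, label = cols[0], cols[-1]
        flag = "1" if token.lower() in names else "0"
        # Keep existing feature columns, then the new flag, then the label.
        out.append("\t".join(cols[:-1] + [flag, label]))
    return out


# Made-up sample data: token, POS tag, label.
sample = [
    "John\tNNP\tB-PER",
    "lives\tVBZ\tO",
    "",
    "Paris\tNNP\tB-LOC",
]

for row in add_gazetteer_column(sample, {"john", "mary"}):
    print(row)
```

With the columns laid out as token (0), POS tag (1), gazetteer flag (2), a unigram template line such as `U20:%x[0,2]` would expose the current token's flag to the model, and `%x[-1,2]` or `%x[1,2]` would do the same for the neighbouring tokens. The identifier `U20` is arbitrary. The same column has to be present in the test data as well.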
Source: https://stackoverflow.com/questions/33195322/how-to-represent-gazetteers-or-dictionaries-as-features-in-crf