How do I create my own training corpus for the Stanford tagger?

你的背包 2021-02-05 13:20

I have to analyze informal English text with lots of shorthand and local lingo, so I was thinking of training my own model for the Stanford tagger.

How do I create my own training corpus?

4 answers
  •  遥遥无期
    2021-02-05 13:54

    Essentially, each text you prepare for training should have one token per line, followed by a tab, followed by a label. The label might be something like "LOC" for location, "COR" for corporation, or "0" for non-entity tokens. For example:

    I     0
    left     0
    my     0
    heart     0
    in     0
    Kansas     LOC
    City     LOC
    .     0
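
    If the annotations already exist as (token, label) pairs, a file in this layout can be generated rather than typed by hand. The following is a minimal Python sketch under that assumption; the function name `format_training_data` and the sample data are hypothetical and are not part of the Stanford tools.

```python
def format_training_data(tagged_tokens):
    """Return training text: one token per line, a tab, then its label."""
    lines = []
    for token, label in tagged_tokens:
        # A tab or space inside a field would break the two-column layout.
        for field in (token, label):
            if not field or any(ch.isspace() for ch in field):
                raise ValueError(f"field must be non-empty without whitespace: {field!r}")
        lines.append(f"{token}\t{label}\n")
    return "".join(lines)


# Hypothetical sample mirroring the example above ("0" marks non-entity tokens).
example = [
    ("I", "0"), ("left", "0"), ("my", "0"), ("heart", "0"),
    ("in", "0"), ("Kansas", "LOC"), ("City", "LOC"), (".", "0"),
]
training_text = format_training_data(example)
print(training_text, end="")
```

    The returned string can then be written to whatever file the training run is pointed at.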
    

    When our team trained a series of classifiers, we fed each one a training file in this format with roughly 180,000 tokens. We saw a net improvement in precision but a net decrease in recall, although the increase in precision was not statistically significant. In case it is useful to others, I described the process we used to train the classifier, along with the precision, recall, and F1 values of both the trained and default classifiers, here.
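
    For readers who want to run a similar comparison, precision, recall, and F1 can be computed from gold and predicted labels. The sketch below scores at the token level and treats "0" as the non-entity label; both choices are assumptions made for illustration and not necessarily how the numbers above were produced (entity-level scoring is also common). The function name `precision_recall_f1` and the sample labels are hypothetical.

```python
def precision_recall_f1(gold, predicted, background="0"):
    """Token-level precision, recall, and F1 over non-background labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    # True positive: an entity label was predicted and it matches the gold label.
    tp = sum(1 for g, p in pairs if p != background and p == g)
    # False positive: an entity label was predicted but the gold label differs.
    fp = sum(1 for g, p in pairs if p != background and p != g)
    # False negative: the gold label is an entity and the prediction differs.
    fn = sum(1 for g, p in pairs if g != background and p != g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels for the eight tokens of the example sentence.
gold = ["0", "0", "0", "0", "0", "LOC", "LOC", "0"]
predicted = ["0", "0", "0", "LOC", "0", "LOC", "0", "0"]
scores = precision_recall_f1(gold, predicted)
print(scores)
```

    Running the same function on the output of the default and the retrained classifier, against the same gold labels, gives directly comparable numbers.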
