How do I create my own training corpus for the Stanford tagger?


你的背包 2021-02-05 13:20

I have to analyze informal English text with lots of shorthand and local lingo, so I was thinking of creating my own model for the Stanford tagger.

How do I create my

4 answers

遥遥无期 (OP)

2021-02-05 13:54
Essentially, the text you prepare for training should have one token per line, followed by a tab, followed by a label. The label may be something like "LOC" for location, "COR" for corporation, or "0" for non-entity tokens. For example:
```
I     0
left     0
my     0
heart     0
in     0
Kansas     LOC
City     LOC
.     0
```
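If your annotations already live in code as (token, label) pairs, a small script can emit that layout. This is only a minimal sketch: `format_training_lines` is a hypothetical helper, not part of any Stanford tool, and it assumes unlabeled tokens should get the "0" label used in the example above.

```python
from typing import Iterable, Optional, Tuple


def format_training_lines(
    pairs: Iterable[Tuple[str, Optional[str]]],
    background: str = "0",
) -> str:
    """Render (token, label) pairs as one 'token<TAB>label' line each.

    Tokens whose label is None or empty get the background label,
    assumed here to be "0" as in the example in this answer.
    """
    lines = []
    for token, label in pairs:
        if "\t" in token or "\n" in token:
            # A tab or newline inside a token would corrupt the column layout.
            raise ValueError(f"token contains whitespace separator: {token!r}")
        lines.append(f"{token}\t{label or background}")
    # End with a trailing newline so the file finishes on a complete line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sentence = [
        ("I", None), ("left", None), ("my", None), ("heart", None),
        ("in", None), ("Kansas", "LOC"), ("City", "LOC"), (".", None),
    ]
    print(format_training_lines(sentence), end="")
```

Writing the returned string to a file gives you a training file in the shape described above.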
When our team trained a series of classifiers, we fed each one a training file in this format with roughly 180,000 tokens. We saw a net improvement in precision but a net decrease in recall. (It bears noting that the increase in precision was not statistically significant.) In case it is useful to others, I described the process we used to train the classifier, as well as the precision, recall, and F1 values of both the trained and default classifiers, here.
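For anyone comparing a trained model against the default one, the three scores can be computed from raw counts as below. This is a generic sketch of the standard definitions, not the evaluation code we used; `precision_recall_f1` and its argument names are made up for illustration.

```python
from typing import Tuple


def precision_recall_f1(
    true_pos: int, false_pos: int, false_neg: int
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from entity-level counts.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    F1        = harmonic mean of precision and recall
    Any ratio with a zero denominator is reported as 0.0.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A model that tags fewer entities but gets more of them right will show the pattern described above: precision goes up while recall goes down.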