NLTK Context-Free Grammar Generation

花落未央 2021-02-06 01:01

I'm working on a non-English parser with Unicode characters. For that, I decided to use NLTK.

But it requires a predefined context-free grammar as below:

4 answers

执笔经年 (original poster)

2021-02-06 01:29
If you are building a parser, you need a POS-tagging step before the actual parsing, because the POS tag of a word cannot be determined reliably out of context. For example, 'closed' can be an adjective or a verb; a POS tagger picks the correct tag from the word's context. You can then use the tagger's output to create your CFG.

You can use one of the many existing POS-taggers. In NLTK, you can simply do something like:
```
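import nltk

# Needs the tokenizer and tagger models, installed via nltk.download().
input_sentence = "Dogs chase cats"
text = nltk.word_tokenize(input_sentence)  # split the sentence into tokens
list_of_tokens = nltk.pos_tag(text)  # list of (word, tag) pairs
print(list_of_tokens)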
```
The output will look something like this (the exact tags depend on the tagger model):
```
[('Dogs', 'NN'), ('chase', 'VB'), ('cats', 'NN')]
```
which you can use to create a grammar string and feed it to nltk.CFG.fromstring() (named nltk.parse_cfg() before NLTK 3).
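
As a rough sketch of that last step, the tagged tokens can be turned into lexical rules and appended to a few hand-written phrase rules. The phrase rules (`S`, `NP`, `VP`) and the helper names `quote_terminal`, `build_grammar_string` and `parse_with_nltk` are made up for illustration and are not part of NLTK; the tagged list is hard-coded from the output shown above, so the grammar-building part runs without NLTK installed.

```python
# Hypothetical phrase-structure rules; replace them with rules for your language.
PHRASE_RULES = """\
S -> NP VP
NP -> NN
VP -> VB NP
"""


def quote_terminal(word):
    """Wrap a word in quotes so the grammar reader treats it as a terminal."""
    # Use whichever quote character does not occur in the word itself.
    # (Words containing both quote characters are not handled in this sketch.)
    return '"%s"' % word if "'" in word else "'%s'" % word


def build_grammar_string(tagged_tokens, phrase_rules=PHRASE_RULES):
    """Append one lexical rule per POS tag (TAG -> 'w1' | 'w2') to phrase_rules."""
    lexicon = {}
    for word, tag in tagged_tokens:
        words = lexicon.setdefault(tag, [])
        if word not in words:  # avoid duplicate alternatives
            words.append(word)
    lexical_rules = [
        "%s -> %s" % (tag, " | ".join(quote_terminal(w) for w in words))
        for tag, words in lexicon.items()
    ]
    return phrase_rules + "\n".join(lexical_rules) + "\n"


def parse_with_nltk(grammar_string, tokens):
    """Parse tokens with a chart parser; requires NLTK 3 to be installed."""
    import nltk  # imported lazily so the grammar builder works without NLTK

    grammar = nltk.CFG.fromstring(grammar_string)
    return list(nltk.ChartParser(grammar).parse(tokens))


# Tagged tokens copied from the example output above.
tagged = [('Dogs', 'NN'), ('chase', 'VB'), ('cats', 'NN')]
grammar_string = build_grammar_string(tagged)
print(grammar_string)
```

With NLTK 3 installed, `parse_with_nltk(grammar_string, ['Dogs', 'chase', 'cats'])` should return the parse trees for the sentence. Note that tags containing punctuation (for example `PRP$` or `.`) may not be accepted as nonterminal names by the grammar reader and would need renaming first.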