NLTK Context Free Grammar Generation

花落未央 2021-02-06 01:01

I'm working on a non-English parser with Unicode characters. For that, I decided to use NLTK.

But it requires a predefined context-free grammar as below:

4 answers

时光说笑 (original poster)

2021-02-06 01:41
You can use NLTK's RegexpTagger, which assigns a tag to each token based on regular expressions. That is exactly what you need in your case. For example, tokens ending in 'ing' are tagged as gerunds and tokens ending in 'ed' are tagged as simple past verbs. See the example below.
```
patterns = [
    (r'.*ing$', 'VBG'),   # gerunds
    (r'.*ed$', 'VBD'),    # simple past
    (r'.*es$', 'VBZ'),    # 3rd singular present
    (r'.*ould$', 'MD'),   # modals
    (r'.*\'s$', 'NN$'),   # possessive nouns
    (r'.*s$', 'NNS'),     # plural nouns
]
```
Note that the patterns are tried in order, and the first one that matches is applied. Now we can set up a tagger and use it to tag a sentence. Used on its own, a tagger like this is correct only about a fifth of the time.
```
import nltk

regexp_tagger = nltk.RegexpTagger(patterns)
regexp_tagger.tag(your_sent)  # your_sent: a list of token strings
```
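To see why the order of the patterns matters, here is a minimal sketch of the first-match rule using only Python's `re` module. The helper `first_match` is a hypothetical name for illustration; it mimics the rule described above and is not NLTK's own implementation.

```python
import re

# Two overlapping rules: every token ending in 'es' also ends in 's'.
patterns = [
    (r'.*es$', 'VBZ'),  # 3rd singular present
    (r'.*s$', 'NNS'),   # plural nouns
]

def first_match(token, rules):
    """Return the tag of the first rule whose regex matches, else None."""
    for regex, tag in rules:
        if re.match(regex, token):
            return tag
    return None

specific_first = first_match('watches', patterns)
general_first = first_match('watches', list(reversed(patterns)))
no_match = first_match('cat', patterns)
```

With the more specific rule first, 'watches' gets 'VBZ'; with the order reversed, the general rule wins and it gets 'NNS'. A token that matches no rule gets no tag at all.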
You can also combine several taggers in sequence, so that each tagger falls back to the next one for tokens it cannot tag.
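A minimal sketch of combining taggers, assuming the `backoff` parameter of NLTK's `RegexpTagger` and a `DefaultTagger` as the last resort. The choice of 'NN' as the fallback tag and the sample tokens are assumptions for illustration, not part of the original answer.

```python
from nltk.tag import DefaultTagger, RegexpTagger

patterns = [
    (r'.*ing$', 'VBG'),   # gerunds
    (r'.*ed$', 'VBD'),    # simple past
    (r'.*es$', 'VBZ'),    # 3rd singular present
    (r'.*ould$', 'MD'),   # modals
    (r".*'s$", 'NN$'),    # possessive nouns
    (r'.*s$', 'NNS'),     # plural nouns
]

# Assumption: tag anything the regex rules miss as a noun.
fallback = DefaultTagger('NN')

# The regex tagger is tried first; tokens it cannot tag go to the fallback.
tagger = RegexpTagger(patterns, backoff=fallback)

tokens = ['She', 'walked', 'home', 'singing']
tagged = tagger.tag(tokens)
```

Here 'walked' and 'singing' are tagged by the regex rules, while 'She' and 'home' match no pattern and receive the fallback tag. For a non-English language, the suffix patterns and tag names would need to be replaced with ones that fit that language.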