NLTK Context Free Grammar Generation

花落未央 2021-02-06 01:01

I'm working on a non-English parser with Unicode characters. For that, I decided to use NLTK.

But it requires a predefined context-free grammar as below:



        
4 answers
  •  时光说笑
    2021-02-06 01:41

    You can use NLTK's RegexpTagger, which assigns a tag to each token based on regular expressions. This is exactly what you need in your case: tokens ending in 'ing' will be tagged as gerunds, and tokens ending in 'ed' will be tagged as past-tense verbs. See the example below.

    patterns = [
        (r'.*ing$', 'VBG'),   # gerunds
        (r'.*ed$', 'VBD'),    # simple past
        (r'.*es$', 'VBZ'),    # 3rd singular present
        (r'.*ould$', 'MD'),   # modals
        (r".*'s$", 'NN$'),    # possessive nouns
        (r'.*s$', 'NNS'),     # plural nouns
    ]
    

    Note that the patterns are tried in order, and the first one that matches is applied. Now we can set up a tagger and use it to tag a sentence. On its own, a tagger like this is correct only about a fifth of the time.

    import nltk

    regexp_tagger = nltk.RegexpTagger(patterns)
    regexp_tagger.tag(your_sent)  # your_sent must be a list of token strings
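
To see why the order of the patterns matters, here is a minimal sketch that tags the same tokens with two orderings of the same rules. The token list and the variable names (`early_s`, `late_s`, `tokens`) are made up for illustration; only `nltk.RegexpTagger` and its `tag` method come from NLTK.

```python
import nltk

# Same two rules, different order. The first matching pattern wins.
early_s = nltk.RegexpTagger([(r'.*s$', 'NNS'), (r'.*es$', 'VBZ')])
late_s = nltk.RegexpTagger([(r'.*es$', 'VBZ'), (r'.*s$', 'NNS')])

# Hypothetical sample tokens, chosen only to exercise the rules.
tokens = ['boxes', 'dogs', 'cat']

early_result = early_s.tag(tokens)
late_result = late_s.tag(tokens)

# With '.*s$' first, 'boxes' never reaches the more specific '.*es$' rule.
print(early_result)  # [('boxes', 'NNS'), ('dogs', 'NNS'), ('cat', None)]
print(late_result)   # [('boxes', 'VBZ'), ('dogs', 'NNS'), ('cat', None)]
```

A token that matches no pattern gets the tag `None`, which is one reason the tagger scores so low when used alone.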
    

    You can also combine taggers (see "Combining Taggers" in the NLTK book), chaining several in sequence so that each one falls back to the next when it cannot tag a token.
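
One simple way to chain taggers is the `backoff` argument, sketched below under some assumptions: the shortened pattern list, the sample sentence, and the choice of 'NN' as the fallback tag are illustrative, not part of the original answer. Any token the regular expressions cannot tag is handed to an `nltk.DefaultTagger`.

```python
import nltk

# Shortened, illustrative pattern list (an assumption for this sketch).
patterns = [
    (r'.*ing$', 'VBG'),  # gerunds
    (r'.*ed$', 'VBD'),   # simple past
    (r'.*s$', 'NNS'),    # plural nouns
]

# Assumption: 'NN' is an acceptable guess for anything left untagged.
default_tagger = nltk.DefaultTagger('NN')

# The regexp tagger is consulted first; unmatched tokens go to the backoff.
combined = nltk.RegexpTagger(patterns, backoff=default_tagger)

sample = ['the', 'dog', 'barked', 'loudly']  # hypothetical input
tagged = combined.tag(sample)
print(tagged)
# [('the', 'NN'), ('dog', 'NN'), ('barked', 'VBD'), ('loudly', 'NN')]
```

The same `backoff` parameter is accepted by NLTK's other sequential taggers, so a trained tagger could sit in front of the regular expressions if tagged data for your language is available.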
