How to get a common tag pattern for a list of sentences in Python with NLTK

Posted by 杀马特。学长 韩版系。学妹 on 2019-12-24 12:25:51

Question


I have a list of sentences. With NLTK I can tag a sentence and get its tag pattern, so I can get the tag patterns for the whole list. What I want is to identify the common tag patterns that most of the sentences match. For example:

  • What is encapsulation

    tag pattern : {<WP><VBZ><NN>}
    
  • How was your wedding

    tag pattern : {<WRB><VBD><PRP$><NN>}
    
  • What is your plan today

    tag pattern : {<WP><VBZ><PRP$><NN><NN>}
    

So the common tag pattern (a combined, regexp-style pattern) for the three sentences above is:

{<W.+><V.+><PRP.?>?<NN>+} - one "Wh" word, one verb, zero or one pronoun, one or more nouns
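To make the combined pattern concrete, here is a minimal sketch that checks it against the three tag sequences using only Python's `re` module. The helper `tag_pattern_to_regex` is hypothetical and much simpler than NLTK's own tag-pattern handling: it assumes that `.` inside a tag means "any character except an angle bracket" and that a quantifier after `>` applies to the whole tag.

```python
import re

def tag_pattern_to_regex(tag_pattern):
    """Turn a chunk-style tag pattern into a compiled `re` pattern.

    Hypothetical, simplified helper: each <...> becomes one non-capturing
    group, and '.' inside a tag is restricted so it cannot cross a tag boundary.
    """
    body = tag_pattern.strip("{}")

    def convert(match):
        inner = match.group(1).replace(".", "[^<>]")
        return "(?:<" + inner + ">)"

    return re.compile(re.sub(r"<([^<>]+)>", convert, body))

COMMON = tag_pattern_to_regex("{<W.+><V.+><PRP.?>?<NN>+}")

def matches(tags):
    """Return True if the whole tag sequence is covered by the common pattern."""
    return COMMON.fullmatch("".join(f"<{t}>" for t in tags)) is not None

# Tag sequences copied by hand from the three example sentences.
examples = [
    ["WP", "VBZ", "NN"],
    ["WRB", "VBD", "PRP$", "NN"],
    ["WP", "VBZ", "PRP$", "NN", "NN"],
]
results = [matches(tags) for tags in examples]
```

All three example sequences should match, while a sequence such as `["DT", "NN"]` should not.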

So I want to generalize the tag patterns of the sentences into common ones.
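One rough way to approximate this kind of generalization is to map each tag to a coarser class, collapse repeated classes, and count which resulting signatures occur most often. The sketch below is only a heuristic under stated assumptions: the prefix-to-pattern table `COARSE` and the helpers `coarsen` and `signature` are hand-picked for illustration, and the tag sequences are typed in by hand rather than produced by a tagger. It does not infer optional elements such as `<PRP.?>?` on its own.

```python
from collections import Counter
from itertools import groupby

# Tag sequences copied by hand from the examples; in practice they would
# come from a part-of-speech tagger.
TAGGED = {
    "What is encapsulation": ["WP", "VBZ", "NN"],
    "How was your wedding": ["WRB", "VBD", "PRP$", "NN"],
    "What is your plan today": ["WP", "VBZ", "PRP$", "NN", "NN"],
}

# Assumed grouping of Penn Treebank tags into coarse pattern pieces.
COARSE = [("PRP", "<PRP.?>"), ("NN", "<NN.*>"), ("W", "<W.+>"), ("V", "<V.+>")]

def coarsen(tag):
    """Return the coarse pattern piece for a tag, or the tag itself if unlisted."""
    for prefix, piece in COARSE:
        if tag.startswith(prefix):
            return piece
    return f"<{tag}>"

def signature(tags):
    """Coarsen every tag, then collapse each run of equal pieces into 'piece+'."""
    return tuple(f"{piece}+" for piece, _ in groupby(coarsen(t) for t in tags))

# Count how many sentences share each coarse signature.
counts = Counter(signature(tags) for tags in TAGGED.values())
for sig, n in counts.most_common():
    print(n, "".join(sig))
```

With these three sentences, the two that contain a possessive pronoun end up sharing one signature, and the first sentence gets its own. Merging signatures that differ only by an optional piece would need an extra step.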

Can someone tell me how to do that?


Answer 1:


It sounds like you are after a regexp (with quantifiers) that will match all the different tag sequences in your data. While this is not an easy problem, I suspect that your real goal is to find a pattern that captures the sequences that are legal sentences. Is that right?

If so, regexps (and finite-state approaches in general) are inherently the wrong tool for the job. To even get a start on characterizing your sentence collection, you need to look at context-free grammars. Take a look at NLTK's materials on the topic.
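To give a feel for what a context-free description of these tag sequences could look like, here is a minimal, self-contained sketch of a CYK recognizer over part-of-speech tags. The grammar is a toy invented for the three example sentences (the rule tables `LEXICAL` and `BINARY` and the function `recognizes` are hypothetical names, not part of NLTK), so treat it as an illustration of the idea rather than a grammar of English questions.

```python
from itertools import product

# Toy grammar in Chomsky normal form, written by hand for the three examples.
# Lexical rules: POS tag -> nonterminals that can produce it.
LEXICAL = {
    "WP": {"WH"}, "WRB": {"WH"},
    "VBZ": {"V"}, "VBD": {"V"},
    "PRP$": {"POSS"},
    "NN": {"N", "NOM", "NP"},
}
# Binary rules: (left child, right child) -> possible parents.
BINARY = {
    ("WH", "VP"): {"S"},          # S   -> WH VP
    ("V", "NP"): {"VP"},          # VP  -> V NP
    ("POSS", "NOM"): {"NP"},      # NP  -> POSS NOM
    ("N", "NOM"): {"NOM", "NP"},  # NOM -> N NOM, NP -> N NOM
}

def recognizes(tags, start="S"):
    """CYK recognition: True if the grammar derives the tag sequence from `start`."""
    n = len(tags)
    if n == 0:
        return False
    # table[(i, j)] holds the nonterminals that span tags[i:j].
    table = {(i, i + 1): set(LEXICAL.get(tag, ())) for i, tag in enumerate(tags)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for left, right in product(table[(i, k)], table[(k, j)]):
                    cell |= BINARY.get((left, right), set())
            table[(i, j)] = cell
    return start in table[(0, n)]

examples = [
    ["WP", "VBZ", "NN"],
    ["WRB", "VBD", "PRP$", "NN"],
    ["WP", "VBZ", "PRP$", "NN", "NN"],
]
accepted = [recognizes(tags) for tags in examples]
```

The three example sequences are accepted, and a reordered sequence such as `["VBZ", "WP", "NN"]` is rejected. Unlike a flat regexp, the rules name the phrases involved (`NP`, `VP`), so nested structure can be added by writing more rules.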



Source: https://stackoverflow.com/questions/33318975/how-to-get-common-tag-pattern-for-sentences-list-in-python-with-nltk
