How to get a common tag pattern for a list of sentences in Python with NLTK

Question

I have a list of sentences. With NLTK I can tag each sentence and get its tag pattern, so I can get the tag patterns for the whole list. What I want, though, is to identify the common tag patterns that most of the sentences match. For example:

What is encapsulation
```
tag pattern : {<WP><VBZ><NN>}
```
How was your wedding
```
tag pattern : {<WRB><VBD><PRP$><NN>}
```

What is your plan today

tag pattern : {<WP><VBZ><PRP$><NN><NN>}
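
As a minimal sketch of how pattern strings like the ones above could be built and counted: the helper `tag_pattern` is a hypothetical name, and the tagged sentences are written by hand in the list-of-`(word, tag)` shape that `nltk.pos_tag` returns, so the snippet runs without any tagger model. The fourth sentence is an invented extra, added only so that one pattern occurs twice.

```python
from collections import Counter

def tag_pattern(tagged_sentence):
    """Turn [(word, tag), ...] into a string such as '{<WP><VBZ><NN>}'."""
    return "{" + "".join(f"<{tag}>" for _, tag in tagged_sentence) + "}"

# Hand-written (word, tag) pairs; in practice these would come from a tagger.
tagged_sentences = [
    [("What", "WP"), ("is", "VBZ"), ("encapsulation", "NN")],
    [("How", "WRB"), ("was", "VBD"), ("your", "PRP$"), ("wedding", "NN")],
    [("What", "WP"), ("is", "VBZ"), ("your", "PRP$"), ("plan", "NN"), ("today", "NN")],
    [("What", "WP"), ("is", "VBZ"), ("inheritance", "NN")],  # hypothetical extra sentence
]

# Count how often each exact tag sequence occurs across the list.
pattern_counts = Counter(tag_pattern(s) for s in tagged_sentences)

for pattern, count in pattern_counts.most_common():
    print(count, pattern)
```

Counting exact sequences only finds identical patterns; it does not merge similar ones into a generalized pattern with wildcards and quantifiers.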

So the common tag pattern (a combined, regexp-style pattern) for the three sentences above is:

{<W.+><V.+><PRP.?>?<NN>+} - one "Wh" word, one verb, zero or one pronoun, one or more nouns
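
One way to check that this generalized pattern really covers all three sentences is to translate it by hand into an ordinary Python regular expression and match it against the tag sequences. This is only a sketch: `COMMON_PATTERN` and `matches_common_pattern` are hypothetical names, and the translation replaces `.` with `[^>]` so that a wildcard cannot run past the end of a tag.

```python
import re

# Hand translation of {<W.+><V.+><PRP.?>?<NN>+} into a plain Python regex.
# [^>] stands in for "." so a wildcard stays inside a single <...> tag.
COMMON_PATTERN = re.compile(r"<W[^>]+><V[^>]+>(?:<PRP[^>]?>)?(?:<NN>)+")

def matches_common_pattern(tags):
    """Return True if the whole tag sequence fits the common pattern."""
    tag_string = "".join(f"<{tag}>" for tag in tags)
    return COMMON_PATTERN.fullmatch(tag_string) is not None

tag_sequences = {
    "What is encapsulation": ["WP", "VBZ", "NN"],
    "How was your wedding": ["WRB", "VBD", "PRP$", "NN"],
    "What is your plan today": ["WP", "VBZ", "PRP$", "NN", "NN"],
}

for sentence, tags in tag_sequences.items():
    print(sentence, "->", matches_common_pattern(tags))
```

This only verifies a pattern that has already been written by hand; it does not derive the pattern from the data.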

So I want to generalize the tag patterns of the sentences into common ones. That is what I am trying to do.

Can someone tell me how to do that?

Answer 1:

It sounds like you are after a regexp (with quantifiers) that will match all the different tag sequences in your data. That is not an easy problem in itself, but I suspect that your real goal is to find a pattern that captures the sequences that are legal sentences. Is this right?

If so, regexps (and finite-state approaches in general) are inherently the wrong tool for the job. To even get a start on characterizing your sentence collection, you need to look at context-free grammars. Take a look at NLTK's materials on the topic.
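
To give a rough idea of what a context-free grammar over POS tags might look like, here is a small sketch using `nltk.CFG.fromstring` and `nltk.ChartParser`. The grammar itself is a toy written for the three example questions, not a real grammar of English, and the rule names (`Q`, `WH`, `V`, `NP`, `N`) and the helper `is_covered` are assumptions made for illustration.

```python
import nltk

# Toy grammar whose terminals are POS tags (an illustrative assumption,
# covering only the three example questions).
grammar = nltk.CFG.fromstring("""
    Q  -> WH V NP
    WH -> 'WP' | 'WRB'
    V  -> 'VBZ' | 'VBD'
    NP -> 'PRP$' N | N
    N  -> 'NN' N | 'NN'
""")
parser = nltk.ChartParser(grammar)

def is_covered(tags):
    """Return True if the tag sequence has at least one parse as a Q."""
    try:
        return next(iter(parser.parse(tags)), None) is not None
    except ValueError:
        # Raised when a tag does not appear anywhere in the grammar.
        return False

for tags in (["WP", "VBZ", "NN"],
             ["WRB", "VBD", "PRP$", "NN"],
             ["WP", "VBZ", "PRP$", "NN", "NN"],
             ["NN", "VBZ", "WP"]):
    print(tags, "->", is_covered(tags))
```

Unlike a flat regexp, rules such as `NP` can be reused and nested, which is what makes a grammar easier to extend as the sentence collection grows.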

Source: https://stackoverflow.com/questions/33318975/how-to-get-common-tag-pattern-for-sentences-list-in-python-with-nltk

Tags

python

nltk

tagging
