Question
I have a list of sentences. With NLTK I can tag a sentence and get its tag pattern, and in the same way I can get the tag patterns for the whole list. What I want, though, is to identify the common tag patterns that most of the sentences match. For example:
What is encapsulation
tag pattern : {<WP><VBZ><NN>}
How was your wedding
tag pattern : {<WRB><VBD><PRP$><NN>}
What is your plan today
tag pattern : {<WP><VBZ><PRP$><NN><NN>}
So the common tag pattern (combined in regexp style) for the above three sentences is:
{<W.+><V.+><PRP.?>?<NN>+} - one "Wh" word, one verb, zero or one pronoun, one or more nouns
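To make the idea concrete, here is a minimal sketch that checks a tag pattern like the one above against whole tag sequences, using only Python's re module. The helpers tag_pattern_to_regex, matches and coverage are hypothetical names made up for this example, not NLTK functions. The translation loosely imitates how chunk tag patterns treat each <...> as one unit and keep "." from crossing tag boundaries, and it assumes the tags have already been produced by a tagger.

```python
import re

def tag_pattern_to_regex(tag_pattern):
    # Hypothetical helper: turn "{<W.+><V.+>}" into a plain regex.
    body = re.sub(r"\s+", "", tag_pattern).strip("{}")
    # Group each <...> so a trailing ?, + or * applies to the whole tag.
    body = body.replace("<", "(?:<(?:").replace(">", ")>)")
    # "." may match any character except a tag delimiter.
    body = body.replace(".", "[^<>]")
    return re.compile(body)

def matches(tag_pattern, tags):
    # Encode ["WP", "VBZ"] as "<WP><VBZ>" and require a full match.
    tag_string = "".join("<%s>" % t for t in tags)
    return tag_pattern_to_regex(tag_pattern).fullmatch(tag_string) is not None

def coverage(tag_pattern, tagged_sentences):
    # Fraction of tag sequences that the pattern matches completely.
    hits = sum(matches(tag_pattern, tags) for tags in tagged_sentences)
    return hits / len(tagged_sentences)

COMMON = "{<W.+><V.+><PRP.?>?<NN>+}"
SENTENCES = [
    ["WP", "VBZ", "NN"],                  # What is encapsulation
    ["WRB", "VBD", "PRP$", "NN"],         # How was your wedding
    ["WP", "VBZ", "PRP$", "NN", "NN"],    # What is your plan today
]

print(coverage(COMMON, SENTENCES))  # 1.0
```

A coverage score like this only ranks candidate patterns that have already been written by hand; it does not discover the pattern itself.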
So I want to generalize the tag patterns of the sentences into common ones. That is what I am trying to do.
Can someone tell me how to do that?
Answer 1:
It sounds like you are after a regexp (with quantifiers) that will match all the different tag sequences in your data. That is not an easy problem in itself, but I suspect that your real goal is to find a pattern that captures the sequences that are legal sentences. Is this right?
If so, regexps (and finite-state approaches in general) are inherently the wrong tool for the job. To even get a start on characterizing your sentence collection, you need to look at context-free grammars. Take a look at NLTK's materials on the topic.
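As a rough illustration of the context-free route, here is a minimal sketch using nltk.CFG.fromstring and nltk.ChartParser. The grammar, its nonterminal names and the is_accepted helper are invented for this example and cover only the three tag sequences from the question. Using POS tags as the terminals of the grammar is an assumption made to keep it short.

```python
import nltk

# Toy grammar over POS tags (terminals are the quoted tags).
# The first rule's left-hand side, S, is the start symbol.
grammar = nltk.CFG.fromstring("""
S -> WH V NP
WH -> 'WP' | 'WRB'
V -> 'VBZ' | 'VBD'
NP -> POSS NOUNS | NOUNS
POSS -> 'PRP$'
NOUNS -> 'NN' NOUNS | 'NN'
""")

parser = nltk.ChartParser(grammar)

def is_accepted(tags):
    # True if the grammar yields at least one parse for the tag sequence.
    # Note: parse() raises ValueError for tags the grammar does not mention.
    return any(True for _ in parser.parse(tags))

for tags in (["WP", "VBZ", "NN"],
             ["WRB", "VBD", "PRP$", "NN"],
             ["WP", "VBZ", "PRP$", "NN", "NN"]):
    print(tags, is_accepted(tags))
```

This toy grammar accepts the same sequences as the regexp from the question. The gain comes only once rules become recursive, for example noun phrases nested inside other phrases, which a flat tag pattern cannot express.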
Source: https://stackoverflow.com/questions/33318975/how-to-get-common-tag-pattern-for-sentences-list-in-python-with-nltk