I am making my way through the NLTK book and I can't seem to do something that would appear to be a natural first step for building a decent grammar.
My goal is to buil
You could run a POS tagger over your text and then adapt your grammar to work on POS tags instead of words.
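```python
>>> import nltk
>>> # word_tokenize and pos_tag need NLTK's tokenizer and tagger models (see nltk.download())
>>> text = nltk.word_tokenize("A car has a door")
>>> text
['A', 'car', 'has', 'a', 'door']
>>> tagged_text = nltk.pos_tag(text)
>>> tagged_text
[('A', 'DT'), ('car', 'NN'), ('has', 'VBZ'), ('a', 'DT'), ('door', 'NN')]
>>> pos_tags = [pos for (token, pos) in tagged_text]
>>> pos_tags
['DT', 'NN', 'VBZ', 'DT', 'NN']
>>> # The terminals are POS tags, not words. Prepositions are 'IN' in the Penn Treebank
>>> # tagset used by nltk.pos_tag. NLTK 3 builds grammars with CFG.fromstring (NLTK 2: parse_cfg).
>>> simple_grammar = nltk.CFG.fromstring("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP
... VP -> V NP | VP PP
... Det -> 'DT'
... N -> 'NN'
... V -> 'VBZ'
... P -> 'IN'
... """)
>>> parser = nltk.ChartParser(simple_grammar)
>>> # parse() returns an iterator over every tree the grammar licenses
>>> for tree in parser.parse(pos_tags):
...     print(tree)
...
(S (NP (Det DT) (N NN)) (VP (V VBZ) (NP (Det DT) (N NN))))
```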
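Because the parse runs over tags, the resulting trees have tags such as `DT` and `NN` as leaves rather than the words themselves. One way to get the words back, assuming you keep the `(word, tag)` pairs that `nltk.pos_tag` returns, is to copy each tree and overwrite its leaves in order. This is a minimal sketch, not an NLTK feature: `parse_with_words` is a hypothetical helper name, and the tagged input is hardcoded so the example needs no tagger models. The first list is the tagger output from the session above; the tags for the second sentence (including `'IN'` for "on") are my assumption of what the tagger would produce.

```python
import nltk

# (word, tag) pairs. The first list is the nltk.pos_tag output from the session above;
# the extension is an assumed, hand-written tagging of a sentence with a preposition.
car = [("A", "DT"), ("car", "NN"), ("has", "VBZ"), ("a", "DT"), ("door", "NN")]
car_pp = car + [("on", "IN"), ("the", "DT"), ("side", "NN")]

grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP
VP -> V NP | VP PP
Det -> 'DT'
N -> 'NN'
V -> 'VBZ'
P -> 'IN'
""")
parser = nltk.ChartParser(grammar)


def parse_with_words(tagged_tokens):
    """Hypothetical helper: parse on the POS tags, then put the words back as leaves."""
    words = [word for word, _ in tagged_tokens]
    tags = [tag for _, tag in tagged_tokens]
    trees = []
    for tag_tree in parser.parse(tags):
        # Deep-copy before editing, in case the chart shares subtrees between parses.
        tree = tag_tree.copy(deep=True)
        # Leaves come back in input order, so the i-th leaf position gets the i-th word.
        for position, word in zip(tree.treepositions("leaves"), words):
            tree[position] = word
        trees.append(tree)
    return trees


for sentence in (car, car_pp):
    for tree in parse_with_words(sentence):
        print(tree)
```

The first sentence gives a single tree with the words as leaves. The second gives two trees: one attaches "on the side" to "door" (`NP -> Det N PP`) and the other to the verb phrase (`VP -> VP PP`). A grammar over tags alone has no way to choose between them.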