tagged-corpus

How do I get a set of grammar rules from Penn Treebank using python & NLTK?

半腔热情 submitted on 2019-12-20 08:49:55
Question: I'm fairly new to NLTK and Python. I've been creating sentence parses using the toy grammars given in the examples, but I would like to know if it's possible to use a grammar learned from a portion of the Penn Treebank, say, as opposed to just writing my own or using the toy grammars. (I'm using Python 2.7 on a Mac.) Many thanks.

Answer 1: If you want a grammar that precisely captures the Penn Treebank sample that comes with NLTK, you can do this, assuming you've downloaded the Treebank data for NLTK

How to build POS-tagged corpus with NLTK?

穿精又带淫゛_ submitted on 2019-12-12 14:23:52
Question: I'm trying to build a POS-tagged corpus from external .txt files, for chunking and for entity and relation extraction. So far I have found a cumbersome multi-step solution:

Read the files into a plain-text corpus:

from nltk.corpus.reader import PlaintextCorpusReader
my_corp = PlaintextCorpusReader(".", r".*\.txt")

Tag the corpus with the built-in Penn POS tagger:

my_tagged_corp = nltk.batch_pos_tag(my_corp.sents())

(By the way, at this point Python threw an error: NameError: name 'batch' is not defined) Write

Wordnet (Word Sense Annotated) Corpus

爱⌒轻易说出口 submitted on 2019-12-09 13:26:28
Question: I've been using lots of different corpora for natural language processing, and I've been looking for a corpus that has been annotated with WordNet word senses. I understand that there probably is not a big corpus with this information, since such a corpus has to be built manually, but there must be something to go on. Also, if there isn't such a corpus, is there at least a sense-annotated n-gram database (with the percentage of the time a word carries each of its definitions, or

NLTK - Get and Simplify List of Tags

和自甴很熟 submitted on 2019-12-06 04:52:57
Question: I'm using the Brown Corpus. I want some way to print out all the possible tags and their names (not just the tag abbreviations). There are also quite a few tags; is there a way to 'simplify' them? By simplify I mean combining two extremely similar tags into one and re-tagging the merged words with the remaining tag.

Answer 1: This was discussed previously in:

Java Stanford NLP: Part of Speech labels?
Simplifying the French POS Tag Set with NLTK
https://linguistics.stackexchange.com/questions/2249/turn

NLTK - Get and Simplify List of Tags

独自空忆成欢 submitted on 2019-12-04 08:36:26
I'm using the Brown Corpus. I want some way to print out all the possible tags and their names (not just the tag abbreviations). There are also quite a few tags; is there a way to 'simplify' them? By simplify I mean combining two extremely similar tags into one and re-tagging the merged words with the remaining tag.

alvas: This was discussed previously in:

Java Stanford NLP: Part of Speech labels?
Simplifying the French POS Tag Set with NLTK
https://linguistics.stackexchange.com/questions/2249/turn-penn-treebank-into-simpler-pos-tags

The POS tag output from nltk.pos_tag uses the Penn Treebank tagset, https

How do I get a set of grammar rules from Penn Treebank using python & NLTK?

不打扰是莪最后的温柔 submitted on 2019-12-02 17:18:50
I'm fairly new to NLTK and Python. I've been creating sentence parses using the toy grammars given in the examples, but I would like to know if it's possible to use a grammar learned from a portion of the Penn Treebank, say, as opposed to just writing my own or using the toy grammars. (I'm using Python 2.7 on a Mac.) Many thanks.

If you want a grammar that precisely captures the Penn Treebank sample that comes with NLTK, you can do this, assuming you've downloaded the Treebank data for NLTK (see comment below):

import nltk
from nltk.corpus import treebank
from nltk.grammar import ContextFreeGrammar