part-of-speech

Part of Speech (POS) tag Feature Selection for Text Classification

我怕爱的太早我们不能终老 submitted on 2019-12-09 07:01:42
Question: I have POS-tagged sentences obtained using the Stanford POS tagger, e.g.: The/DT island/NN was/VBD very/RB beautiful/JJ ./. I/PRP love/VBP it/PRP ./. (An XML format is also available.) Can anyone explain how to perform feature selection on these POS-tagged sentences and convert them into feature vectors for text classification using a machine learning method?

Answer 1: A simple way to start out would be something like the following (assuming word order is not important for your classification algorithm). First you
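One possible starting point, which is not necessarily what the truncated answer above goes on to describe, is a bag-of-tags representation: ignore word order and count how often each POS tag occurs. The sketch below assumes the `word/TAG` format from the question; the function names and the small `TAGSET` list are illustrative choices, not part of any library.

```python
from collections import Counter


def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST '/', so './.' becomes ('.', '.')
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def tag_count_vector(text, tagset):
    """Return one count per tag in `tagset`, in the same order."""
    counts = Counter(tag for _, tag in parse_tagged(text))
    return [counts[tag] for tag in tagset]


# Hypothetical tag inventory; in practice use the full tagset of your tagger.
TAGSET = ["DT", "NN", "VBD", "RB", "JJ", "PRP", "VBP", "."]

sentence = ("The/DT island/NN was/VBD very/RB beautiful/JJ ./. "
            "I/PRP love/VBP it/PRP ./.")
vector = tag_count_vector(sentence, TAGSET)
print(vector)  # [1, 1, 1, 1, 1, 2, 1, 2]
```

Each document then becomes a fixed-length numeric vector that a standard classifier can consume. Counting `(word, tag)` pairs instead of tags alone is a straightforward variation.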

Provoke the NLTK part-of-speech tagger to report a plural proper noun

喜夏-厌秋 submitted on 2019-12-08 04:28:40
Question: Let's try out Python's renowned part-of-speech tagger in the nltk package.

    import nltk
    # You might also need to run nltk.download('maxent_treebank_pos_tagger')
    # even after installing nltk
    string = 'Buddy Billy went to the moon and came Back with several Vikings.'
    nltk.pos_tag(nltk.word_tokenize(string))

This gives me

    [('Buddy', 'NNP'), ('Billy', 'NNP'), ('went', 'VBD'), ('to', 'TO'), ('the', 'DT'), ('moon', 'NN'), ('and', 'CC'), ('came', 'VBD'), ('Back', 'NNP'), ('with', 'IN'), ('several',

Provoke the NLTK part-of-speech tagger to report a plural proper noun

好久不见. submitted on 2019-12-07 05:37:24
Let's try out Python's renowned part-of-speech tagger in the nltk package.

    import nltk
    # You might also need to run nltk.download('maxent_treebank_pos_tagger')
    # even after installing nltk
    string = 'Buddy Billy went to the moon and came Back with several Vikings.'
    nltk.pos_tag(nltk.word_tokenize(string))

This gives me

    [('Buddy', 'NNP'), ('Billy', 'NNP'), ('went', 'VBD'), ('to', 'TO'), ('the', 'DT'), ('moon', 'NN'), ('and', 'CC'), ('came', 'VBD'), ('Back', 'NNP'), ('with', 'IN'), ('several', 'JJ'), ('Vikings', 'NNS'), ('.', '.')]

You can interpret the codes here. I'm slightly disappointed that
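To check quickly which tokens received a given tag, the tagger output can be filtered directly. This is a small sketch that works on the list quoted in the question (copied here as a literal, so nltk is not needed to run it); `words_with_tag` is a hypothetical helper name, not an nltk function.

```python
# Tagger output as quoted in the question above.
tagged = [('Buddy', 'NNP'), ('Billy', 'NNP'), ('went', 'VBD'), ('to', 'TO'),
          ('the', 'DT'), ('moon', 'NN'), ('and', 'CC'), ('came', 'VBD'),
          ('Back', 'NNP'), ('with', 'IN'), ('several', 'JJ'),
          ('Vikings', 'NNS'), ('.', '.')]


def words_with_tag(tagged_tokens, tag):
    """Return the words whose tag equals `tag` exactly."""
    return [word for word, token_tag in tagged_tokens if token_tag == tag]


proper_singular = words_with_tag(tagged, 'NNP')
proper_plural = words_with_tag(tagged, 'NNPS')
print(proper_singular)  # ['Buddy', 'Billy', 'Back']
print(proper_plural)    # []
```

In this output no token is tagged `NNPS` (plural proper noun): 'Vikings' comes back as `NNS`, while the capitalized 'Back' is tagged `NNP`.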

What created `maxent_treebank_pos_tagger/english.pickle`?

☆樱花仙子☆ submitted on 2019-12-06 20:19:08
Question: The nltk package's built-in part-of-speech tagger does not seem to be optimized for my use case (here, for instance). The source code here shows that it uses a saved, pre-trained classifier called `maxent_treebank_pos_tagger`. What created `maxent_treebank_pos_tagger/english.pickle`? I'm guessing that there is a tagged corpus out there somewhere that was used to train this tagger, so I think I'm looking for (a) that tagged corpus and (b) the exact code that trains the tagger based on the

What created `maxent_treebank_pos_tagger/english.pickle`?

大城市里の小女人 submitted on 2019-12-05 01:24:49
The nltk package's built-in part-of-speech tagger does not seem to be optimized for my use case (here, for instance). The source code here shows that it uses a saved, pre-trained classifier called `maxent_treebank_pos_tagger`. What created `maxent_treebank_pos_tagger/english.pickle`? I'm guessing that there is a tagged corpus out there somewhere that was used to train this tagger, so I think I'm looking for (a) that tagged corpus and (b) the exact code that trains the tagger based on the tagged corpus. In addition to lots of googling, so far I tried to look at the .pickle object directly to

Grouping all named entities in a document

主宰稳场 submitted on 2019-12-04 06:16:51
Question: I would like to group all named entities in a given document. For example: **Barack Hussein Obama** II is the 44th and current President of the United States, and the first African American to hold the office. I do not want to use the OpenNLP APIs, as they might not be able to recognize all named entities. Is there any way to generate such n-grams using other services, or maybe a way to group all noun terms together?

Answer 1: If you want to avoid using NER, you could use a sentence chunker or parser.
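As a much simpler stand-in for a real chunker or parser, runs of adjacent proper-noun tags can be merged into one phrase. The sketch below assumes Penn Treebank style tags; the tags in `tagged` are assigned by hand for illustration (not produced by a tagger), and `group_proper_nouns` is a hypothetical helper name.

```python
from itertools import groupby


def group_proper_nouns(tagged_tokens):
    """Join each run of consecutive NNP/NNPS tokens into a single phrase."""
    groups = []
    # groupby yields runs of consecutive tokens sharing the same key value.
    for is_proper, run in groupby(tagged_tokens,
                                  key=lambda pair: pair[1].startswith("NNP")):
        if is_proper:
            groups.append(" ".join(word for word, _ in run))
    return groups


# Hand-tagged fragment of the example sentence (tags are an assumption).
tagged = [("Barack", "NNP"), ("Hussein", "NNP"), ("Obama", "NNP"),
          ("II", "NNP"), ("is", "VBZ"), ("the", "DT"), ("44th", "JJ"),
          ("and", "CC"), ("current", "JJ"), ("President", "NNP"),
          ("of", "IN"), ("the", "DT"), ("United", "NNP"), ("States", "NNPS")]

entities = group_proper_nouns(tagged)
print(entities)  # ['Barack Hussein Obama II', 'President', 'United States']
```

This heuristic depends entirely on the quality of the tags and cannot join phrases that contain function words (for example "President of the United States"), which is where a proper chunker or parser becomes useful.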

How to extract line numbers that match a regular expression in a text file

只愿长相守 submitted on 2019-11-30 16:48:41
I'm doing a project on statistical machine translation in which I need to extract line numbers from a POS-tagged text file that match a regular expression (any non-separated phrasal verb with the particle 'out'), and write the line numbers to a file (in Python). I have this regular expression: '\w*_VB.?\sout_RP' and my POS-tagged text file: 'Corpus.txt'. I would like to get an output file with the line numbers that match the above-mentioned regular expression, and the output file should have just one line number per line (no empty lines), e.g.: 2 5 44 So far all I have in my script is the
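One way this could be approached is sketched below, using the regular expression from the question. The `word_TAG` layout of the sample lines is an assumption inferred from the pattern, and the function names are hypothetical.

```python
import re

# Pattern from the question: a verb (VB, VBD, VBP, ...) directly followed
# by the particle 'out'.
PATTERN = re.compile(r"\w*_VB.?\sout_RP")


def matching_line_numbers(lines):
    """Return the 1-based numbers of the lines that contain a match."""
    return [number for number, line in enumerate(lines, start=1)
            if PATTERN.search(line)]


def write_line_numbers(corpus_path, output_path):
    """Write one matching line number per line, with no empty lines."""
    with open(corpus_path, encoding="utf-8") as corpus:
        numbers = matching_line_numbers(corpus)
    with open(output_path, "w", encoding="utf-8") as out:
        out.write("\n".join(str(number) for number in numbers))
    return numbers


# Made-up sample lines in the assumed word_TAG format.
sample = [
    "He_PRP went_VBD to_TO school_NN",
    "She_PRP found_VBD out_RP the_DT truth_NN",
    "They_PRP ran_VBD quickly_RB",
    "We_PRP figure_VBP out_RP things_NNS",
]
print(matching_line_numbers(sample))  # [2, 4]
```

For the real data, a call such as `write_line_numbers('Corpus.txt', 'line_numbers.txt')` would produce the output file; the output file name here is only a placeholder.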

How to extract line numbers that match a regular expression in a text file

孤者浪人 submitted on 2019-11-29 23:28:17
Question: I'm doing a project on statistical machine translation in which I need to extract line numbers from a POS-tagged text file that match a regular expression (any non-separated phrasal verb with the particle 'out'), and write the line numbers to a file (in Python). I have this regular expression: '\w*_VB.?\sout_RP' and my POS-tagged text file: 'Corpus.txt'. I would like to get an output file with the line numbers that match the above-mentioned regular expression, and the output file should just

Strategies for recognizing proper nouns in NLP

元气小坏坏 submitted on 2019-11-28 05:06:41
I'm interested in learning more about Natural Language Processing (NLP) and am curious whether there are currently any strategies for recognizing proper nouns in a text that aren't based on dictionary recognition. Also, could anyone explain or link to resources that explain the current dictionary-based methods? Who are the authoritative experts on NLP, or what are the definitive resources on the subject?

The task of determining the proper part of speech for a word in a text is called Part of Speech Tagging. The Brill tagger, for example, uses a mixture of dictionary (vocabulary) words and

Strategies for recognizing proper nouns in NLP

坚强是说给别人听的谎言 submitted on 2019-11-27 00:39:05
Question: I'm interested in learning more about Natural Language Processing (NLP) and am curious whether there are currently any strategies for recognizing proper nouns in a text that aren't based on dictionary recognition. Also, could anyone explain or link to resources that explain the current dictionary-based methods? Who are the authoritative experts on NLP, or what are the definitive resources on the subject?

Answer 1: The task of determining the proper part of speech for a word in a text is called Part of