n-gram

How to get the probability of bigrams in a text of sentences?

Submitted by 老子叫甜甜 on 2019-12-12 12:25:41
Question: I have a text which has many sentences. How can I use nltk.ngrams to process it? This is my code:

    sequence = nltk.tokenize.word_tokenize(raw)
    bigram = ngrams(sequence, 2)
    freq_dist = nltk.FreqDist(bigram)
    prob_dist = nltk.MLEProbDist(freq_dist)
    number_of_bigrams = freq_dist.N()

However, the code above assumes that all sentences form one sequence. But the sentences are separate, and I suspect the last word of one sentence is unrelated to the first word of the next. How can I create a bigram
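A minimal sketch of the usual fix: count bigrams per sentence, so no bigram spans a sentence boundary. The naive split on '.' and whitespace stands in for nltk.sent_tokenize / nltk.word_tokenize, and `raw` is a hypothetical input; the MLE step is computed by hand here, as nltk.MLEProbDist would.

```python
# Count bigrams sentence by sentence so none crosses a boundary.
# Splitting on '.' is a stand-in for nltk.sent_tokenize; in real use,
# tokenize each sentence with nltk.word_tokenize instead of str.split.
from collections import Counter

raw = "the cat sat on the mat. the dog sat on the cat."

bigram_counts = Counter()
for sentence in raw.split('.'):
    tokens = sentence.split()
    bigram_counts.update(zip(tokens, tokens[1:]))

total = sum(bigram_counts.values())
# Maximum-likelihood estimate, as nltk.MLEProbDist would compute it:
prob = {bg: n / total for bg, n in bigram_counts.items()}
print(total)                  # 10 bigrams; none crosses the '.' boundary
print(prob[('sat', 'on')])    # 0.2
```

Note that ('mat', 'the') never appears as a bigram, even though "mat" ends one sentence and "the" begins the next.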

Generate bigrams with NLTK

Submitted by 不问归期 on 2019-12-12 07:47:36
Question: I am trying to produce a bigram list for a given sentence. For example, if I type "To be or not to be", I want the program to generate: to be, be or, or not, not to, to be. I tried the following code, but it just gives me <generator object bigrams at 0x0000000009231360>. This is my code:

    import nltk
    bigrm = nltk.bigrams(text)
    print(bigrm)

So how do I get what I want? I want a list of word combinations like above (to be, be or, or not, not to, to be).

Answer 1: nltk.bigrams() returns an iterator (a
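Since nltk.bigrams() returns a lazy generator, wrapping it in list() materialises the pairs. A pure-Python equivalent is sketched below so it runs without NLTK; `zip(text, text[1:])` yields the same pairs nltk.bigrams would.

```python
# Materialise bigrams from a tokenised sentence. zip of the token list
# with itself shifted by one is equivalent to list(nltk.bigrams(text)).
text = "To be or not to be".lower().split()

bigrams = list(zip(text, text[1:]))
print([' '.join(pair) for pair in bigrams])
# ['to be', 'be or', 'or not', 'not to', 'to be']
```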

TypeError: 'str' object is not callable in Python

Submitted by 瘦欲@ on 2019-12-12 04:43:19
Question: I have this error in my code and I don't understand how to fix it:

    import nltk
    from nltk.util import ngrams

    def word_grams(words, min=1, max=4):
        s = []
        for n in range(min, max):
            for ngram in ngrams(words, n):
                s.append(' '.join(str(i) for i in ngram))
        return s

    print word_grams('one two three four'.split(' '))

The error occurs at s.append(' '.join(str(i) for i in ngram)):

    TypeError: 'str' object is not callable

Answer 1: The code you posted is correct and works with both Python 2.7 and 3.6 (for 3.6 you have to put
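A sketch of the Python 3 version: the print call is parenthesised, the min/max parameters are renamed (they shadow the built-ins), and a small stdlib loop stands in for nltk.util.ngrams so it runs without NLTK. A common cause of this exact TypeError, for what it's worth, is having rebound the name `str` earlier in the session (e.g. `str = 'something'`), which the posted code itself does not do.

```python
# Python 3 variant of word_grams; the inner loop is a stdlib stand-in
# for nltk.util.ngrams(words, n).
def word_grams(words, min_n=1, max_n=4):
    s = []
    for n in range(min_n, max_n):
        for i in range(len(words) - n + 1):
            s.append(' '.join(str(w) for w in words[i:i + n]))
    return s

print(word_grams('one two three four'.split(' ')))
# ['one', 'two', 'three', 'four', 'one two', 'two three',
#  'three four', 'one two three', 'two three four']
```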

Creating n-grams word cloud using python

Submitted by 家住魔仙堡 on 2019-12-12 04:11:39
Question: I am trying to generate a word cloud using bigrams. I can extract the top 30 discriminative words, but I cannot display the words together when plotting; my word cloud image still looks like a unigram cloud. I used the following script and the scikit-learn package:

    def create_wordcloud(pipeline):
        """
        Create word cloud with top 30 discriminative words for each category
        """
        class_labels = numpy.array(['Arts','Music','News','Politics','Science','Sports','Technology'])
        feature_names
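The usual cause of a unigram-looking cloud is WordCloud.generate(), which re-tokenises its input into single words. A sketch of the alternative: build a bigram-to-count mapping yourself and pass it to WordCloud.generate_from_frequencies(), which renders the dictionary keys verbatim. The tiny `docs` list is a placeholder corpus, and the wordcloud call is left commented since that package is a separate dependency.

```python
# Build "w1 w2" -> count frequencies; multi-word keys survive rendering
# only via generate_from_frequencies, not generate().
from collections import Counter

docs = ["the big tree", "the big tree is green", "big tree house"]

freqs = Counter()
for doc in docs:
    words = doc.split()
    freqs.update(' '.join(bg) for bg in zip(words, words[1:]))

print(freqs.most_common(2))   # [('big tree', 3), ('the big', 2)]

# from wordcloud import WordCloud   # assumes the wordcloud package
# WordCloud().generate_from_frequencies(freqs).to_file('bigrams.png')
```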

How to search a corpus to find frequency of a string?

Submitted by 风格不统一 on 2019-12-12 01:45:21
Question: I'm working on an NLP project and I'd like to search through a corpus of text to find the frequency of a given verb-object pair. The aim is to find which verb-object pair is most likely when given a few different possibilities. For example, given the strings "Swing the stick" and "Eat the stick", I would hope the corpus shows it is much more likely for someone to swing a stick than to eat one. I've been reading about n-grams and corpus linguistics, but I'm struggling to
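A minimal sketch of the idea: count each candidate phrase's occurrences across a corpus and take the most frequent. The three-sentence `corpus` is a toy stand-in for a real corpus (e.g. an NLTK corpus or a web-scale n-gram table), where the counts would be far more meaningful.

```python
# Rank candidate verb-object phrases by raw corpus frequency.
from collections import Counter

corpus = [
    "he decided to swing the stick at the ball",
    "children swing the stick around the yard",
    "do not eat the stick of chalk",
]

candidates = ["swing the stick", "eat the stick"]
counts = Counter({c: sum(sent.count(c) for sent in corpus) for c in candidates})

best, n = counts.most_common(1)[0]
print(best, n)   # swing the stick 2
```

In practice one would normalise by the verb's overall frequency (or use an association measure such as the likelihood ratio) rather than compare raw counts.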

Optimization of an R loop taking 18 hours to run

Submitted by 人走茶凉 on 2019-12-12 00:15:20
Question: I have R code that works and does what I want, but it takes a huge amount of time to run. Here is an explanation of what the code does, along with the code itself. I have a vector of 200000 lines containing street addresses (strings): data. Example:

    > data[150000,]
    address
    "15 rue andre lalande residence marguerite yourcenar 91000 evry france"

And I have a 131x2 matrix of string elements which are 5-grams (parts of words) and the ids of the bags of n-grams (example of a 5-gram bag: ["stack", "tacko", "ackov"

How to interpret Python NLTK bigram likelihood ratios?

Submitted by 自古美人都是妖i on 2019-12-11 17:32:30
Question: I'm trying to figure out how to properly interpret nltk's "likelihood ratio" given the code below (taken from this question).

    import nltk.collocations
    import nltk.corpus
    import collections

    bgm = nltk.collocations.BigramAssocMeasures()
    finder = nltk.collocations.BigramCollocationFinder.from_words(nltk.corpus.brown.words())
    scored = finder.score_ngrams(bgm.likelihood_ratio)

    # Group bigrams by first word in bigram.
    prefix_keys = collections.defaultdict(list)
    for key, scores in scored:
        prefix
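For interpretation: BigramAssocMeasures.likelihood_ratio is Dunning's log-likelihood statistic, G2 = 2 * sum(obs * ln(obs / exp)) over the 2x2 contingency table of a bigram (w1, w2). Higher scores mean stronger evidence that the pair co-occurs more often than independence would predict; the scores rank collocations but are not probabilities. A hand-rolled sketch with hypothetical toy counts:

```python
# Dunning's log-likelihood ratio over the bigram contingency table:
# cells are (w1,w2), (w1,~w2), (~w1,w2), (~w1,~w2).
from math import log

def likelihood_ratio(n_w1w2, n_w1, n_w2, n_total):
    obs = [n_w1w2,
           n_w1 - n_w1w2,
           n_w2 - n_w1w2,
           n_total - n_w1 - n_w2 + n_w1w2]
    row = [n_w1, n_total - n_w1]
    col = [n_w2, n_total - n_w2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = obs[2 * i + j]
            e = row[i] * col[j] / n_total      # expected under independence
            if o > 0:
                g2 += o * log(o / e)
    return 2 * g2

# A bigram seen 20 times, with w1 occurring 30 times and w2 occurring
# 25 times in a 10000-token corpus, scores far above zero; a pair whose
# co-occurrence exactly matches chance scores ~0.
print(likelihood_ratio(20, 30, 25, 10000))
print(likelihood_ratio(1, 100, 100, 10000))
```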

How to extract the verbs and all corresponding adverbs from a text?

Submitted by 梦想的初衷 on 2019-12-11 16:53:30
Question: Using ngrams in Python, my aim is to find verbs and their corresponding adverbs in an input text. What I have done: Input text: "He is talking weirdly. A horse can run fast. A big tree is there. The sun is beautiful. The place is well decorated. They are talking weirdly. She runs fast. She is talking greatly. Jack runs slow." Code:

    finder2 = BigramCollocationFinder.from_words(wrd for (wrd,tags) in posTagged if tags in('VBG','RB','VBN',))
    scored = finder2.score_ngrams(bigram_measures.raw
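One pitfall in the approach above: filtering the tagged words into a single stream before building bigrams loses track of which adverb sits next to which verb. A sketch of pairing them positionally instead, where `posTagged` is a small hand-written stand-in for nltk.pos_tag output:

```python
# Pair each verb with the adverb immediately following it, using
# adjacency in the original tagged sequence rather than a filtered
# stream of words.
posTagged = [('He', 'PRP'), ('is', 'VBZ'), ('talking', 'VBG'),
             ('weirdly', 'RB'), ('She', 'PRP'), ('runs', 'VBZ'),
             ('fast', 'RB'), ('Jack', 'NNP'), ('runs', 'VBZ'),
             ('slow', 'RB')]

VERB_TAGS = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

pairs = [(w1, w2)
         for (w1, t1), (w2, t2) in zip(posTagged, posTagged[1:])
         if t1 in VERB_TAGS and t2 == 'RB']
print(pairs)   # [('talking', 'weirdly'), ('runs', 'fast'), ('runs', 'slow')]
```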

Extracting n-grams from tweets in Python

Submitted by 主宰稳场 on 2019-12-11 14:56:15
Question: Say that I have 100 tweets. In those tweets, I need to extract: 1) food names, and 2) beverage names. Example tweet: "Yesterday I had a coca cola, and a hot dog for lunch, and some bana split for desert. I liked the coke, but the banana in the banana split dessert was ripe." I have two lexicons at my disposal: one with food names and one with beverage names. Example entries in the food names lexicon: "hot dog", "banana", "banana split". Example entries in the beverage names lexicon: "coke", "cola", "coca cola". What I
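A sketch of longest-match lexicon lookup for this kind of task: generate n-grams up to the longest phrase in the lexicon and match longest-first, so "banana split" is reported rather than its substring "banana", and "coca cola" rather than "cola". The lexicons and tweet are toy stand-ins.

```python
# Longest-first n-gram matching against a phrase lexicon; words consumed
# by a longer match are not re-matched by shorter n-grams.
foods = {"hot dog", "banana", "banana split"}
beverages = {"coke", "cola", "coca cola"}

def find_matches(text, lexicon):
    words = text.lower().split()
    max_n = max(len(p.split()) for p in lexicon)
    found, used = [], set()
    for n in range(max_n, 0, -1):                  # longest n-grams first
        for i in range(len(words) - n + 1):
            if any(j in used for j in range(i, i + n)):
                continue                           # inside a longer match
            gram = ' '.join(words[i:i + n])
            if gram in lexicon:
                found.append(gram)
                used.update(range(i, i + n))
    return found

tweet = "I had a coca cola and a hot dog then a banana split"
print(find_matches(tweet, foods))       # ['hot dog', 'banana split']
print(find_matches(tweet, beverages))   # ['coca cola']
```

Misspellings like "bana split" in the example tweet would additionally need fuzzy matching (e.g. edit distance), which this sketch does not attempt.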

n-grams with Naive Bayes classifier Error

Submitted by 扶醉桌前 on 2019-12-11 11:49:50
Question: I was experimenting with Python NLTK text classification. Here is the code example I am practicing: http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/ Here is the code:

    from nltk import bigrams
    from nltk.probability import ELEProbDist, FreqDist
    from nltk import NaiveBayesClassifier
    from collections import defaultdict

    train_samples = {}

    with file('data/positive.txt', 'rt') as f:
        for line in f.readlines():
            train_samples[line] = 'pos'

    with file('data/negative.txt',
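One likely error in the snippet on Python 3: the built-in file() constructor was removed, so open() must be used instead. Below is that fix (commented, since the data files are the tutorial's) plus a sketch of the kind of bigram feature extractor that NLTK's NaiveBayesClassifier.train() expects, a dict of feature names to values; the feature-name format is an assumption for illustration.

```python
# Bigram presence features of the form the NLTK Naive Bayes classifier
# consumes: {feature_name: True, ...}.
def bigram_features(text):
    words = text.lower().split()
    features = {}
    for w1, w2 in zip(words, words[1:]):
        features['contains(%s %s)' % (w1, w2)] = True
    return features

print(bigram_features("this movie was really great"))
# {'contains(this movie)': True, 'contains(movie was)': True,
#  'contains(was really)': True, 'contains(really great)': True}

# Python 3 replacement for the file() calls (paths are the tutorial's):
# with open('data/positive.txt', 'rt') as f:
#     for line in f:
#         train_samples[line.strip()] = 'pos'
```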